Previously, it was possible for reusable CPUs and reusable devices (i.e.
those previously consumed by init containers) to not be reused by
subsequent init containers or app containers if the TopologyManager was
enabled. This would happen because hint generation for the
TopologyManager was not considering the reusable devices when it made
its hint calculation.
As such, it would sometimes:
1) Generate a hint for a differnent NUMA node, causing the CPUs and
devices to be allocated from that node instead of the one where the
reusable devices live; or
2) End up thinking there were not enough CPUs or devices to allocate and
throw a TopologyAffinity admission error
This patch fixes this by ensuring that reusable CPUs and devices are
considered as part of TopologyHint generation. This frunctionality is
difficult to unit test since it spans multiple components, but an e2e
test will be added in a subsequent patch to test this functionality.
Previously, we didn't check the contents of the result after calling out
to the plugin endpoint. This could have resulted in errors if the plugin
returned either 'nil' or an empty result. This patch fixes this.
Previously, we were passing the variable 'devices' to this function,
when we should have been passing 'allocated'. This bug crept in due to a
variable name change that didn't propogate its way through the entire
function. The tests added in the previous commit would have caught this.
- allocatePodResources logic altered to allow for container by container
device allocation.
- New type PodReusableDevices
- New field in devicemanager devicesToReuse
This change will not work on its own. Higher level code needs to make
sure and call Allocate() before AddContainer is called. This is already
being done in cases when the TopologyManager feature gate is enabled (in
the PodAdmitHandler of the TopologyManager). However, we need to make
sure we add proper logic to call it in cases when the TopologyManager
feature gate is disabled.
Instead of having a single call for Allocate(), we now split this into two
functions Allocate() and UpdatePluginResources().
The semantics split across them:
// Allocate configures and assigns devices to a pod. From the requested
// device resources, Allocate will communicate with the owning device
// plugin to allow setup procedures to take place, and for the device
// plugin to provide runtime settings to use the device (environment
// variables, mount points and device files).
Allocate(pod *v1.Pod) error
// UpdatePluginResources updates node resources based on devices already
// allocated to pods. The node object is provided for the device manager to
// update the node capacity to reflect the currently available devices.
UpdatePluginResources(
node *schedulernodeinfo.NodeInfo,
attrs *lifecycle.PodAdmitAttributes) error
As we move to a model in which the TopologyManager is able to ensure
aligned allocations from the CPUManager, devicemanger, and any
other TopologManager HintProviders in the same synchronous loop, we will
need to be able to call Allocate() independently from an
UpdatePluginResources(). This commit makes that possible.
This ensures that we have the most up-to-date state when generating
topology hints for a container. Without this, it's possible that some
resources will be seen as allocated, when they are actually free.