GetAllocateResourcesPodAdmitHandler(). It is named as such to reflect its
new function. Also remove the Topology Manager feature gate check at higher level
kubelet.go, as it is now done in GetAllocateResourcesPodAdmitHandler().
- allocatePodResources logic altered to allow for container by container
device allocation.
- New type PodReusableDevices
- New field in devicemanager devicesToReuse
- Where previously we called manager.AddContainer(), we now call both
manager.Allocate() and manager.AddContainer().
- Some test cases now have two expected errors. One each
from Allocate() and AddContainer(). Existing outcomes are unchanged.
GetTopologyPodAdmitHandler() now returns a lifecycle.PodAdmitHandler
type instead of the TopologyManager directly. The handler it returns
is generally responsible for attempting to allocate any resources that
require a pod admission check. When the TopologyManager feature gate
is on, this comes directly from the TopologyManager. When it is off,
we simply attempt the allocations ourselves and fail the admission
on an unexpected error. The higher level kubelet.go feature gate
check will be removed in an upcoming PR.
This change will not work on its own. Higher level code needs to make
sure and call Allocate() before AddContainer is called. This is already
being done in cases when the TopologyManager feature gate is enabled (in
the PodAdmitHandler of the TopologyManager). However, we need to make
sure we add proper logic to call it in cases when the TopologyManager
feature gate is disabled.
This change will not work on its own. Higher level code needs to make
sure and call Allocate() before AddContainer is called. This is already
being done in cases when the TopologyManager feature gate is enabled (in
the PodAdmitHandler of the TopologyManager). However, we need to make
sure we add proper logic to call it in cases when the TopologyManager
feature gate is disabled.
Having this interface allows us to perform a tight loop of:
for each container {
containerHints = {}
for each provider {
containerHints[provider] = provider.GatherHints(container)
}
containerHints.MergeAndPublish()
for each provider {
provider.Allocate(container)
}
}
With this in place we can now be sure that the hints gathered in one
iteration of the loop always consider the allocations made in the
previous.
Instead of having a single call for Allocate(), we now split this into two
functions Allocate() and UpdatePluginResources().
The semantics split across them:
// Allocate configures and assigns devices to a pod. From the requested
// device resources, Allocate will communicate with the owning device
// plugin to allow setup procedures to take place, and for the device
// plugin to provide runtime settings to use the device (environment
// variables, mount points and device files).
Allocate(pod *v1.Pod) error
// UpdatePluginResources updates node resources based on devices already
// allocated to pods. The node object is provided for the device manager to
// update the node capacity to reflect the currently available devices.
UpdatePluginResources(
node *schedulernodeinfo.NodeInfo,
attrs *lifecycle.PodAdmitAttributes) error
As we move to a model in which the TopologyManager is able to ensure
aligned allocations from the CPUManager, devicemanger, and any
other TopologManager HintProviders in the same synchronous loop, we will
need to be able to call Allocate() independently from an
UpdatePluginResources(). This commit makes that possible.
Previously, the verious Merge() policies of the TopologyManager all
eturned their own lifecycle.PodAdmitResult result. However, for
consistency in any failed admits, this is better handled in the
top-level Topology manager, with each policy only returning a boolean
about whether or not they would like to admit the pod or not. This
commit changes the semantics to match this logic.
Previously, we unconditionally removed *all* topology hints from a pod
whenever just one container was being removed. This commit makes it so
we only remove the hints for the single container being removed, and
then conditionally remove the pod from the podTopologyHints[podUID] when
no containers left in it.
Unit test for updating container hugepage limit
Add warning message about ignoring case.
Update error handling about hugepage size requirements
Signed-off-by: sewon.oh <sewon.oh@samsung.com>
- Move remaining logic from mergeProvidersHints to generic top level
mergeFilteredHints function.
- Add numaNodes as parameter in order to make generic.
- Move single NUMA node specific check to single-numa-node Merge
function.
- Move initial 'filtering' functionality to generic function
filterProvidersHints level policy.go.
- Call new function from top level Merge function.
- Rename some variables/parameters to reflect changes.
A recent change made it so that the CPUManager receives a list of
initial containers that exist on the system at startup. This list can be
non-empty, for example, after a kubelet retart.
This commit ensures that the CPUManagers containerMap structure is
initialized with the containers from this list.
The logic has been updated to match the logic of the best-effort policy
except in two places:
1) The hint filtering frunction has been updated to allow "don't care"
hints encoded with a `nil` affinity mask, to pass through the filter in
addition to hints that have just a single NUMA bit set.
2) After calculating the `bestHint` we transform "don't care" affinities
encoded as having all NUMA bits set in their affinity masks into "don't
care" affinities encoded as `nil`.