Previously, the verious Merge() policies of the TopologyManager all
eturned their own lifecycle.PodAdmitResult result. However, for
consistency in any failed admits, this is better handled in the
top-level Topology manager, with each policy only returning a boolean
about whether or not they would like to admit the pod or not. This
commit changes the semantics to match this logic.
Previously, we unconditionally removed *all* topology hints from a pod
whenever just one container was being removed. This commit makes it so
we only remove the hints for the single container being removed, and
then conditionally remove the pod from the podTopologyHints[podUID] when
no containers left in it.
This is in preparation for removing the special-case of the
SingleNumaNode policy in mergeProvidersHints() in favor of a custom
merging strategy with much less overhead.
This abstraction moves the responsibility of merging topology hints to
the individual policies themselves. As part of this, it removes the
CanAdmitPodResult() API from the policy abstraction, and rolls it into a
second return value from Merge()
This patch fixes an issue in the TopologyManager that wouldn't allow
pods to be admitted if pods were launched with the SingleNUMANode policy
and any of the hint providers had no NUMA preferences.
This is due to 2 factors:
1) Any hint provider that passes back a `nil` as its hints, has its hint
automatically transformed into a single {11 true} hint before merging
2) We added a special casing for the SingleNumaNodePolicy() in the
TopologyManager that essentially turns these hints into a
{11 false} anytime a {11 true} is seen.
The current patch reworks this logic so the that TopologyManager can
tell the difference between a "don't care" hint and a true "{11 true}"
hint returned by the hint provider. Only true "{11 true}" hints will be
converted by the special casing for the SingleNumaNodePolicy(), while
"don't care" hints will not.
This is a short term fix for this issue until we do a larger refactoring
of this code for the 1.17 release.
- As discussed in reviews and other public channels,
this abstraction is used to represent numa nodes, not sockets.
- There is nothing inherently related to sockets in this package anyway.
Added one off fix for single-numa-node policy to correctly
reject pod admission on a resource allocation that spans
NUMA nodes
Co-authored-by: Kevin Klues <kklues@nvidia.com>
Previously it only took a bool, which limited the logic it could perform
to determine if a pod should be admitted or not based on the merged hint
from the policy.
As part of this, update the logic to use the NUMA information instead of
the Socket information when generating and consuming TopologyHints in
the CPUManager.
At present, there is no way for a hint provider to return distinct hints
for different resource types via a call to GetTopologyHints(). This
means that hint providers that govern multiple resource types (e.g. the
devicemanager) must do some sort of "pre-merge" on the hints it
generates for each resource type before passing them back to the
TopologyManager.
This patch changes the GetTopologyHints() interface to allow a hint
provider to pass back raw hints for each resource type, and allow the
TopologyManager to merge them using a single unified strategy.
This change also allows the TopologyManager to recognize which
resource type a set of hints originated from, should this information
become useful in the future.
Previously, the topologymanager would simply fall back to the None() policy
if an invalid policy was specified. This patch updates this to return an
error when an invalid policy is passed, forcing the kubelet to fail
fast when this occurs.
These semantics should be preferable because an invalid policy likely
indicates operator error in setting the policy flag on the kubelet
correctly (e.g. misspelling 'strict' as 'striict'). In this case it is
better to fail fast so the operator can detect this and correct the
mistake, than to mask the error and essentially disable the
topologymanager unexpectedly.
These updates are based on discussions had about the preferred semantics
of the TopologyManager and will be reflected in changes to an upcoming
PR that adds the actual TopologyManager implementation.