The expectation is that exclusive CPU allocations happen at pod
creation time. When a container restarts, it should not have its
exclusive CPU allocations removed, and it should not need to
re-allocate CPUs.
There are a few places in the current code that look for containers
that have exited and call CpuManager.RemoveContainer() to clean up
the container. This will end up deleting any exclusive CPU
allocations for that container, and if the container restarts within
the same pod it will end up using the default cpuset rather than
what should be exclusive CPUs.
Removing those calls and adding resource cleanup at allocation
time should get rid of the problem.
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
With the old strategy, it was possible for an init container to end up
running without some of its CPUs being exclusive if it requested more
guaranteed CPUs than the sum of all guaranteed CPUs requested by app
containers. Unfortunately, this case was not caught by our unit tests
because they didn't validate the state of the defaultCPUSet to ensure
there was no overlap with CPUs assigned to containers. This patch
updates the strategy to reuse the CPUs assigned to init containers
across into app containers, while avoiding this edge case. It also
updates the unit tests to now catch this type of error in the future.
The cpumanager file-based state backend was obsoleted since few
releases, aving the cpumanager moved to the checkpointmanager common
infrastructure.
The old test checking compatibility to/from the old format is
also no longer needed, because the checkpoint format is stable
(see
https://github.com/kubernetes/kubernetes/tree/master/pkg/kubelet/checkpointmanager).
Signed-off-by: Francesco Romani <fromani@redhat.com>
containerMap is used in CPU Manager to store all containers information in the node.
containerMap provides a mapping from (pod, container) -> containerID for all containers a pod
It is reusable in another component in pkg/kubelet/cm which needs to track changes of all containers in the node.
Signed-off-by: Byonggon Chun <bg.chun@samsung.com>
do a conversion from the cgroups v1 limits to cgroups v2.
e.g. cpu.shares on cgroups v1 has a range of [2-262144] while the
equivalent on cgroups v2 is cpu.weight that uses a range [1-10000].
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
On Windows, the podAdmitHandler returned by the GetAllocateResourcesPodAdmitHandler() func
and registered by the Kubelet is nil.
We implement a noopWindowsResourceAllocator that would admit any pod for Windows in order
to be consistent with the original implementation.
GetAllocateResourcesPodAdmitHandler(). It is named as such to reflect its
new function. Also remove the Topology Manager feature gate check at higher level
kubelet.go, as it is now done in GetAllocateResourcesPodAdmitHandler().
- allocatePodResources logic altered to allow for container by container
device allocation.
- New type PodReusableDevices
- New field in devicemanager devicesToReuse
- Where previously we called manager.AddContainer(), we now call both
manager.Allocate() and manager.AddContainer().
- Some test cases now have two expected errors. One each
from Allocate() and AddContainer(). Existing outcomes are unchanged.
GetTopologyPodAdmitHandler() now returns a lifecycle.PodAdmitHandler
type instead of the TopologyManager directly. The handler it returns
is generally responsible for attempting to allocate any resources that
require a pod admission check. When the TopologyManager feature gate
is on, this comes directly from the TopologyManager. When it is off,
we simply attempt the allocations ourselves and fail the admission
on an unexpected error. The higher level kubelet.go feature gate
check will be removed in an upcoming PR.