This patch makes the CRI `v1` API the new project-wide default version.
To allow backwards compatibility, a fallback to `v1alpha2` has been added
as well. This fallback can either used by automatically determined by
the kubelet.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Consume in the static policy the cpu manager policy options from
the cpumanager instance.
Validate in the none policy if any option is given, and fail if so -
this is almost surely a configuration mistake.
Add new cpumanager.Options type to hold the options and translate from
user arguments to flags.
Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
The CPUManagerPolicyOptions received from the kubelet config/command line args
is propogated to the Container Manager.
We defer the consumption of the options to a later patch(set).
Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
We can set the container cpuset.cpus diring the creation and it
will not need to call to update resources after the container creation.
Additional side effect of the change, that the runc process that responsible
to create the container will run with the same CPU affinity because the
runc runs on the cpuset provided in the config.json arg.
It will allow to prevent undesirable interupts on isolated CPUs.
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
It covers deviceplugin & cpumanager.
It has drawback, since cpuset and all other structs including cadvisor's keep
cpu as int, but for protobuf based interface is better to have fixed
int.
This patch also introduces additional interface CPUsProvider, while
DeviceProvider might have been extended too.
Checkpoint not covered by unit test.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
This patch removes GetNUMANodeInfo, cadvisor.MachineInfo will be used
instead of it. GetNUMANodeInfo was introduced due to difference of meaning of
MachineInfo.Topology. On the arm it was NUMA nodes, but on the x86 it
represents sockets (since reading from /proc/cpuinfo). Now it unified
and MachineInfo.Topology represents NUMA node.
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
The expectation is that exclusive CPU allocations happen at pod
creation time. When a container restarts, it should not have its
exclusive CPU allocations removed, and it should not need to
re-allocate CPUs.
There are a few places in the current code that look for containers
that have exited and call CpuManager.RemoveContainer() to clean up
the container. This will end up deleting any exclusive CPU
allocations for that container, and if the container restarts within
the same pod it will end up using the default cpuset rather than
what should be exclusive CPUs.
Removing those calls and adding resource cleanup at allocation
time should get rid of the problem.
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
containerMap is used in CPU Manager to store all containers information in the node.
containerMap provides a mapping from (pod, container) -> containerID for all containers a pod
It is reusable in another component in pkg/kubelet/cm which needs to track changes of all containers in the node.
Signed-off-by: Byonggon Chun <bg.chun@samsung.com>
This change will not work on its own. Higher level code needs to make
sure and call Allocate() before AddContainer is called. This is already
being done in cases when the TopologyManager feature gate is enabled (in
the PodAdmitHandler of the TopologyManager). However, we need to make
sure we add proper logic to call it in cases when the TopologyManager
feature gate is disabled.
A recent change made it so that the CPUManager receives a list of
initial containers that exist on the system at startup. This list can be
non-empty, for example, after a kubelet retart.
This commit ensures that the CPUManagers containerMap structure is
initialized with the containers from this list.
This check is redundant since we protect this call with a call to
`m.sourcesReady.AllReady()` earlier on. Moreover, having this check in
place means that we will leave some stale state around in cases where
there are actually no active pods in the system and this loop hasn't
cleaned them up yet. This can happen, for example, if a pod exits while
the kubelet is down for some reason. We see this exact case being
triggered in our e2e tests, where a test has been failing since October
when this change was first introduced.
These information associatedd with these containers is used to migrate
the CPUManager state from it's old format to its new (i.e. keyed off of
podUID and containerName instead of containerID).
For now, we just pass 'nil' as the set of 'initialContainers' for
migrating from old state semantics to new ones. In a subsequent commit
will we pull this information from higher layers so that we can pass it
down at this stage properly.
This ensures that we have the most up-to-date state when generating
topology hints for a container. Without this, it's possible that some
resources will be seen as allocated, when they are actually free.
This will become especially important as we move to a model where
exclusive CPUs are assigned at pod admission time rather than at pod
creation time.
Having this function will allow us to do garbage collection on these
CPUs anytime we are about to allocate CPUs to a new set of containers,
in addition to reclaiming state periodically in the reconcileState()
loop.
Unfortunately, the NUMA information is not readily available from
cadvisor, so we have to roll the logic to discover it by hand. In the
future, we should remove this custiom code to use the information
provided by cadvisor once it is made available.