This code can be called not only when a container is dead and restarted,
but when is started for the first time too. For example, any pod with
initContainer and containers will exhibit this behaviour. The reason is
that in that case, the "if createPodSandbox" path will return the
initContainers only and on the next call to this function this code is
executed to start the containers for the fist time.
In that case, it is wrong to log that the container is dead and will be
restarted, as it was never started. In fact, the restart count will not
be increased.
This commit just changes this to say that the container is not in the
desired state and should be started. In the end, the kubelet is a state
machine and that is all we really care about.
No tests are added, as the behaviour was correct and tests don't check
logs messages.
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
As of now, the kubelet is passing the security context to container runtime even
if the security context has invalid options for a particular OS. As a result,
the pod fails to come up on the node. This error is particularly pronounced on
the Windows nodes where kubelet is allowing Linux specific options like SELinux,
RunAsUser etc where as in [documentation](https://kubernetes.io/docs/setup/production-environment/windows/intro-windows-in-kubernetes/#v1-container),
we clearly state they are not supported. This PR ensures that the kubelet strips
the security contexts of the pod, if they don't make sense on the Windows OS.
This patch removes GetNUMANodeInfo, cadvisor.MachineInfo will be used
instead of it. GetNUMANodeInfo was introduced due to difference of meaning of
MachineInfo.Topology. On the arm it was NUMA nodes, but on the x86 it
represents sockets (since reading from /proc/cpuinfo). Now it unified
and MachineInfo.Topology represents NUMA node.
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
Sockets don't affect performance as NUMA node does, since NUMA
node has dedicated memory controller, but socket it's physical
extension point.
Socket it's only cpu specific thing and it's strange to merge bitmask of
deviceplugin's and cpu manager, when cpu manager takes into account
socket.
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
clamp the max cpu.shares to the maximum value allowed by the kernel.
It is not an issue when using cgroupfs, as the kernel will
anyway make sure the value is not out of range and automatically clamp
it, systemd has an additional check that prevents the cgroup creation.
Closes: https://github.com/kubernetes/kubernetes/issues/92855
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Previously, it was possible for reusable CPUs and reusable devices (i.e.
those previously consumed by init containers) to not be reused by
subsequent init containers or app containers if the TopologyManager was
enabled. This would happen because hint generation for the
TopologyManager was not considering the reusable devices when it made
its hint calculation.
As such, it would sometimes:
1) Generate a hint for a differnent NUMA node, causing the CPUs and
devices to be allocated from that node instead of the one where the
reusable devices live; or
2) End up thinking there were not enough CPUs or devices to allocate and
throw a TopologyAffinity admission error
This patch fixes this by ensuring that reusable CPUs and devices are
considered as part of TopologyHint generation. This frunctionality is
difficult to unit test since it spans multiple components, but an e2e
test will be added in a subsequent patch to test this functionality.