Commit Graph

202 Commits

Author SHA1 Message Date
Odin Ugedal
61d88af9e4
Revert "Update runc to 1.0.0" 2021-07-05 14:03:04 +02:00
Kir Kolyshkin
ab5b77944e kubelet/cm: don't set Devices
Since runc 1.0.0 it is now sufficient to have SkipDevices: true.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-30 16:17:35 -07:00
Artyom Lukianov
03830db82d Implement all necessary methods to provide memory manager data under pod resources metrics
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-06-22 13:06:32 +03:00
Francesco Romani
369416b763 cm: handle nil cpumanager avoiding segfault
If the cpumanager feature gate is disabled, the corresponsing field
of the containerManager will be nil.
A couple functions don't check for this occurrence and happily
deference the pointer unconditionally, leading to possible segfaults.

The relevant functions were introduced to support the podresources API,
so to trigger this segfault all the following are needed:
- cpumanager feature gate has to be disabled explicitely
- any podresources API must be called

Worth pointing out that when the new functions were introduced (around
kubernetes 1.20) the default feature gate for cpumanager was already set
to true, hence this bug is expected to be triggered rarely.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-06-10 16:22:43 +02:00
Giuseppe Scrivano
12abc3b7c9 kubelet: reuse manager
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-05-19 23:51:49 -07:00
Jordan Liggitt
4b45d0d921 Revert "Merge pull request 101888 from kolyshkin/update-runc-rc94"
This reverts commit b1b06fe0a4, reversing
changes made to 382a33986b.
2021-05-18 09:13:47 -04:00
Giuseppe Scrivano
fd7ecd3915 kubelet: reuse manager
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-05-10 17:34:53 -07:00
Ryan Phillips
4488162bd9 kubelet: change cgroup move message to log level 3 2021-04-28 14:54:54 -05:00
chen zechun
d16d57b7d1 fix delete duplicate logs 2021-04-02 16:18:47 +08:00
Kubernetes Prow Robot
38fbecf0c8
Merge pull request #100001 from shiyajuan123/logs
migrate kubelet/cm/container logs to structured logging
2021-03-16 14:50:06 -07:00
shiyajuan123
d344fc612e fix and update 2021-03-10 14:58:17 +08:00
shiyajuan123
9cee635494 fix and update 2021-03-10 14:29:03 +08:00
Kubernetes Prow Robot
770a9504ea
Merge pull request #95734 from fromanirh/podresources-concrete-resources-apis
podresources APIs: concrete resources apis: implement GetAllocatableResources
2021-03-09 14:29:04 -08:00
Francesco Romani
8afdf4f146 node: podresources: translate types in cm
during the review, we convened that the manager types
(CPUSet, ResourceDeviceInstances) should not cross the
containermanager API boundary; thus, the ContainerManager layer
is the correct place to do the type conversion

We push back the type conversions from the podresources server
layer, fixing tests accordingly.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-09 13:13:36 +01:00
Francesco Romani
ad68f9588c node: podresources: make GetDevices() consistent
We want to make the return type of the GetDevices() method of the
podresources DevicesProvider interface consistent with
the newly added GetAllocatableDevices type.
This makes the code easier to read and reduces the coupling between
the podresourcesapi server and the devicemanager code.

No intended changes in behaviour, but the different return types
now requires some data massaging. Tests are updated accordingly.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-09 13:13:36 +01:00
Francesco Romani
6d33354e4c node: podresources: implement GetAllocatableResources API
Extend the podresources API implementing the GetAllocatableResources endpoint,
as specified in the KEPs:

https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2043-pod-resource-concrete-assigments
https://github.com/kubernetes/enhancements/pull/2404

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-09 13:13:36 +01:00
Francesco Romani
1375c5bdc7 node: podresources: make GetCPUs return cpuset
a upcoming patch wants to add GetAllocatableCPUs() returning a cpuset.
To make the code consistent and a bit more flexible, we change the
existing interface to also return a cpuset.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-09 13:13:36 +01:00
shiyajuan123
6b53f8c65d migrate kubelet/cm/container logs to structured logging 2021-03-09 18:27:54 +08:00
David Porter
904cb67267 Fixes after runc libcontainer and docker update
- libcontainer renamed
  `github.com/opencontainers/runc/libcontainer/configs` to
  `github.com/opencontainers/runc/libcontainer/devices` so use the new
  references

- Update `dockershim` `ContainerCreate` call after docker update to
  v20.10.2
2021-03-08 22:10:29 -08:00
Kubernetes Prow Robot
d819199065
Merge pull request #97888 from pacoxu/fix/97565
check containerd as well as docker-containerd
2021-02-09 23:46:59 -08:00
Artyom Lukianov
9ae499ae46 memory manager: pass memory manager flags to the container manager
Pass memory manager flags to the container manager and call all relevant memory manager
methods under the container manager.

Signed-off-by: Byonggon Chun <bg.chun@samsung.com>
2021-02-09 00:54:58 +02:00
pacoxu
89c42bd3d5 check containerd as process name instead of docker-containerd
Signed-off-by: pacoxu <paco.xu@daocloud.io>
2021-01-23 10:55:18 +08:00
Artyom Lukianov
60678a24ca Update CPU manager GetCPUs method to return pointer to CPUSet 2021-01-20 13:21:57 +02:00
Seth Jennings
ee60ee26e0 kubelet: remove periodic messages from log-level 2 2020-11-30 11:34:00 -06:00
Krzysztof Wiatrzyk
b2be584e5b Implement topology manager scopes
* Add topologyScopeName parameter to NewManager().
* Add scope interface and structure that implement common logic
* Add pod scope & container scopes
* Add pod lifecycle functions

Co-authored-by: sw.han <sw.han@samsung.com>

Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:54 +01:00
Alexey Perevalov
a8b8995ef2 Implement TopologyInfo and cpu_ids in podresources
It covers deviceplugin & cpumanager.

It has drawback, since cpuset and all other structs including cadvisor's keep
cpu as int, but for protobuf based interface is better to have fixed
int.
This patch also introduces additional interface CPUsProvider, while
DeviceProvider might have been extended too.

Checkpoint not covered by unit test.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-11 13:50:49 +03:00
Alexey Perevalov
9f54dccc92 Change GetDevices interface
This change is necessary for supporting Topology in the ContainerDevices.

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-11 12:41:31 +03:00
Ali
bfdeda58b7 Delete framework/v1alpha1 folder and change remaining import paths 2020-10-23 13:16:13 +11:00
Srini Brahmaroutu
fbe5daed73 Change code to use staging/k8s.io/mount-utils 2020-09-16 21:51:24 -07:00
Renaud Gaubert
4eadf40448 Run gofmt
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-09-15 06:22:44 -07:00
Renaud Gaubert
60304452ff Move podresources api to k8s.io/kubelet/pkg/apis
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-09-15 05:13:33 -07:00
Kubernetes Prow Robot
119c94214c
Merge pull request #93931 from SataQiu/fix-kubelet-swap-20200812
kubelet: assume that swap is disabled when /proc/swaps does not exist
2020-09-11 04:20:14 -07:00
SataQiu
ad1739f8bc kubelet: assume that swap is disabled when /proc/swaps does not exist 2020-08-12 22:43:58 +08:00
Alexey Perevalov
a047e8aa1b move to cadvisor.MachineInfo
This patch removes GetNUMANodeInfo, cadvisor.MachineInfo will be used
instead of it. GetNUMANodeInfo was introduced due to difference of meaning of
MachineInfo.Topology. On the arm it was NUMA nodes, but on the x86 it
represents sockets (since reading from /proc/cpuinfo). Now it unified
and MachineInfo.Topology represents NUMA node.

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-07-24 09:29:41 -04:00
Giuseppe Scrivano
0d2a493a8f
kubelet: skip setting the devices cgroup
use the new libcontainer feature of skipping setting the devices
cgroup.  This is necessary on cgroup v2 to avoid leaking a eBPF
program every time the cgroup is re-configured.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-07-09 09:37:46 +02:00
Giuseppe Scrivano
e94aebf4cb
pkg/kubelet: adapt to new libcontainer API
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-06-24 18:39:51 +02:00
Davanum Srinivas
442a69c3bd
switch over k/k to use klog v2
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
Kubernetes Prow Robot
d92fdebd85
Merge pull request #89897 from giuseppe/test-e2e-node
kubelet: fix e2e-node cgroups test on cgroup v2
2020-04-20 15:54:12 -07:00
Abdullah Gharaibeh
d6522e0e74 rename framework pkg with schedulerframework for all instances under pkg/kubelet 2020-04-14 14:24:07 -04:00
Abdullah Gharaibeh
bed9b2f23b Cleanup obsolete NodeInfo methods 2020-04-12 18:13:46 -04:00
Giuseppe Scrivano
6d16fee229
kubelet: cpu hard capping is supported on cgroup v2
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-04-09 16:18:03 +02:00
Kubernetes Prow Robot
34c8b26c9f
Merge pull request #85218 from giuseppe/cgroupv2
kubelet: add initial support for cgroupv2
2020-03-26 14:10:23 -07:00
Byonggon Chun
a3047672d0 move pkg/kubelet/cm/cpumanager/containermap to pkg/kubelet/cm/containermap for reusing
containerMap is used in CPU Manager to store all containers information in the node.
containerMap provides a mapping from (pod, container) -> containerID for all containers a pod
It is reusable in another component in pkg/kubelet/cm which needs to track changes of all containers in the node.

Signed-off-by: Byonggon Chun <bg.chun@samsung.com>
2020-03-14 02:38:51 +09:00
Giuseppe Scrivano
bb5ed1b797
kubelet: add initial support for cgroupv2
do a conversion from the cgroups v1 limits to cgroups v2.

e.g. cpu.shares on cgroups v1 has a range of [2-262144] while the
equivalent on cgroups v2 is cpu.weight that uses a range [1-10000].

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-03-12 08:50:19 +01:00
nolancon
4baa1d967d Check for nil cpuManager 2020-03-05 07:54:33 +00:00
Kevin Klues
2327934a86 Rename GetTopologyPodAmitHandler() as
GetAllocateResourcesPodAdmitHandler(). It is named as such to reflect its
new function. Also remove the Topology Manager feature gate check at higher level
kubelet.go, as it is now done in GetAllocateResourcesPodAdmitHandler().
2020-02-27 07:52:43 +00:00
Kevin Klues
0d68bffd03 Change GetTopologyPodAdmitHandler() to be more general
GetTopologyPodAdmitHandler() now returns a lifecycle.PodAdmitHandler
type instead of the TopologyManager directly. The handler it returns
is generally responsible for attempting to allocate any resources that
require a pod admission check. When the TopologyManager feature gate
is on, this comes directly from the TopologyManager. When it is off,
we simply attempt the allocations ourselves and fail the admission
on an unexpected error. The higher level kubelet.go feature gate
check will be removed in an upcoming PR.
2020-02-27 07:24:26 +00:00
Kevin Klues
a3f099ea4d Split devicemanager Allocate into two functions
Instead of having a single call for Allocate(), we now split this into two
functions Allocate() and UpdatePluginResources().

The semantics split across them:

// Allocate configures and assigns devices to a pod. From the requested
// device resources, Allocate will communicate with the owning device
// plugin to allow setup procedures to take place, and for the device
// plugin to provide runtime settings to use the device (environment
// variables, mount points and device files).
Allocate(pod *v1.Pod) error

// UpdatePluginResources updates node resources based on devices already
// allocated to pods. The node object is provided for the device manager to
// update the node capacity to reflect the currently available devices.
UpdatePluginResources(
    node *schedulernodeinfo.NodeInfo,
    attrs *lifecycle.PodAdmitAttributes) error

As we move to a model in which the TopologyManager is able to ensure
aligned allocations from the CPUManager, devicemanger, and any
other TopologManager HintProviders in the same synchronous loop, we will
need to be able to call Allocate() independently from an
UpdatePluginResources(). This commit makes that possible.
2020-02-10 03:27:47 +00:00
Takeaki Matsumoto
785fac6826 Make updateAllocatedDevices() as a public method and call it in
podresources api
2020-02-07 13:26:56 +09:00
whypro
f4bd4e2e96 Return error instead of panic when cpu manager starts failed. 2019-12-19 21:56:23 +08:00