Commit Graph

74 Commits

Author SHA1 Message Date
Kir Kolyshkin
11b0d57c93 pkg/kubelet/cm/cgroup_manager: simplify setting hugetlb
Commit 79be8be10e made hugetlb settings optional if cgroup v2 is used and
hugetlb is not available, fixing issue 92933. Note that at that time this was only
needed for v2, because for v1 the resources were set one-by-one, and only for
supported resources.

Commit d312ef7eb6 switched the code to using Set from runc/libcontainer
cgroups manager, and expanded the check to cgroup v1 as well.

Move this check earlier, to inside m.toResources, so instead of
converting all hugetlb resources from ResourceConfig to libcontainer's
Resources.HugetlbLimit, and then setting it to nil, we can skip the
conversion entirely if hugetlb is not supported, thus not doing the work
that is not needed.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-08 17:05:46 -08:00
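A minimal sketch of the early-exit idea described in commit 11b0d57c93 above: check hugetlb support before converting, rather than converting and then discarding the result. The types `resourceConfig`, `hugepageLimit`, and `resources`, and the `hugetlbSupported` parameter, are hypothetical stand-ins for illustration and are not the kubelet or libcontainer definitions.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the kubelet ResourceConfig and
// libcontainer Resources types. They are assumptions for this sketch only.
type resourceConfig struct {
	HugePageLimit map[int64]int64 // page size in bytes -> limit in bytes
}

type hugepageLimit struct {
	Pagesize string
	Limit    uint64
}

type resources struct {
	HugetlbLimit []*hugepageLimit
}

// toResources converts rc into resources. When hugetlb is not supported,
// the hugetlb conversion is skipped entirely, so no work is done that
// would later be thrown away.
func toResources(rc *resourceConfig, hugetlbSupported bool) *resources {
	r := &resources{}
	if rc == nil || !hugetlbSupported {
		return r
	}
	for size, limit := range rc.HugePageLimit {
		r.HugetlbLimit = append(r.HugetlbLimit, &hugepageLimit{
			Pagesize: fmt.Sprintf("%dB", size),
			Limit:    uint64(limit),
		})
	}
	return r
}

func main() {
	rc := &resourceConfig{HugePageLimit: map[int64]int64{2 << 20: 4 << 20}}
	fmt.Println(len(toResources(rc, true).HugetlbLimit))
	fmt.Println(len(toResources(rc, false).HugetlbLimit))
}
```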
Kir Kolyshkin
59148e22d0 pkg/kubelet/cm: rm dup code
Commit ecd6361f added setting PidsLimit to Create and Update.

Commit bce9d5f2 added setting PidsLimit to m.toResources.

Now, PidsLimit is assigned twice.

Remove the duplicate.

Fixes: bce9d5f2
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-08 17:05:46 -08:00
Kir Kolyshkin
a673b64864 kubelet/cm: speed up cgroup creation
There's no need to call m.Update (which will create another instance of
libcontainer cgroup manager, convert all the resources and then set
them). All this is already done here, except for Set().

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-08 17:05:46 -08:00
Kir Kolyshkin
c06a851042 pkg/kubelet/cm: use SkipFreezeOnSet
This is a knob added by runc 1.0.2 specifically for kubernetes,
which tells runc/libcontainer/cgroups/systemd v1 manager to not
freeze the cgroup in Set().

We set this knob here because this code is only used for pods
(rather than containers) management, and in this place we create or
update the pod cgroup with no device limits set, so we can skip the
freeze.

If this knob is not set, libcontainer's cgroup v1 manager tries to
figure out whether the freeze is needed or not, but it's a somewhat
expensive check to perform, thus the knob is a shortcut.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-08-23 13:41:51 -07:00
Wesley Williams
ff165c8823
Replace usage of Whitelist with Allowlist within Kubelet's sysctl package (#102298)
* Change uses of whitelist to allowlist in kubelet sysctl

* Rename whitelist files to allowlist in Kubelet sysctl

* Further renames of whitelist to allowlist in Kubelet

* Rename podsecuritypolicy uses of whitelist to allowlist

* Update pkg/kubelet/kubelet.go

Co-authored-by: Danielle <dani@builds.terrible.systems>
2021-08-04 18:59:35 -07:00
Kir Kolyshkin
e5b434e990 kubelet/cm: don't set Devices
Since runc 1.0.0 it is now sufficient to have SkipDevices: true.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-07-16 12:45:35 -07:00
Li Bo
c3d9b10ca8 feature: support Memory QoS for cgroups v2 2021-07-08 09:26:46 +08:00
Odin Ugedal
61d88af9e4
Revert "Update runc to 1.0.0" 2021-07-05 14:03:04 +02:00
Kir Kolyshkin
ab5b77944e kubelet/cm: don't set Devices
Since runc 1.0.0 it is now sufficient to have SkipDevices: true.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-30 16:17:35 -07:00
Kir Kolyshkin
f1aee7e049 kubelet/cm: GetResourceStats -> MemoryUsage
Commit cc50aa9dfb introduced GetResourceStats, a method which collected
all the statistics from various cgroup controllers, only to discard all
of the info collected except a single value (memory usage).

While one may argue that this method can potentially be used from other
places, this did not happen since it was added 4+ years ago.

Let's streamline this code and only collect what we need, i.e. memory
usage. Rename the method accordingly.

While at it, fix pkg/kubelet/cm/cgroup_manager_unsupported.go to not
instantiate a new error every time a method is called.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-05-23 20:43:52 -07:00
Kir Kolyshkin
c299b8fc9a kubelet/cm: rm propagateControllers
This was added by commit a9772b2290.

In the current codebase, the cgroup being updated was created using
runc/opencontainers' manager.Apply(), which already does controllers
propagation, so there is no need to repeat that on every update.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-05-21 13:44:54 -07:00
Odin Ugedal
d312ef7eb6 Set cgroups via opencontainer
This sets cgroup config via libcontainer to make sure we apply the
correct values to the systemd slices and scopes.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-05-19 23:52:01 -07:00
Kir Kolyshkin
f3cdfc488e vendor: bump runc to rc95
runc rc95 contains a fix for CVE-2021-30465.

runc rc94 provides fixes and improvements.

One notable change is cgroup manager's Set now accepts Resources rather
than Cgroup (see https://github.com/opencontainers/runc/pull/2906).
Modify the code accordingly.

Also update runc dependencies (as hinted by hack/lint-dependencies.sh):

        github.com/cilium/ebpf v0.5.0
        github.com/containerd/console v1.0.2
        github.com/coreos/go-systemd/v22 v22.3.1
        github.com/godbus/dbus/v5 v5.0.4
        github.com/moby/sys/mountinfo v0.4.1
        golang.org/x/sys v0.0.0-20210426230700-d19ff857e887
        github.com/google/go-cmp v0.5.4
        github.com/kr/pretty v0.2.1
        github.com/opencontainers/runtime-spec v1.0.3-0.20210326190908-1c3f411f0417

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-05-19 23:51:59 -07:00
Jordan Liggitt
4b45d0d921 Revert "Merge pull request 101888 from kolyshkin/update-runc-rc94"
This reverts commit b1b06fe0a4, reversing
changes made to 382a33986b.
2021-05-18 09:13:47 -04:00
Kir Kolyshkin
b49744f177 vendor: bump runc to rc94
One notable change is cgroup manager's Set now accepts Resources rather
than Cgroup (see https://github.com/opencontainers/runc/pull/2906).
Modify the code accordingly.

Also update runc dependencies (as hinted by hack/lint-dependencies.sh):

	github.com/cilium/ebpf v0.5.0
	github.com/containerd/console v1.0.2
	github.com/coreos/go-systemd/v22 v22.3.1
	github.com/godbus/dbus/v5 v5.0.4
	github.com/moby/sys/mountinfo v0.4.1
	golang.org/x/sys v0.0.0-20210426230700-d19ff857e887
	github.com/google/go-cmp v0.5.4
	github.com/kr/pretty v0.2.1
	github.com/opencontainers/runtime-spec v1.0.3-0.20210326190908-1c3f411f0417

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-05-11 11:56:42 -07:00
Utsav Oza
2f3a4ec9cb Migrate remaining files in pkg/kubelet to structured logging 2021-03-12 22:36:28 +05:30
David Porter
904cb67267 Fixes after runc libcontainer and docker update
- libcontainer renamed
  `github.com/opencontainers/runc/libcontainer/configs` to
  `github.com/opencontainers/runc/libcontainer/devices` so use the new
  references

- Update `dockershim` `ContainerCreate` call after docker update to
  v20.10.2
2021-03-08 22:10:29 -08:00
Odin Ugedal
124de526cb Fix cgroup handling for systemd with cgroup v2
This fixes issues where kubelet enforces qos and nodeAllocatable on the
wrong hierarchy. Kubelet will now create the files

/sys/fs/cgroup/kubepods/{burstable,besteffort,}/pod-xyz

when running with systemd as the driver, making it impossible to enforce
the limits on nodeAllocatable.
2021-02-12 10:44:38 +01:00
Kubernetes Prow Robot
293a53f2c0
Merge pull request #94140 from derekwaynecarr/pid-ga
Promote PidLimits to GA
2020-09-09 06:35:52 -07:00
Derek Carr
6f2153986a Promote PidLimits to GA 2020-08-24 13:57:48 -04:00
Giuseppe Scrivano
49cbf91fce
kubelet, cgroupv2: do not create /sys/fs/cgroup/sys with cgroupfs
Closes: https://github.com/kubernetes/kubernetes/issues/94104

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-08-19 22:29:38 +02:00
Giuseppe Scrivano
79be8be10e
kubelet, cgroupv2: make hugetlb optional
make the hugetlb controller optional when cgroup v2 is used.

Closes: https://github.com/kubernetes/kubernetes/issues/92933

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-07-13 09:40:55 +02:00
Giuseppe Scrivano
0d2a493a8f
kubelet: skip setting the devices cgroup
use the new libcontainer feature of skipping setting the devices
cgroup.  This is necessary on cgroup v2 to avoid leaking an eBPF
program every time the cgroup is re-configured.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-07-09 09:37:46 +02:00
Giuseppe Scrivano
e94aebf4cb
pkg/kubelet: adapt to new libcontainer API
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-06-24 18:39:51 +02:00
kadisi
a75323c76b fix unexpected append mutations about pkg/kubelet package
Signed-off-by: kadisi <iamkadisi@163.com>
Co-authored-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com>
2020-06-03 13:36:57 +08:00
Davanum Srinivas
442a69c3bd
switch over k/k to use klog v2
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
Giuseppe Scrivano
26d94ad628
kubelet: do not configure the device cgroup
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-04-09 16:18:06 +02:00
Giuseppe Scrivano
a9772b2290
kubelet: adapt cgroup_manager to cgroup v2
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-04-09 16:18:04 +02:00
Giuseppe Scrivano
c4429d8bd4
kubelet: add tests for cgroup v2 conversions
follow-up for https://github.com/kubernetes/kubernetes/pull/85218

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-03-27 13:50:57 +01:00
Giuseppe Scrivano
bb5ed1b797
kubelet: add initial support for cgroupv2
do a conversion from the cgroups v1 limits to cgroups v2.

e.g. cpu.shares on cgroups v1 has a range of [2-262144] while the
equivalent on cgroups v2 is cpu.weight that uses a range [1-10000].

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-03-12 08:50:19 +01:00
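To make the range conversion in commit bb5ed1b797 above concrete, here is a sketch that maps cpu.shares in [2-262144] onto cpu.weight in [1-10000] by linear interpolation with integer arithmetic. The commit message states only the two ranges; the linear formula, the clamping of out-of-range inputs, the treatment of 0 as "unset", and the function name are assumptions made for this example.

```go
package main

import "fmt"

// cpuSharesToCPUWeight linearly maps a cgroup v1 cpu.shares value in
// [2, 262144] onto the cgroup v2 cpu.weight range [1, 10000].
// A value of 0 is treated as "unset" and passed through (an assumption).
func cpuSharesToCPUWeight(shares uint64) uint64 {
	if shares == 0 {
		return 0
	}
	// Clamp to the valid v1 range before converting.
	if shares < 2 {
		shares = 2
	}
	if shares > 262144 {
		shares = 262144
	}
	return 1 + ((shares-2)*9999)/262142
}

func main() {
	for _, s := range []uint64{2, 1024, 262144} {
		fmt.Printf("cpu.shares=%d -> cpu.weight=%d\n", s, cpuSharesToCPUWeight(s))
	}
}
```

Under this mapping the endpoints land exactly on 1 and 10000, and the common v1 default of 1024 shares becomes a weight of 39.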
sewon.oh
463442aa29
Update container hugepage limit when creating the container
Unit test for updating container hugepage limit
Add warning message about ignoring case.
Update error handling about hugepage size requirements

Signed-off-by: sewon.oh <sewon.oh@samsung.com>
2020-01-28 09:35:02 +09:00
danielqsj
1a9b121764 remove deprecated metrics of kubelet 2020-01-10 16:46:52 +08:00
bingshen.wbs
47642a0bad fix kubelet failing to start when setting hugetlb limits in a non-existent cgroup dir
Caused by kubelet startup being interrupted while setting the list of cgroups.
'cgroupManagerImpl.Exists' does not check and recreate the hugetlb cgroup dir, so setting the limits in a non-existent cgroup dir then causes kubelet to fail to start.

Signed-off-by: bingshen.wbs <bingshen.wbs@alibaba-inc.com>
2019-11-06 16:39:55 +08:00
Odin Ugedal
b9cfb19321
Rename cgroupsystemd.Manager to LegacyManager 2019-10-05 14:22:35 +02:00
Tim Allclair
8a495cb5e4 Clean up error messages (ST1005) 2019-08-21 10:40:21 -07:00
Kubernetes Prow Robot
80537a9c5f
Merge pull request #77323 from tedyu/cgroup-mgr-linux
Check error return from Update
2019-07-15 14:53:24 -07:00
Odin Ugedal
4ee5fe23e8
Fix cgroup hugetlb size prefix for kB
Use the exported list from runc that uses "KB" and not "kB".

This issue breaks kubelet on AArch64 (arm 64).

var HugePageSizeUnitList = []string{"B", "KB", "MB", "GB", "TB", "PB"}

The hugetlb cgroup control files (introduced here in 2012:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=abb8206cb0773)
use "KB" and not "kB"
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/hugetlb_cgroup.c?h=v5.0#n349).

The behavior in the kernel has not changed since the introduction, and
the current code using "kB" will therefore fail on devices with huge
pages smaller than 1MiB. This is the case for AArch64.

As seen from the code in "mem_fmt" inside hugetlb_cgroup.c, only "KB",
"MB" and "GB" are used, so the others may be removed as well.

Here is a real world example of the files inside the
"/sys/kernel/mm/hugepages/" directory:
- "hugepages-64kB"
- "hugepages-2048kB"
- "hugepages-32768kB"
- "hugepages-1048576kB"

And the corresponding cgroup files:
- "hugetlb.64KB._____"
- "hugetlb.2MB._____"
- "hugetlb.32MB._____"
- "hugetlb.1GB._____"

Signed-off-by: Odin Ugedal <odin@ugedal.com>
2019-06-28 21:28:26 +02:00
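The naming mismatch described in commit 4ee5fe23e8 above can be illustrated with a small formatter that turns a huge page size in bytes into the suffix used by the hugetlb cgroup files ("64KB", "2MB", "1GB"). The unit list is the one quoted in the commit message; the function name `hugePageSizeName` and the divide-by-1024 loop are assumptions for this sketch, not the runc or kernel implementation.

```go
package main

import "fmt"

// Unit list as quoted in the commit message: "KB", not "kB".
var hugePageSizeUnitList = []string{"B", "KB", "MB", "GB", "TB", "PB"}

// hugePageSizeName formats a page size given in bytes using the largest
// unit that divides it evenly, e.g. 65536 -> "64KB".
func hugePageSizeName(sizeBytes uint64) string {
	i := 0
	for sizeBytes >= 1024 && sizeBytes%1024 == 0 && i < len(hugePageSizeUnitList)-1 {
		sizeBytes /= 1024
		i++
	}
	return fmt.Sprintf("%d%s", sizeBytes, hugePageSizeUnitList[i])
}

func main() {
	// Sizes from the sysfs example above, given there in kB.
	for _, kb := range []uint64{64, 2048, 32768, 1048576} {
		fmt.Printf("hugepages-%dkB -> hugetlb.%s\n", kb, hugePageSizeName(kb*1024))
	}
}
```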
rafatio
08c258add9 Ignore cgroup pid support if related feature gates are disabled 2019-06-15 18:45:27 -03:00
Ted Yu
89c8a91c0f Check error return from Update
Signed-off-by: Ted Yu <yute@vmware.com>
2019-05-02 09:56:40 -07:00
danielqsj
79a3eb816c rename latency to duration in metrics 2019-02-18 17:40:04 +08:00
danielqsj
9fd99a48f5 Change kubelet metrics to conform to guideline 2019-02-18 14:01:58 +08:00
Robert Krawitz
2597a1d97e Implement SupportNodePidsLimit, hand-tested 2019-02-13 14:56:17 -05:00
Derek Carr
deae071d78 Graduate HugePages feature to GA 2019-02-02 00:21:10 -05:00
David Ashpole
2b8bc85f75 fix panic in NodeAllocatable node e2e test 2019-01-17 10:57:09 -08:00
Derek Carr
bce9d5f204 SupportPodPidsLimit feature beta with tests 2019-01-09 10:50:59 -05:00
Davanum Srinivas
954996e231
Move from glog to klog
- Move from the old github.com/golang/glog to k8s.io/klog
- klog has explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
  * github.com/kubernetes/repo-infra
  * k8s.io/gengo/
  * k8s.io/kube-openapi/
  * github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods

Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
2018-11-10 07:50:31 -05:00
David Ashpole
d4f6ae3615 fix slice sharing bug in cgroup manager 2018-11-05 17:42:42 -08:00
k8s-ci-robot
0ca25b8db7
Merge pull request #68816 from FengyunPan2/cgroup-info
Add helpful log for checking cgroup path
2018-09-26 18:10:46 -07:00
FengyunPan2
34a8b1fd9f Add helpful log for checking cgroup path
Currently I just get 'xxx cgroup does not exist', but I don't know
which path is missing. Let's add a log for it.
2018-09-25 10:10:12 +08:00
Pingan2017
2f1284bc34 cleanup unneeded if block 2018-08-30 17:18:56 +08:00