* Change uses of whitelist to allowlist in kubelet sysctl
* Rename whitelist files to allowlist in Kubelet sysctl
* Further renames of whitelist to allowlist in Kubelet
* Rename podsecuritypolicy uses of whitelist to allowlist
* Update pkg/kubelet/kubelet.go
Co-authored-by: Danielle <dani@builds.terrible.systems>
Co-authored-by: Danielle <dani@builds.terrible.systems>
Consume in the static policy the cpu manager policy options from
the cpumanager instance.
Validate in the none policy if any option is given, and fail if so -
this is almost surely a configuration mistake.
Add new cpumanager.Options type to hold the options and translate from
user arguments to flags.
Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
Introduce a new `admission` subpackage to factor out the responsability
to create `PodAdmitResult` objects. This enables resource manager
to report specific errors in Allocate() and to bubble up them
in the relevant fields of the `PodAdmitResult`.
To demonstrate the approach we refactor TopologyAffinityError as a
proper error.
Co-authored-by: Kevin Klues <kklues@nvidia.com>
Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
The CPUManagerPolicyOptions received from the kubelet config/command line args
is propogated to the Container Manager.
We defer the consumption of the options to a later patch(set).
Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
Errors during setting the following sysctl values are ignored:
- vm.overcommit_memory
- vm.panic_on_oom
- kernel.panic
- kernel.panic_on_oops
- kernel.keys.root_maxkeys
- kernel.keys.root_maxbytes
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
- provide tests for static policy allocation, when init containers
requested memory bigger than the memory requested by app containers
- provide tests for static policy allocation, when init containers
requested memory smaller than the memory requested by app containers
- provide tests to verify that init containers removed from the state
file once the app container started
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
Remove init containers from the state file once the app container started,
it will release the memory allocated for the init container and can intense
the density of containers on the NUMA node in cases when the memory allocated
for init containers is bigger than the memory allocated for app containers.
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
The idea that during allocation phase we will:
- during call to `Allocate` and `GetTopologyHints` we will take into account the init containers reusable memory,
which means that we will re-use the memory and update container memory blocks accordingly.
For example for the pod with two init containers that requested: 1Gi and 2Gi,
and app container that requested 4Gi, we can re-use 2Gi of memory.
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
If the cpumanager feature gate is disabled, the corresponsing field
of the containerManager will be nil.
A couple functions don't check for this occurrence and happily
deference the pointer unconditionally, leading to possible segfaults.
The relevant functions were introduced to support the podresources API,
so to trigger this segfault all the following are needed:
- cpumanager feature gate has to be disabled explicitely
- any podresources API must be called
Worth pointing out that when the new functions were introduced (around
kubernetes 1.20) the default feature gate for cpumanager was already set
to true, hence this bug is expected to be triggered rarely.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Commit cc50aa9dfb introduced GetResourceStats, a method which collected
all the statistics from various cgroup controllers, only to discard all
of the info collected except a single value (memory usage).
While one may argue that this method can potentially be used from other
places, this did not happen since it was added 4+ years ago.
Let's streamline this code and only collect what we need, i.e. memory
usage. Rename the method accordingly.
While at it, fix pkg/kubelet/cm/cgroup_manager_unsupported.go to not
instantiate a new error every time a method is called.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This was added by commit a9772b2290.
In the current codebase, the cgroup being updated was created using
runc/opencontainers' manager.Apply(), which already does controllers
propagation, so there is no need to repeat that on every update.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This sets cgroup config via libcontainer to make sure we apply the
correct values to the systemd slices and scopes.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
runc rc95 contains a fix for CVE-2021-30465.
runc rc94 provides fixes and improvements.
One notable change is cgroup manager's Set now accept Resources rather
than Cgroup (see https://github.com/opencontainers/runc/pull/2906).
Modify the code accordingly.
Also update runc dependencies (as hinted by hack/lint-depdendencies.sh):
github.com/cilium/ebpf v0.5.0
github.com/containerd/console v1.0.2
github.com/coreos/go-systemd/v22 v22.3.1
github.com/godbus/dbus/v5 v5.0.4
github.com/moby/sys/mountinfo v0.4.1
golang.org/x/sys v0.0.0-20210426230700-d19ff857e887
github.com/google/go-cmp v0.5.4
github.com/kr/pretty v0.2.1
github.com/opencontainers/runtime-spec v1.0.3-0.20210326190908-1c3f411f0417
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>