This change bypasses all logic that sets swap in the Linux container
resources if a swap controller is not available on the node. Failing
to do so may cause errors in runc when starting a container with
a swap configuration -- even if the swap limit is set to 0.
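A minimal sketch of that guard, assuming a hypothetical
swapControllerAvailable helper (the CRI field is the real one from
k8s.io/cri-api):

```go
package kuberuntime

import (
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

// swapControllerAvailable is a stand-in for the kubelet's detection of
// the swap cgroup controller on the node.
var swapControllerAvailable = func() bool { return false }

// configureContainerSwapResources skips all swap fields when no swap
// controller is available: writing any swap limit, even 0, makes runc
// fail to start the container in that case.
func configureContainerSwapResources(lcr *runtimeapi.LinuxContainerResources, memLimit int64) {
	if !swapControllerAvailable() {
		return // leave MemorySwapLimitInBytes unset entirely
	}
	lcr.MemorySwapLimitInBytes = memLimit
}
```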
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change switches to using the local isCgroup2UnifiedMode variable
to ensure that any mocked function is also used when checking swap
controller availability.
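As a sketch of that pattern (identifiers other than
isCgroup2UnifiedMode are illustrative, and the probing logic is
elided):

```go
package kuberuntime

import (
	libcontainercgroups "github.com/opencontainers/runc/libcontainer/cgroups"
)

// isCgroup2UnifiedMode wraps the libcontainer call in a variable so
// unit tests can substitute their own implementation.
var isCgroup2UnifiedMode = libcontainercgroups.IsCgroup2UnifiedMode

// swapControllerAvailable checks the cgroup version through the
// variable, never through libcontainer directly, so a mock takes
// effect here as well.
func swapControllerAvailable() bool {
	if isCgroup2UnifiedMode() {
		// cgroup v2: swap is exposed via memory.swap.max (probe elided).
		return true
	}
	// cgroup v1: would probe the memory.memsw files (elided).
	return false
}
```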
Signed-off-by: Evan Lezar <elezar@nvidia.com>
After this commit, when LimitedSwap is enabled,
containers get swap access limited with respect to
the container's memory request, the total physical memory
on the node, and the swap size on the node.
Pods of the Best-Effort / Guaranteed QoS classes don't get
to swap. In addition, containers with memory requests
that are equal to their memory limits also don't get to
swap.
The swap limitation is calculated in the following way:
1. Calculate the container's memory proportionate to the node's memory:
- Divide the container's memory request by the total node's physical memory.
Let's call this value ContainerMemoryProportion.
2. Multiply the container memory proportion by the available
swap memory for Pods:
Meaning: ContainerMemoryProportion * TotalPodsSwapAvailable.
For more information:
https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md
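A worked sketch of the formula, using the names from the steps above
(the real kubelet reads these values from the node and the pod spec):

```go
package main

import "fmt"

// swapLimit implements the two steps above:
//   ContainerMemoryProportion = containerMemoryRequest / nodeTotalMemory
//   limit = ContainerMemoryProportion * totalPodsSwapAvailable
func swapLimit(containerMemoryRequest, nodeTotalMemory, totalPodsSwapAvailable int64) int64 {
	containerMemoryProportion := float64(containerMemoryRequest) / float64(nodeTotalMemory)
	return int64(containerMemoryProportion * float64(totalPodsSwapAvailable))
}

func main() {
	const gib = int64(1) << 30
	// A 2GiB request on a 16GiB node with 4GiB of swap:
	// 2/16 * 4GiB = 512MiB of swap for the container.
	fmt.Printf("%d MiB\n", swapLimit(2*gib, 16*gib, 4*gib)>>20)
}
```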
Signed-off-by: Itamar Holder <iholder@redhat.com>
Before this commit, a libcontainer function was called
directly to find out the current node's cgroup version.
This made the cgroup version hard to mock and dependent
on the environment in which the unit tests are run.
After this commit, libcontainer's function is wrapped
in a function variable that can be re-assigned by
the tests. This way the tests can easily mock the cgroup
version and become environment independent.
After this commit both cgroup versions, v1 and v2,
are tested, no matter in which environment the tests
run.
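For example, a test can then cover both versions on any host (a
sketch; the test name and body are illustrative):

```go
package kuberuntime

import "testing"

func TestBothCgroupVersions(t *testing.T) {
	// Restore the real implementation when the test finishes.
	orig := isCgroup2UnifiedMode
	defer func() { isCgroup2UnifiedMode = orig }()

	for _, unified := range []bool{false, true} {
		unified := unified // capture for the closure
		isCgroup2UnifiedMode = func() bool { return unified }
		// ... run the assertions for this cgroup version ...
	}
}
```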
Signed-off-by: Itamar Holder <iholder@redhat.com>
1. Core Kubelet changes to implement In-place Pod Vertical Scaling.
2. E2E tests for In-place Pod Vertical Scaling.
3. Refactor kubelet code and add missing tests (Derek's kubelet review)
4. Add a new hash over container fields, excluding the Resources field, to allow feature gate toggling without restarting containers that don't use the feature (see the sketch after the KEP link).
5. Fix corner-case where resize A->B->A gets ignored
6. Add cgroup v2 support to pod resize E2E test.
KEP: /enhancements/keps/sig-node/1287-in-place-update-pod-resources
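A minimal sketch of the Resources-excluding hash from point 4; JSON
hashing stands in for the kubelet's deep-hash helper, so this is the
idea, not the upstream code:

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"

	v1 "k8s.io/api/core/v1"
)

// hashContainerWithoutResources deep-copies the container, zeroes the
// Resources field, and hashes the rest, so toggling the feature gate
// does not change the hash of containers that never used resize.
func hashContainerWithoutResources(c *v1.Container) uint64 {
	clone := c.DeepCopy()
	clone.Resources = v1.ResourceRequirements{} // excluded from the hash
	data, _ := json.Marshal(clone)
	h := fnv.New64a()
	h.Write(data)
	return h.Sum64()
}

func main() {
	c := &v1.Container{Name: "app", Image: "nginx:1.25"}
	fmt.Printf("hash=%x\n", hashContainerWithoutResources(c))
}
```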
Co-authored-by: Chen Wang <Chen.Wang1@ibm.com>
cpu.cfs_period_us is measured in microseconds in the kernel but
provided as a time.Duration by the user; this change clarifies the
code to make that evident to the reader.
Also, the minimum value for this setting is 1ms, not 1μs, and this
change alters the validation to reject values smaller than 1ms.
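A sketch of both points, with illustrative names:

```go
package kubelet

import (
	"errors"
	"time"
)

// The kernel's minimum CFS period is 1ms, not 1μs.
const minCFSQuotaPeriod = 1 * time.Millisecond

// cfsPeriodUs makes the unit conversion explicit: the user supplies a
// time.Duration, while cpu.cfs_period_us expects microseconds.
func cfsPeriodUs(period time.Duration) int64 {
	return period.Microseconds()
}

// validateCFSQuotaPeriod rejects periods below the 1ms kernel minimum.
func validateCFSQuotaPeriod(period time.Duration) error {
	if period < minCFSQuotaPeriod {
		return errors.New("cpuCFSQuotaPeriod must be at least 1ms")
	}
	return nil
}
```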
The changes (mostly in pkg/kubelet/cm) are there to adopt the changed
runc 1.1 API, and to simplify things a bit. In particular:
1. simplify cgroup manager instantiation, using the new, easier
libcontainer/cgroups/manager.New (see the sketch below);
2. replace libcontainerAdapter with a boolean variable (all it did
was pass on whether the systemd manager should be used);
3. trivial change due to the removed cgroupfs.HugePageSizes and the
added cgroups.HugePageSizes();
4. do not calculate cgroup paths in update / destroy, since libcontainer
cgroup managers now calculate the paths upon creation (previously,
they were doing that only in Apply, so using e.g. Set or Destroy right
after creation was impossible without specifying paths).
We currently still calculate cgroup paths in Exists -- this is to be
addressed separately.
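A sketch of the new instantiation from point 1 (field values are
illustrative; manager.New and the configs types are the runc 1.1 API):

```go
package main

import (
	"log"

	"github.com/opencontainers/runc/libcontainer/cgroups/manager"
	"github.com/opencontainers/runc/libcontainer/configs"
)

func main() {
	cg := &configs.Cgroup{
		Name:      "pod-example",
		Parent:    "kubepods",
		Systemd:   false, // true selects the systemd driver instead of cgroupfs
		Resources: &configs.Resources{},
	}

	// manager.New picks the right implementation (fs, fs2, or systemd)
	// and computes the cgroup paths up front.
	mgr, err := manager.New(cg)
	if err != nil {
		log.Fatal(err)
	}

	// Because paths are known at creation time (point 4), Set and
	// Destroy work right away, without a prior Apply.
	if err := mgr.Set(cg.Resources); err != nil {
		log.Fatal(err)
	}
}
```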
Co-Authored-By: Elana Hashman <ehashman@redhat.com>
This patch makes the CRI `v1` API the new project-wide default version.
To allow backwards compatibility, a fallback to `v1alpha2` has been added
as well. This fallback is automatically determined and used by the
kubelet.
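A sketch of such a negotiation (the client wiring is illustrative; the
kubelet does this through its remote runtime service):

```go
package remote

import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
	runtimeapiV1alpha2 "k8s.io/cri-api/pkg/apis/runtime/v1alpha2"
)

// negotiateCRIVersion probes the runtime with a v1 Version call and
// falls back to v1alpha2 when v1 is not served.
func negotiateCRIVersion(conn *grpc.ClientConn) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	if _, err := runtimeapi.NewRuntimeServiceClient(conn).
		Version(ctx, &runtimeapi.VersionRequest{}); err == nil {
		return "v1", nil
	}

	if _, err := runtimeapiV1alpha2.NewRuntimeServiceClient(conn).
		Version(ctx, &runtimeapiV1alpha2.VersionRequest{}); err != nil {
		return "", fmt.Errorf("neither CRI v1 nor v1alpha2 is supported: %w", err)
	}
	return "v1alpha2", nil
}
```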
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
On systems where the calculated CPU shares exceed the maximum value
accepted by Linux, containers given such a value are unable to start.
This occurs on systems with 300+ CPU cores, where containers can be
assigned such a value.
This issue was already fixed for the pod and QoS control groups by the
similar cm.MilliCPUToShares, which also has tests verifying the
behavior. Since this code already has a dependency on kubelet/cm, let's
reuse that code instead.
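For reference, the clamping that cm.MilliCPUToShares performs looks
like this (a sketch with the kernel's bounds as constants, not the
exact upstream file):

```go
package cm

const (
	minShares     = 2      // kernel-enforced minimum for cpu.shares
	maxShares     = 262144 // kernel-enforced maximum for cpu.shares
	sharesPerCPU  = 1024
	milliCPUToCPU = 1000
)

// MilliCPUToShares converts milliCPU to CPU shares, clamping to the
// kernel's [minShares, maxShares] range so containers on very large
// machines (300+ cores) still get a value the kernel accepts.
func MilliCPUToShares(milliCPU int64) uint64 {
	if milliCPU == 0 {
		// Zero means "unspecified"; use the minimum.
		return minShares
	}
	shares := (milliCPU * sharesPerCPU) / milliCPUToCPU
	if shares < minShares {
		return minShares
	}
	if shares > maxShares {
		return maxShares
	}
	return uint64(shares)
}
```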
Separate the CPU/memory request/limit -> Linux resource conversion into
its own function for better reuse.
Elsewhere in the kuberuntime package, we will want to leverage this
requests/limits to Linux resource type conversion.
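A sketch of such a helper (simplified; the kubelet's version lives on
the runtime manager and handles more fields):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

const quotaPeriod = 100000 // default 100ms CFS period, in microseconds

// calculateLinuxResources converts CPU/memory requests and limits into
// the CRI LinuxContainerResources message.
func calculateLinuxResources(cpuRequest, cpuLimit, memoryLimit *resource.Quantity) *runtimeapi.LinuxContainerResources {
	r := &runtimeapi.LinuxContainerResources{}
	if cpuRequest != nil {
		if shares := cpuRequest.MilliValue() * 1024 / 1000; shares >= 2 {
			r.CpuShares = shares
		} else {
			r.CpuShares = 2 // kernel minimum for cpu.shares
		}
	}
	if cpuLimit != nil {
		r.CpuPeriod = quotaPeriod
		r.CpuQuota = cpuLimit.MilliValue() * quotaPeriod / 1000
	}
	if memoryLimit != nil {
		r.MemoryLimitInBytes = memoryLimit.Value()
	}
	return r
}

func main() {
	cpuReq, cpuLim, memLim := resource.MustParse("500m"), resource.MustParse("1"), resource.MustParse("256Mi")
	fmt.Printf("%+v\n", calculateLinuxResources(&cpuReq, &cpuLim, &memLim))
}
```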
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Add a unit test for updating the container hugepage limit.
Add a warning message about the ignored case.
Update error handling for the hugepage size requirements.
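As a sketch of the hugepage-limit derivation such a test exercises
(an assumed helper; the kubelet's real code also converts the
page-size suffix to the CRI format and rejects unsupported sizes with
an error):

```go
package main

import (
	"fmt"
	"strings"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

// hugepageLimits collects hugepages-<size> resources into CRI
// HugepageLimit entries. Size validation is elided in this sketch.
func hugepageLimits(rl v1.ResourceList) []*runtimeapi.HugepageLimit {
	var limits []*runtimeapi.HugepageLimit
	for name, quantity := range rl {
		if !strings.HasPrefix(string(name), v1.ResourceHugePagesPrefix) {
			continue
		}
		limits = append(limits, &runtimeapi.HugepageLimit{
			PageSize: strings.TrimPrefix(string(name), v1.ResourceHugePagesPrefix),
			Limit:    uint64(quantity.Value()),
		})
	}
	return limits
}

func main() {
	rl := v1.ResourceList{"hugepages-2Mi": resource.MustParse("128Mi")}
	for _, l := range hugepageLimits(rl) {
		fmt.Printf("pageSize=%s limit=%d\n", l.PageSize, l.Limit)
	}
}
```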
Signed-off-by: sewon.oh <sewon.oh@samsung.com>