1. Scheduler bug-fix + scheduler-focussed E2E tests
2. Add cgroup v2 support for in-place pod resize
3. Enable full E2E pod resize test for containerd>=1.6.9 and EventedPLEG related changes.
Co-Authored-By: Vinay Kulkarni <vskibum@gmail.com>
1. Core Kubelet changes to implement In-place Pod Vertical Scaling.
2. E2E tests for In-place Pod Vertical Scaling.
3. Refactor kubelet code and add missing tests (Derek's kubelet review)
4. Add a new hash over container fields without Resources field to allow feature gate toggling without restarting containers not using the feature.
5. Fix corner-case where resize A->B->A gets ignored
6. Add cgroup v2 support to pod resize E2E test.
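The idea behind item 4 can be sketched in Go. The `container` type, the JSON encoding and the FNV-1a hash are assumptions made for illustration, not the kubelet's actual hashing code. The only point shown is that Resources is cleared on a copy before hashing, so a change to Resources alone does not change the hash.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// container is a hypothetical, cut-down container spec for this sketch.
type container struct {
	Name      string            `json:"name"`
	Image     string            `json:"image"`
	Resources map[string]string `json:"resources,omitempty"`
}

// hashWithoutResources hashes every field except Resources. c is passed by
// value, so clearing Resources does not touch the caller's container.
func hashWithoutResources(c container) uint64 {
	c.Resources = nil
	// Marshalling a struct of strings and a string map cannot fail.
	data, _ := json.Marshal(c)
	h := fnv.New64a()
	h.Write(data)
	return h.Sum64()
}

func main() {
	a := container{Name: "app", Image: "nginx:1.25", Resources: map[string]string{"cpu": "1"}}
	b := a
	b.Resources = map[string]string{"cpu": "2"} // only resources differ
	fmt.Println(hashWithoutResources(a) == hashWithoutResources(b)) // true
}
```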
KEP: /enhancements/keps/sig-node/1287-in-place-update-pod-resources
Co-authored-by: Chen Wang <Chen.Wang1@ibm.com>
The path package has a few different functions:
Clean, Split, Join, Ext, Dir, Base, IsAbs. These functions do not
take into account the OS-specific path separator, meaning that they
won't behave as intended on Windows.
For example, Dir is supposed to return all but the last element of the
path. For the path "C:\some\dir\somewhere", it is supposed to return
"C:\some\dir"; however, it returns ".".
The equivalent functions in path/filepath should be used instead.
Currently, some unit tests fail on Windows for
various reasons:
- config options not supported on Windows.
- files not closed, which means that they cannot be removed / renamed.
- paths not properly joined (filepath.Join should be used).
- time.Now() is not as precise on Windows, which means that 2
consecutive calls may return the same timestamp.
- different error messages on Windows.
- files have \r\n line endings on Windows.
- /tmp directory being used, which might not exist on Windows. Instead,
the OS-specific Temp directory should be used.
- the default value for Kubelet's EvictionHard field contained
OS-specific fields. It has been moved: the field is now set during
Kubelet's initialization, after the config file is read.
This patch makes the CRI `v1` API the new project-wide default version.
To allow backwards compatibility, a fallback to `v1alpha2` has been added
as well. Whether this fallback is used is determined automatically by
the kubelet.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Prevent pods whose resources can be satisfied by a single NUMA node from starting on multiple NUMA nodes.
The code returned before it updated the minimal number of NUMA nodes that can satisfy the container
requests.
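The intended behaviour can be sketched in Go under a simplified model: free memory per NUMA node as a slice, hints as bitmasks, and a hypothetical `generateHints` helper that is not the memory manager's code. The minimum NUMA node count is updated for every satisfying mask before any hint is marked preferred, so a multi-node hint is never preferred when a single node suffices.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hint is a simplified, hypothetical topology hint: a bitmask of NUMA node
// IDs plus a preferred flag.
type hint struct {
	mask      uint
	preferred bool
}

// generateHints returns one hint per NUMA node combination whose free memory
// covers request. Only masks with the minimal node count are preferred.
func generateHints(free []uint64, request uint64) []hint {
	n := len(free)
	minSize := n + 1
	var satisfying []uint
	for mask := uint(1); mask < 1<<uint(n); mask++ {
		var total uint64
		for node := 0; node < n; node++ {
			if mask&(1<<uint(node)) != 0 {
				total += free[node]
			}
		}
		if total < request {
			continue // this combination cannot satisfy the request
		}
		// Update the minimum for every satisfying mask; no early exit
		// may skip this step.
		if size := bits.OnesCount(mask); size < minSize {
			minSize = size
		}
		satisfying = append(satisfying, mask)
	}
	hints := make([]hint, 0, len(satisfying))
	for _, m := range satisfying {
		hints = append(hints, hint{mask: m, preferred: bits.OnesCount(m) == minSize})
	}
	return hints
}

func main() {
	// Two NUMA nodes with 4 units free each; the request fits on either one.
	fmt.Println(generateHints([]uint64{4, 4}, 3)) // [{1 true} {2 true} {3 false}]
}
```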
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
- provide tests for static policy allocation when init containers
request more memory than app containers
- provide tests for static policy allocation when init containers
request less memory than app containers
- provide tests to verify that init containers are removed from the state
file once the app container has started
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
Remove init containers from the state file once the app container has started.
This releases the memory allocated to the init containers and can increase
the density of containers on the NUMA node when the memory allocated
for init containers is bigger than the memory allocated for app containers.
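A minimal sketch of the cleanup step, assuming the state is a plain map from pod UID to container name to allocated bytes. `podMemoryState` and `removeInitContainers` are hypothetical names for illustration, not the memory manager's types.

```go
package main

import "fmt"

// podMemoryState maps pod UID -> container name -> allocated bytes. It is a
// simplified, hypothetical stand-in for the memory manager state file.
type podMemoryState map[string]map[string]uint64

// removeInitContainers deletes the state entries of the named init containers
// of a pod, to be called once the pod's app container has started. It returns
// how many entries were removed.
func (s podMemoryState) removeInitContainers(podUID string, initNames []string) int {
	containers, ok := s[podUID]
	if !ok {
		return 0
	}
	removed := 0
	for _, name := range initNames {
		if _, found := containers[name]; found {
			delete(containers, name)
			removed++
		}
	}
	return removed
}

func main() {
	s := podMemoryState{"pod-1": {"init-db": 2 << 30, "app": 1 << 30}}
	n := s.removeInitContainers("pod-1", []string{"init-db"})
	fmt.Println(n, len(s["pod-1"])) // 1 1
}
```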
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
The idea is that during the allocation phase we will:
- take into account the reusable memory of init containers during calls to `Allocate` and `GetTopologyHints`,
which means that we will re-use that memory and update the container memory blocks accordingly.
For example, for a pod with two init containers that request 1Gi and 2Gi,
and an app container that requests 4Gi, we can re-use 2Gi of memory.
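The arithmetic of the example can be checked with a small Go sketch. It assumes a simplified model (init containers run one at a time and exit, app containers run together, quantities in GiB); `reusedMemory` is a hypothetical helper, not the memory manager's code. In this model the second init container can also take over the 1Gi of the first one, and the app container takes over the 2Gi left by the largest init container.

```go
package main

import "fmt"

func min64(a, b uint64) uint64 {
	if a < b {
		return a
	}
	return b
}

func max64(a, b uint64) uint64 {
	if a > b {
		return a
	}
	return b
}

// reusedMemory returns, for each container in start order (init containers,
// then app containers), how much memory it can take over from init containers
// that already finished instead of allocating new memory. Simplified,
// hypothetical model; quantities are in GiB.
func reusedMemory(initReqs, appReqs []uint64) []uint64 {
	var reusable uint64 // memory held by init containers that already exited
	var reused []uint64
	for _, req := range initReqs {
		reused = append(reused, min64(reusable, req))
		// The next container can reuse whatever the largest init container held.
		reusable = max64(reusable, req)
	}
	for _, req := range appReqs {
		r := min64(reusable, req)
		reused = append(reused, r)
		// App containers run concurrently, so reused memory is consumed.
		reusable -= r
	}
	return reused
}

func main() {
	// Two init containers requesting 1Gi and 2Gi, one app container requesting 4Gi.
	fmt.Println(reusedMemory([]uint64{1, 2}, []uint64{4})) // [0 1 2]
}
```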
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
Set the container's `cpuset.mems` during creation to avoid an additional
call to update the container's resources.
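The value written to `cpuset.mems` uses the cgroup cpuset list format, such as "0-1,3". The sketch below shows one way to render a set of NUMA node IDs in that format; `formatCpusetMems` is a hypothetical helper, not the kubelet's implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// formatCpusetMems renders NUMA node IDs in the cgroup cpuset list format
// (for example "0-1,3"). Hypothetical helper for illustration.
func formatCpusetMems(nodes []int) string {
	ids := append([]int(nil), nodes...) // copy so the caller's slice is not reordered
	sort.Ints(ids)
	var parts []string
	for i := 0; i < len(ids); {
		j := i
		// Extend the range while IDs are consecutive; duplicates are absorbed.
		for j+1 < len(ids) && ids[j+1] <= ids[j]+1 {
			j++
		}
		if ids[i] == ids[j] {
			parts = append(parts, strconv.Itoa(ids[i]))
		} else {
			parts = append(parts, strconv.Itoa(ids[i])+"-"+strconv.Itoa(ids[j]))
		}
		i = j + 1
	}
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(formatCpusetMems([]int{3, 0, 1})) // 0-1,3
}
```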
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
We will have two layers of validation:
- the first part of the validation logic will be implemented under the
`ValidateKubeletConfiguration` method
- the second part, which requires knowledge about the machine topology and
node allocatable resources, will be implemented under the memory manager.
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
Move the fakeTopologyManagerWithHint and all related methods
from the topology manager package to the memory manager static policy unittests.
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
This commit renames the state structs and some of their fields:
- NodeMap -> NUMANodeMap
- NodeState -> NUMANodeState
- NUMANodeState.Nodes -> NUMANodeState.Cells
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
Calculate the total amount of reserved memory only for NUMA nodes
that exist on the machine.
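A minimal sketch of the calculation, assuming reserved memory is configured as a map from NUMA node ID to bytes. The function and its parameter shapes are hypothetical. Iterating over the machine's nodes, rather than over the configured entries, is what makes reservations for non-existent nodes drop out.

```go
package main

import "fmt"

// totalReserved sums reserved memory over the NUMA nodes that exist on the
// machine; entries for node IDs the machine does not have are ignored.
// Hypothetical shapes: reserved maps NUMA node ID -> reserved bytes.
func totalReserved(reserved map[int]uint64, machineNodes []int) uint64 {
	var total uint64
	for _, id := range machineNodes {
		total += reserved[id] // a node without a reservation contributes 0
	}
	return total
}

func main() {
	reserved := map[int]uint64{0: 1 << 30, 1: 1 << 30, 3: 1 << 30} // node 3 does not exist
	fmt.Println(totalReserved(reserved, []int{0, 1}))              // 2147483648
}
```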
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
- The `Allocate` method will try to allocate the memory according to the affinity hints
saved in the `TopologyManager` store. If the store does not have any hints for the memory,
it will call `getDefaultHint` to get the default hint. If the affinity does not satisfy
the memory request, it will call `extendTopologyManagerHint` to extend the topology hint to
satisfy the memory request. Once it has the preferred hint, it will allocate the memory and
update the memory manager state accordingly.
- The `RemoveContainer` method will release the allocated memory and update the memory manager state accordingly.
- The `GetTopologyHints` method will try to re-generate topology hints when the container is already present
in the memory manager state. If it is not present, it will call `calculateHints` to get topology hints.
The `calculateHints` method uses an approach similar to the one used in the CPU manager:
1. If the container memory request can be satisfied by a single NUMA node, it will not allocate the memory from
more than one NUMA node, and it will set only single-NUMA hints as preferred.
It can affect the density, but it gives us guarantees regarding NUMA alignment.
2. A NUMA node used in a multi-NUMA assignment cannot be used in a single-NUMA assignment,
and a NUMA node used in a single-NUMA assignment cannot be used in a multi-NUMA assignment.
3. Only hints whose NUMA nodes have enough memory will be returned.
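Two of the steps above can be sketched together in Go: extending a hint that does not cover the request (the role described for `extendTopologyManagerHint`) and the exclusivity rule 2, combined with the enough-memory check of rule 3. The `numaNode` type, its fields, and the choice to apply rule 2 inside the extension are assumptions of this sketch; it models the described behaviour, not the actual memory manager code.

```go
package main

import (
	"fmt"
	"math/bits"
)

// numaNode is a simplified, hypothetical view of one NUMA node.
type numaNode struct {
	free        uint64 // memory still available
	assignments int    // containers currently allocated on this node
	cells       uint   // bitmask of the NUMA group this node is bound to
}

// sumFree adds up the free memory of the nodes in mask.
func sumFree(nodes []numaNode, mask uint) uint64 {
	var total uint64
	for i, n := range nodes {
		if mask&(1<<uint(i)) != 0 {
			total += n.free
		}
	}
	return total
}

// canUseMask enforces rule 2: a node that already serves an assignment may
// only be reused with exactly the same group of NUMA nodes, so single-NUMA
// and multi-NUMA allocations never share a node.
func canUseMask(nodes []numaNode, mask uint) bool {
	for i, n := range nodes {
		if mask&(1<<uint(i)) != 0 && n.assignments > 0 && n.cells != mask {
			return false
		}
	}
	return true
}

// extendHint returns the smallest usable superset of hint whose free memory
// covers request (rule 3); false means no such mask exists.
func extendHint(nodes []numaNode, hint uint, request uint64) (uint, bool) {
	best, bestSize := uint(0), len(nodes)+1
	for mask := uint(1); mask < 1<<uint(len(nodes)); mask++ {
		if mask&hint != hint || !canUseMask(nodes, mask) {
			continue // must keep the hint's nodes and respect rule 2
		}
		if sumFree(nodes, mask) >= request && bits.OnesCount(mask) < bestSize {
			best, bestSize = mask, bits.OnesCount(mask)
		}
	}
	return best, best != 0
}

func main() {
	nodes := []numaNode{
		{free: 2},
		{free: 3},
		{free: 4, assignments: 1, cells: 0b100}, // node 2 hosts a single-NUMA allocation
	}
	fmt.Println(extendHint(nodes, 0b001, 5)) // 3 true: nodes 0 and 1
}
```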
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
Reserved memory of all kinds (and over all
NUMA nodes) must be equal to the values determined
by the Node Allocatable feature.
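A minimal sketch of this check, assuming reserved memory is configured per NUMA node and per resource name, and that the Node Allocatable values are available as a map per resource name. The types and `validateReserved` are hypothetical. For every resource name, the sum over NUMA nodes must equal the Node Allocatable value, with a missing entry on either side counting as 0.

```go
package main

import "fmt"

// reservedMemory maps NUMA node ID -> resource name -> reserved bytes.
// Hypothetical shape for this sketch.
type reservedMemory map[int]map[string]uint64

// validateReserved checks that, for every resource name, reserved memory
// summed over all NUMA nodes equals the Node Allocatable value.
func validateReserved(reserved reservedMemory, allocatable map[string]uint64) error {
	sums := map[string]uint64{}
	for _, perNode := range reserved {
		for res, v := range perNode {
			sums[res] += v
		}
	}
	// Compare over the union of resource names; a missing entry counts as 0.
	names := map[string]bool{}
	for res := range sums {
		names[res] = true
	}
	for res := range allocatable {
		names[res] = true
	}
	for res := range names {
		if sums[res] != allocatable[res] {
			return fmt.Errorf("reserved %s: sum over NUMA nodes is %d, Node Allocatable requires %d",
				res, sums[res], allocatable[res])
		}
	}
	return nil
}

func main() {
	reserved := reservedMemory{
		0: {"memory": 512 << 20},
		1: {"memory": 512 << 20},
	}
	fmt.Println(validateReserved(reserved, map[string]uint64{"memory": 1 << 30})) // <nil>
}
```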
Signed-off-by: Cezary Zukowski <c.zukowski@samsung.com>
Pass memory manager flags to the container manager and call all relevant memory manager
methods under the container manager.
Signed-off-by: Byonggon Chun <bg.chun@samsung.com>
This commit includes tests to verify the functionality:
- to restore state from the file
- to store the state to the file
- to clean old data from the state
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
The checkpoint manager provides a way to save the memory manager
`MemoryTable` both in memory and in the state file.
Saving the `MemoryTable` in the state file can be useful when the kubelet
restarts and you want to restore memory allocations for running containers.
It also provides a way to monitor memory allocations done by the memory manager,
and in the future, the state file content can be exposed through pod metrics.
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>