Commit Graph

993 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
eae87bfe7e Merge pull request #103483 from odinuge/revert-102508-runc-1.0
Revert "Update runc to 1.0.0"
2021-07-06 10:42:56 -07:00
Artyom Lukianov
bb6d5b1f95 memory manager: provide unittests for init containers re-use
- provide tests for static policy allocation when init containers
request more memory than the app containers
- provide tests for static policy allocation when init containers
request less memory than the app containers
- provide tests to verify that init containers are removed from the state
file once the app container has started

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-07-05 20:52:25 +03:00
Artyom Lukianov
960da7895c memory manager: remove init containers once app container started
Remove init containers from the state file once the app container has started.
This releases the memory allocated for the init containers and can increase
the density of containers on the NUMA node when the memory allocated
for init containers is bigger than the memory allocated for app containers.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-07-05 20:52:25 +03:00
Artyom Lukianov
b965502c49 memory manager: re-use the memory allocated for init containers
The idea is that during the allocation phase:

- during calls to `Allocate` and `GetTopologyHints` we take into account the init containers' reusable memory,
which means that we re-use the memory and update the container memory blocks accordingly.
For example, for a pod with two init containers that requested 1Gi and 2Gi
and an app container that requested 4Gi, we can re-use 2Gi of memory.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-07-05 20:52:25 +03:00
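The 1Gi / 2Gi / 4Gi example in the commit above can be worked through with a small sketch. This is not the memory manager's code: `reusableInitMemory` and `reusedByApp` are hypothetical helpers, and the sketch assumes that init containers run one at a time, so the largest init request bounds what can be handed over to the app container.

```go
package main

import "fmt"

const gi = uint64(1) << 30

// reusableInitMemory returns the largest request among the init containers.
// Assumption: init containers run sequentially, so the largest single
// request is the amount that can later be re-used.
func reusableInitMemory(initRequests []uint64) uint64 {
	var largest uint64
	for _, r := range initRequests {
		if r > largest {
			largest = r
		}
	}
	return largest
}

// reusedByApp returns how much of the reusable memory an app container
// with the given request can take over (hypothetical helper).
func reusedByApp(reusable, appRequest uint64) uint64 {
	if appRequest < reusable {
		return appRequest
	}
	return reusable
}

func main() {
	reusable := reusableInitMemory([]uint64{1 * gi, 2 * gi})
	reused := reusedByApp(reusable, 4*gi)
	// Prints: reusable=2Gi reused=2Gi
	fmt.Printf("reusable=%dGi reused=%dGi\n", reusable/gi, reused/gi)
}
```

Under these assumptions the app container only needs 2Gi of fresh memory on top of the 2Gi it re-uses.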
Odin Ugedal
61d88af9e4 Revert "Update runc to 1.0.0" 2021-07-05 14:03:04 +02:00
Kir Kolyshkin
ab5b77944e kubelet/cm: don't set Devices
Since runc 1.0.0 it is now sufficient to have SkipDevices: true.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-30 16:17:35 -07:00
Kubernetes Prow Robot
07358f1663 Merge pull request #103146 from tech-geek29/fix-95380
Change log level to Debug
2021-06-25 07:44:45 -07:00
Rishabh Jain
8f08db9164 Change log level to Debug 2021-06-24 14:23:06 +05:30
Kenta Tada
89a4d4b071 kubelet: modify the function of getCgroupSubsystemsV2 to use libcontainer API 2021-06-24 16:58:05 +09:00
Kubernetes Prow Robot
985ac8ae50 Merge pull request #101030 from cynepco3hahue/pod_resources_memory_interface
Extend pod resource API response to return the information from memory manager
2021-06-22 06:35:58 -07:00
Artyom Lukianov
03830db82d Implement all necessary methods to provide memory manager data under pod resources metrics
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-06-22 13:06:32 +03:00
Kubernetes Prow Robot
3bd29bc53d Merge pull request #102829 from snowplayfire/update-devicemanager
Add resource capacity to ListAndWatch grpc logging
2021-06-21 16:28:09 -07:00
jingxueli
45d18acbcc add info for possible failed listAndWatch grpc call 2021-06-17 16:25:20 +08:00
Kubernetes Prow Robot
85f0931ab9 Merge pull request #102772 from saintube/patch-1
cleanup: fix kubelet cpuset typo
2021-06-14 19:00:13 -07:00
Francesco Romani
369416b763 cm: handle nil cpumanager avoiding segfault
If the cpumanager feature gate is disabled, the corresponding field
of the containerManager will be nil.
A couple of functions don't check for this occurrence and happily
dereference the pointer unconditionally, leading to possible segfaults.

The relevant functions were introduced to support the podresources API,
so to trigger this segfault all the following are needed:
- cpumanager feature gate has to be disabled explicitly
- any podresources API must be called

Worth pointing out that when the new functions were introduced (around
kubernetes 1.20) the default feature gate for cpumanager was already set
to true, hence this bug is expected to be triggered rarely.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-06-10 16:22:43 +02:00
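The kind of guard described in the commit above might look like the following sketch. The types here are simplified stand-ins, not the kubelet's real definitions: `cpuManager`, `containerManager`, `fakeCPUManager` and the `GetCPUs` signature are assumptions made for illustration.

```go
package main

import "fmt"

// cpuManager is a hypothetical, reduced stand-in for the CPU manager interface.
type cpuManager interface {
	GetCPUs(podUID, containerName string) []int64
}

// containerManager holds an optional cpuManager. In this sketch the field is
// left nil to model the feature gate being disabled.
type containerManager struct {
	cpuManager cpuManager
}

// GetCPUs checks for a nil manager before delegating, so callers get an
// empty result instead of a nil-pointer panic.
func (cm *containerManager) GetCPUs(podUID, containerName string) []int64 {
	if cm.cpuManager == nil {
		return []int64{}
	}
	return cm.cpuManager.GetCPUs(podUID, containerName)
}

// fakeCPUManager always reports CPUs 0 and 1 (illustration only).
type fakeCPUManager struct{}

func (fakeCPUManager) GetCPUs(_, _ string) []int64 { return []int64{0, 1} }

func main() {
	disabled := &containerManager{}
	enabled := &containerManager{cpuManager: fakeCPUManager{}}
	// Prints: 0 2
	fmt.Println(len(disabled.GetCPUs("pod", "ctr")), len(enabled.GetCPUs("pod", "ctr")))
}
```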
Frame
9255f2ccf3 Fix kubelet cpuset typo 2021-06-10 18:17:04 +08:00
Kubernetes Prow Robot
1795a98eeb Merge pull request #102221 from kikimo/add-hint-to-fake-topology-manager
Add hint to fake topology manager.
2021-06-02 03:40:05 -07:00
kikimo
86d68effc2 clean code 2021-06-02 09:07:53 +08:00
Kubernetes Prow Robot
7c7a0865cd Merge pull request #102218 from kolyshkin/cgroup-cleanups
pkg/kubelet/cm: cgroup-related cleanups
2021-06-01 13:45:51 -07:00
kikimo
9d2135f703 reuse fake topology manager 2021-06-02 01:35:00 +08:00
kikimo
8b3162d67b clean code 2021-06-02 01:17:04 +08:00
sanwishe
9e257ec194 Optimization logging format for pkg/kubelet
Signed-off-by: sanwishe <jiang.mingzhi35@zte.com.cn>
2021-05-25 08:52:08 +08:00
Kubernetes Prow Robot
cf59c68e15 Merge pull request #102088 from wzshiming/fix/pod-devices-has-pod-lock
Add the missing RLock
2021-05-24 15:16:20 -07:00
Kir Kolyshkin
f1aee7e049 kubelet/cm: GetResourceStats -> MemoryUsage
Commit cc50aa9dfb introduced GetResourceStats, a method which collected
all the statistics from various cgroup controllers, only to discard all
of the info collected except a single value (memory usage).

While one may argue that this method can potentially be used from other
places, this did not happen since it was added 4+ years ago.

Let's streamline this code and only collect what we need, i.e. memory
usage. Rename the method accordingly.

While at it, fix pkg/kubelet/cm/cgroup_manager_unsupported.go to not
instantiate a new error every time a method is called.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-05-23 20:43:52 -07:00
kikimo
20c02357ca Add hint to fake topology manager. 2021-05-22 15:29:08 +08:00
Kir Kolyshkin
c299b8fc9a kubelet/cm: rm propagateControllers
This was added by commit a9772b2290.

In the current codebase, the cgroup being updated was created using
runc/opencontainers' manager.Apply(), which already does controllers
propagation, so there is no need to repeat that on every update.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-05-21 13:44:54 -07:00
Kubernetes Prow Robot
cb1775c73a Merge pull request #101893 from kikimo/fix-numa-topo-error
Avoid undesirable allocation when device is associated with multiple …
2021-05-21 08:40:47 -07:00
Kubernetes Prow Robot
5de1a754c8 Merge pull request #102147 from kolyshkin/update-runc-rc94-take-II
vendor: bump runc to rc95
2021-05-20 17:16:56 -07:00
Kubernetes Prow Robot
823d870725 Merge pull request #102014 from klueska/upstream-update-cpu-asssignment-algorithm
Refactor the algorithm used to decide CPU assignments in the CPUManager
2021-05-20 16:10:56 -07:00
Kubernetes Prow Robot
e259943f7f Merge pull request #101265 from s-ito-ts/ut_kubelet_topology
Adds unit tests for pkg/kubelet/cm/cpumanager/topology
2021-05-20 14:16:28 -07:00
kikimo
c0a7939cbb remove redundant test branch in sorting algorithm 2021-05-20 20:31:47 +08:00
Rancho Chen
9469ee7025 Add testcase for freeCPUs with three Sockets 2021-05-20 11:49:51 +00:00
Odin Ugedal
d312ef7eb6 Set cgroups via opencontainer
This sets cgroup config via libcontainer to make sure we apply the
correct values to the systemd slices and scopes.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-05-19 23:52:01 -07:00
Kir Kolyshkin
f3cdfc488e vendor: bump runc to rc95
runc rc95 contains a fix for CVE-2021-30465.

runc rc94 provides fixes and improvements.

One notable change is that the cgroup manager's Set now accepts Resources rather
than Cgroup (see https://github.com/opencontainers/runc/pull/2906).
Modify the code accordingly.

Also update runc dependencies (as hinted by hack/lint-dependencies.sh):

        github.com/cilium/ebpf v0.5.0
        github.com/containerd/console v1.0.2
        github.com/coreos/go-systemd/v22 v22.3.1
        github.com/godbus/dbus/v5 v5.0.4
        github.com/moby/sys/mountinfo v0.4.1
        golang.org/x/sys v0.0.0-20210426230700-d19ff857e887
        github.com/google/go-cmp v0.5.4
        github.com/kr/pretty v0.2.1
        github.com/opencontainers/runtime-spec v1.0.3-0.20210326190908-1c3f411f0417

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-05-19 23:51:59 -07:00
Giuseppe Scrivano
12abc3b7c9 kubelet: reuse manager
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-05-19 23:51:49 -07:00
kikimo
445b9c0762 minor tweak on numa node sorting algorithm 2021-05-20 08:21:20 +08:00
kikimo
ecfa609b71 simplify sorting comparator of numa nodes 2021-05-19 21:19:47 +08:00
kikimo
84a4b40526 fix incompatible interface in fakeTopologyManagerWithHint 2021-05-19 10:12:12 +08:00
kikimo
7d30bfecd5 simplify sorting comparator of numa nodes 2021-05-19 10:07:37 +08:00
kikimo
893ebf3a1c add a reusable fakeTopologyManagerWithHint{} 2021-05-19 10:07:37 +08:00
kikimo
2ef1f81076 Avoid undesirable allocation when device is associated with multiple NUMA Nodes
Suppose there are two devices, dev1 and dev2, each with NUMA nodes associated as below:
  dev1: numa1
  dev2: numa1, numa2

and we request a device from numa2. Currently filterByAffinity() will return
[], [dev1, dev2], [] if the loop over available devices produces the sequence [dev1, dev2].
That is not desirable, as what we truly expect is an allocation of dev2 from numa2.
2021-05-19 10:07:37 +08:00
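The expected outcome for the dev1/dev2 example can be illustrated with a simplified sketch. It is not the device manager's filterByAffinity(): `partitionByAffinity` is a hypothetical helper that only splits devices into "has a NUMA node in the requested affinity" and "does not", and it omits the third bucket for devices without topology information.

```go
package main

import (
	"fmt"
	"sort"
)

// partitionByAffinity treats a device as matching when any of its NUMA nodes
// is in the affinity set, so a device attached to several nodes is not
// misclassified because of the first node it happens to be listed under.
func partitionByAffinity(devNodes map[string][]int, affinity map[int]bool) (from, notFrom []string) {
	names := make([]string, 0, len(devNodes))
	for name := range devNodes {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order for the example

	for _, name := range names {
		matched := false
		for _, node := range devNodes[name] {
			if affinity[node] {
				matched = true
				break
			}
		}
		if matched {
			from = append(from, name)
		} else {
			notFrom = append(notFrom, name)
		}
	}
	return from, notFrom
}

func main() {
	devNodes := map[string][]int{"dev1": {1}, "dev2": {1, 2}}
	from, notFrom := partitionByAffinity(devNodes, map[int]bool{2: true})
	// Prints: [dev2] [dev1]
	fmt.Println(from, notFrom)
}
```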
Jordan Liggitt
4b45d0d921 Revert "Merge pull request 101888 from kolyshkin/update-runc-rc94"
This reverts commit b1b06fe0a4, reversing
changes made to 382a33986b.
2021-05-18 09:13:47 -04:00
Shiming Zhang
bbed9d27b0 Add the missing RLock 2021-05-18 17:27:27 +08:00
Kubernetes Prow Robot
003dd87cff Merge pull request #100565 from lack/cpuset-validation
cpuset parsing: Fix more edge cases and add more unit tests
2021-05-17 13:39:30 -07:00
Kevin Klues
67c92a5cd4 Refactor / simplify logic for CPU assignment algorithm in CPUManager
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-05-14 14:53:06 +00:00
s-ito-ts
1dea66439c Adds unit tests for pkg/kubelet/cm/cpumanager/topology 2021-05-12 07:13:04 +00:00
Kir Kolyshkin
b49744f177 vendor: bump runc to rc94
One notable change is that the cgroup manager's Set now accepts Resources rather
than Cgroup (see https://github.com/opencontainers/runc/pull/2906).
Modify the code accordingly.

Also update runc dependencies (as hinted by hack/lint-dependencies.sh):

	github.com/cilium/ebpf v0.5.0
	github.com/containerd/console v1.0.2
	github.com/coreos/go-systemd/v22 v22.3.1
	github.com/godbus/dbus/v5 v5.0.4
	github.com/moby/sys/mountinfo v0.4.1
	golang.org/x/sys v0.0.0-20210426230700-d19ff857e887
	github.com/google/go-cmp v0.5.4
	github.com/kr/pretty v0.2.1
	github.com/opencontainers/runtime-spec v1.0.3-0.20210326190908-1c3f411f0417

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-05-11 11:56:42 -07:00
Jim Ramsay
a21179ae69 cpuset.Parse: Fix edge cases and add negative tests
The cpuset.Parse function missed a couple of bad input cases, specifically
"1--3" and "10-6".  These were silently ignored when they should instead
be flagged as invalid.

This now catches these cases and expands the unit tests for cpuset to
cover them (and other negative test cases as well).

Signed-off-by: Jim Ramsay <jramsay@redhat.com>
2021-05-11 11:05:38 -04:00
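To make the two bad inputs from the commit above concrete, here is a small standalone parser that rejects both "1--3" and "10-6". It is a sketch, not the cpuset package's implementation: `parseCPUList` is a hypothetical function that returns a plain slice and does not de-duplicate or sort.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList parses strings such as "0-2,5" into a list of CPU IDs.
// Malformed ranges are reported as errors instead of being silently ignored.
func parseCPUList(s string) ([]int, error) {
	var cpus []int
	if s == "" {
		return cpus, nil
	}
	for _, part := range strings.Split(s, ",") {
		bounds := strings.Split(part, "-")
		switch len(bounds) {
		case 1: // single CPU, e.g. "5"
			n, err := strconv.Atoi(bounds[0])
			if err != nil || n < 0 {
				return nil, fmt.Errorf("invalid CPU %q", part)
			}
			cpus = append(cpus, n)
		case 2: // range, e.g. "0-2"
			lo, errLo := strconv.Atoi(bounds[0])
			hi, errHi := strconv.Atoi(bounds[1])
			if errLo != nil || errHi != nil || lo < 0 || hi < lo {
				// "10-6" ends up here because the end precedes the start.
				return nil, fmt.Errorf("invalid range %q", part)
			}
			for n := lo; n <= hi; n++ {
				cpus = append(cpus, n)
			}
		default: // "1--3" splits into three pieces and ends up here
			return nil, fmt.Errorf("invalid range %q", part)
		}
	}
	return cpus, nil
}

func main() {
	for _, in := range []string{"0-2,5", "1--3", "10-6"} {
		cpus, err := parseCPUList(in)
		fmt.Printf("%q -> %v, err=%v\n", in, cpus, err)
	}
}
```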
Giuseppe Scrivano
fd7ecd3915 kubelet: reuse manager
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-05-10 17:34:53 -07:00
Kubernetes Prow Robot
160425640e Merge pull request #101771 from klueska/upstream-only-uppdate-if-needed
Add logic to only call CPUManager Update() if state different than last Update()
2021-05-10 09:45:09 -07:00