Commit Graph

1418 Commits

Author SHA1 Message Date
Artyom Lukianov
4c75be0604 memory manager: provide the skeleton for the memory manager
Provide memory manager struct and methods that should be implemented.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-02-09 00:54:58 +02:00
Ryan Phillips
f918e11e3a register all pending pod deletions and check for kill
do not delete the cgroup from a pod when it is being killed
2021-02-04 11:45:42 -06:00
Kubernetes Prow Robot
e6e079aac3 Merge pull request #97748 from heqg/collides-state
Fix variable 'state' collides with imported package name
2021-01-28 17:51:40 -08:00
Kubernetes Prow Robot
889cf714c1 Merge pull request #95111 from choury/patch-2
make podTopologyHints protected by lock
2021-01-26 04:18:34 -08:00
choury
fe089a2d12 make podTopologyHints protected by lock
It crashed kubelet by "concurrent map read and map write"
2021-01-26 10:36:05 +08:00
Kubernetes Prow Robot
06a7e2bacf Merge pull request #96781 from fighterhit/fix-kukelet-device-plugin-bug
Fix: kubelet return error when device plugin sets PreStartRequired true while creating pods with 0 resource
2021-01-25 17:59:00 -08:00
pacoxu
89c42bd3d5 check containerd as process name instead of docker-containerd
Signed-off-by: pacoxu <paco.xu@daocloud.io>
2021-01-23 10:55:18 +08:00
b10s
de60340e51 Improve the getCgroupSubsystemsV1() which uses random record per subsystem
returned by libcontainercgroups.GetCgroupMounts().

Example array from GetCgroupMounts():
```
[
        {
                Mountpoint: "/sys/fs/cgroup/systemd",
                Root: "/",
                Subsystems: []string len: 1, cap: 1, ["systemd"],},
        {
                Mountpoint: "/sys/fs/cgroup/cpu,cpuacct",
                Root: "/",
                Subsystems: []string len: 2, cap: 2, ["cpu","cpuacct"],},
        {
                Mountpoint: "/sys/fs/cgroup/systemd/some/path",
                Root: "/some/path",
                Subsystems: []string len: 1, cap: 1, ["systemd"],},
]
```
becames a map:
```
[
        "memory": "/sys/fs/cgroup/memory/kubepods",
        "systemd": "/sys/fs/cgroup/systemd/some/path",
]
```
which seems to be wrong.

Using shortest path of mountpoint per subsystem would be more reliable.

reference issue: https://github.com/kubernetes/kubernetes/issues/95488
2021-01-22 22:21:46 +09:00
Artyom Lukianov
38dc7509f8 cpu manager: specify the container CPU set during the creation
We can set the container cpuset.cpus diring the creation and it
will not need to call to update resources after the container creation.

Additional side effect of the change, that the runc process that responsible
to create the container will run with the same CPU affinity because the
runc runs on the cpuset provided in the config.json arg.

It will allow to prevent undesirable interupts on isolated CPUs.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-01-20 17:53:33 +02:00
Artyom Lukianov
60678a24ca Update CPU manager GetCPUs method to return pointer to CPUSet 2021-01-20 13:21:57 +02:00
Artyom Lukianov
69db36b958 Provide additional methods under the CPUSet
- ToSliceInt64 returns sorted slice of cores IDs in int64 format
- ToSliceNoSortInt64 returns slice of cores IDs in int64 format

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-01-20 13:21:57 +02:00
fighterhit
16c6b99fcd del unused value 2021-01-13 12:43:54 +08:00
fighterhit
24dd9b1f04 add a test to demonstrate PR#96781 2021-01-13 11:27:30 +08:00
Kubernetes Prow Robot
04b6b7c12b Merge pull request #97787 from heqg/expect-policy_static_test
fix typo of [expect] in pkg/kubelet/../policy_static_test.go
2021-01-07 00:53:45 -08:00
he.qingguo
d9368f53ad fix typo of [expect] in pkg/kubelet/../policy_static_test.go
Signed-off-by: he.qingguo <he.qingguo@zte.com.cn>
2021-01-07 12:20:03 +08:00
Kubernetes Prow Robot
e456b45a2a Merge pull request #97749 from heqg/errorf-wrap
The code in TestNonePolicyName does not need to wrap, so fix it.
2021-01-06 12:02:04 -08:00
he.qingguo
8826d12bb0 The code in TestNonePolicyName does not need to wrap, so fix it.
Signed-off-by: he.qingguo <he.qingguo@zte.com.cn>
2021-01-06 10:48:30 +08:00
he.qingguo
8249cd611d Fix variable 'state' collides with imported package name
Signed-off-by: he.qingguo <he.qingguo@zte.com.cn>
2021-01-06 10:31:47 +08:00
Kubernetes Prow Robot
10c1c3acf6 Merge pull request #96906 from Rajalakshmi-Girish/issue-96853
Fixes the unit tests to be more tolerant with error messages
2021-01-05 17:09:51 -08:00
Kubernetes Prow Robot
4dc3a42712 Merge pull request #97427 from klueska/upstream-fix-cpumanager-race
Fix bug in CPUManager with race on container map access
2021-01-05 11:46:28 -08:00
Kubernetes Prow Robot
b37e9a440e Merge pull request #97193 from JornShen/flaky_devicemanager_test
[flaky test] fix devicemanager TestDevicePluginReRegistrationProbeMode failed
2021-01-05 11:46:21 -08:00
Anthony ARNAUD
6013aaa370 use Lstat instead of Stat for unix socket on windows 2020-12-29 15:14:29 -05:00
Rajalakshmi-Girish
98948ad809 fixes the unit tests to be more tolerant with error messages 2020-12-24 04:47:46 +00:00
Kevin Klues
2fcbd2206d Fix bug in CPUManager with race on map acccess
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-12-21 19:11:53 +00:00
Anthony ARNAUD
8bdc3d8970 Port deviceManager in windows container manager 2020-12-16 00:25:26 -05:00
jornshen
93606f8ba3 [flaky test] fix devicemanager TestDevicePluginReRegistrationProbeMode fail 2020-12-10 21:07:49 +08:00
Seth Jennings
ee60ee26e0 kubelet: remove periodic messages from log-level 2 2020-11-30 11:34:00 -06:00
fighterhit
0eaceb7eb5 Fix: kubelet return error when device plugin sets PreStartRequired true while creating pods with 0 resource 2020-11-21 22:44:27 +08:00
Alexey Perevalov
5e6aed4137 Fixes sigfault in case of empty TopologyInfo
Device plugin which implements v1beta interface can return nil in
Topology field

For example nvidia-gpu-deviceplugin
3520254b75/nvidia.go (L147)
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-13 11:51:47 +03:00
Krzysztof Wiatrzyk
b7714918db Run ./update-all.sh
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
sw.han
7b0ccaa1e9 Add tests for getPodDeviceRequest() for devicemanager
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
sw.han
ba1e8abce7 Add tests for GetPodTopologyHints() for devicemanager
* Add additional test cases returned by getPodScopeTestCases()

Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
sw.han
1c4a1ba6ae Update topology hints tests to use pod object for devicemanager
Pod object is more flexible to use and construct
* Update TestGetTopologyHints() to work according to new test cases
* Update topologyHintTestCase{} to include proper field

Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
sw.han
7ad65bf22d Add tests for GetPodTopologyHints() for cpumanager
* Add tests for getPodRequestedCPU()

Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
sw.han
35b1f28d0f Refactor topology hints tests for cpumanager
* Extract common tests cases that will be used for both GetTopologyHints()
and GetPodTopologyHints()
* Extract machineInfo as it will be used for both functions as well

Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
Krzysztof Wiatrzyk
656a08afdf Move scope specific tests from topologymanager under particular scopes
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
Krzysztof Wiatrzyk
c786c9a533 Move common tests from topologymanager under scope
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
Krzysztof Wiatrzyk
f5c0fe4ef6 Update topologymanager tests after adding scopes
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
Byonggon Chun
9da0912a33 Implement devicemanager.GetPodLevelTopologyHints() function
* Add podDevices() func
* Add getPodDeviceRequest() func

Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
sw.han
27b7bcb41c Implement the cpumanager.GetPodTopologyHints() function
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
Krzysztof Wiatrzyk
6db58b2e92 Update logging to use a format util
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
Krzysztof Wiatrzyk
b2be584e5b Implement topology manager scopes
* Add topologyScopeName parameter to NewManager().
* Add scope interface and structure that implement common logic
* Add pod scope & container scopes
* Add pod lifecycle functions

Co-authored-by: sw.han <sw.han@samsung.com>

Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:54 +01:00
sw.han
f5997fe537 Add GetPodTopologyHints() interface to Topology/CPU/Device Manager
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:54 +01:00
sw.han
d070bff273 Add kubelet configuration flag 'topology-manager-scope'
add kubelet config option.
* --topology-manager-scope=[ container | pod ]
* default=container

Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:54 +01:00
Alexey Perevalov
a8b8995ef2 Implement TopologyInfo and cpu_ids in podresources
It covers deviceplugin & cpumanager.

It has drawback, since cpuset and all other structs including cadvisor's keep
cpu as int, but for protobuf based interface is better to have fixed
int.
This patch also introduces additional interface CPUsProvider, while
DeviceProvider might have been extended too.

Checkpoint not covered by unit test.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-11 13:50:49 +03:00
Alexey Perevalov
62326a1846 Convert podDevices to struct
PodDevices will have its own guard

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-11 13:50:48 +03:00
Alexey Perevalov
9f54dccc92 Change GetDevices interface
This change is necessary for supporting Topology in the ContainerDevices.

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-11 12:41:31 +03:00
Kubernetes Prow Robot
bbc26ba7e6 Merge pull request #96048 from rphillips/fixes/device_plugin_stub_race
Deflake TestDevicePluginReRegistrationProbeMode: Devices of previous registered should be removed
2020-11-02 08:20:54 -08:00
Kubernetes Prow Robot
332d17c7f5 Merge pull request #95731 from farah/split-scheduler
Delete framework/v1alpha1 folder and change remaining import paths
2020-10-30 11:14:22 -07:00
Ryan Phillips
4fdfbc718c devicemanager: fix race in stub
There is a race when the server is coming up and the subsequent dial on
the socket. Fix the race with a PollImmediate retry.
2020-10-30 11:42:01 -05:00