Commit Graph

197 Commits

Author SHA1 Message Date
Francesco Romani
23abdab2b7 smtalign: propagate policy options to policies
Consume in the static policy the cpu manager policy options from
the cpumanager instance.
Validate in the none policy if any option is given, and fail if so -
this is almost surely a configuration mistake.

Add new cpumanager.Options type to hold the options and translate from
user arguments to flags.

Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-07-08 23:15:37 +02:00
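The none-policy rule described in the commit above (reject any option, since it is almost surely a configuration mistake) can be sketched as follows. This is an illustration only: `PolicyOptions` and `validateNonePolicy` are hypothetical names, not the kubelet's actual `cpumanager.Options` type or code.

```go
package main

import (
	"fmt"
	"sort"
)

// PolicyOptions is a hypothetical stand-in for the raw, user-supplied
// option map (option name -> value). It is not the kubelet's real type.
type PolicyOptions map[string]string

// validateNonePolicy returns an error if any option is present, mirroring
// the rule that the none policy accepts no options at all.
func validateNonePolicy(opts PolicyOptions) error {
	if len(opts) == 0 {
		return nil
	}
	// Sort the names so the error message is deterministic.
	names := make([]string, 0, len(opts))
	for name := range opts {
		names = append(names, name)
	}
	sort.Strings(names)
	return fmt.Errorf("none policy does not accept options, got %v", names)
}
```

Failing fast at validation time surfaces the misconfiguration at startup instead of silently ignoring the options.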
Francesco Romani
c5cb263dcf smtalign: propagate policy options to cpumanager
The CPUManagerPolicyOptions received from the kubelet config/command line args
is propagated to the Container Manager.

We defer the consumption of the options to a later patch(set).

Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-07-08 23:15:35 +02:00
Rishabh Jain
8f08db9164 Change log level to Debug 2021-06-24 14:23:06 +05:30
Kubernetes Prow Robot
823d870725 Merge pull request #102014 from klueska/upstream-update-cpu-asssignment-algorithm
Refactor the algorithm used to decide CPU assignments in the CPUManager
2021-05-20 16:10:56 -07:00
Rancho Chen
9469ee7025 Add testcase for freeCPUs with three Sockets 2021-05-20 11:49:51 +00:00
Kevin Klues
67c92a5cd4 Refactor / simplify logic for CPU assignment algorithm in CPUManager
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-05-14 14:53:06 +00:00
s-ito-ts
1dea66439c Adds unit tests for pkg/kubelet/cm/cpumanager/topology 2021-05-12 07:13:04 +00:00
Kevin Klues
6646039481 Add logic to only call Update() if state different than last Update()
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-05-06 23:38:08 +02:00
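The idea named in the commit title above (skip `Update()` when the state has not changed since the last call) can be sketched roughly as below. The `cpusetApplier` type, its fields, and the string representation of a CPU set are assumptions made for illustration; the actual CPUManager code may differ.

```go
package main

// cpusetApplier is a hypothetical wrapper that remembers the last CPU set
// applied to each container and skips redundant runtime updates.
type cpusetApplier struct {
	lastApplied map[string]string // container ID -> last CPU set applied
	apply       func(containerID, cpus string) error
}

// newCPUSetApplier builds an applier around the given update function.
func newCPUSetApplier(apply func(containerID, cpus string) error) *cpusetApplier {
	return &cpusetApplier{
		lastApplied: make(map[string]string),
		apply:       apply,
	}
}

// Update calls the underlying apply function only when the requested CPU
// set differs from the one last applied successfully. It reports whether
// the underlying function was actually called and succeeded.
func (a *cpusetApplier) Update(containerID, cpus string) (bool, error) {
	if prev, ok := a.lastApplied[containerID]; ok && prev == cpus {
		return false, nil // unchanged: nothing to do
	}
	if err := a.apply(containerID, cpus); err != nil {
		return false, err // leave lastApplied untouched so a retry happens
	}
	a.lastApplied[containerID] = cpus
	return true, nil
}
```

Recording the new value only after a successful call means a failed update is retried on the next reconcile pass.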
Kubernetes Prow Robot
f72410d4c6 Merge pull request #100067 from changshuchao/testcase_status
Add test case for state.go
2021-04-08 17:11:21 -07:00
Elana Hashman
6af7eb6d49 Migrate missed log entries in kubelet
Co-Authored-By: pacoxu <paco.xu@daocloud.io>
2021-03-18 14:26:26 -07:00
Kubernetes Prow Robot
1d4777b798 Merge pull request #100163 from lala123912/kubelet_log_3
Migrate pkg/kubelet/cm/cpumanage/{topology/topology.go, policy_none.go, cpu_assignment.go} to structured logging
2021-03-16 15:57:08 -07:00
Kubernetes Prow Robot
1cd909606d Merge pull request #100176 from pacoxu/structured-log-kubelet-last
Kubelet migration to structured logs: cpumanager/{cpu_manager.go, fake_cpu_manager.go, policy_static.go}
2021-03-16 14:50:31 -07:00
Kubernetes Prow Robot
a951e877be Merge pull request #99563 from jmguzik/migrate-cm-cpumanager-state-structured-logging
Migrate pkg/kubelet/cm/cpumanager/state to structured logging
2021-03-16 14:48:55 -07:00
pacoxu
8d24c8d0ab update structured log for cpumanager/cpu_manager.go 2021-03-16 09:40:53 +08:00
lala123912
b247240ad7 Migrate pkg/kubelet/cm/cpumanage/{topology/topology.go, policy_none.go, cpu_assignment.go} to structured logging 2021-03-15 09:42:07 +08:00
pacoxu
9e024e839b update structured log for policy_static.go 2021-03-12 16:26:20 +08:00
pacoxu
4cf80f160d update structured log for fake_cpu_manager.go 2021-03-12 16:06:52 +08:00
changshuchao
bf18a1ca53 Add test case for state.go 2021-03-11 17:06:56 +08:00
Francesco Romani
6d33354e4c node: podresources: implement GetAllocatableResources API
Extend the podresources API implementing the GetAllocatableResources endpoint,
as specified in the KEPs:

https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2043-pod-resource-concrete-assigments
https://github.com/kubernetes/enhancements/pull/2404

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-09 13:13:36 +01:00
Kubernetes Prow Robot
ef44d39be0 Merge pull request #99464 from Nordix/master-fix
Number of sockets is assumed to be same as NUMA nodes in kubelet
2021-03-03 14:41:21 -08:00
Jakub Guzik
85d69cde82 Migrate pkg/kubelet/cm/cpumanager/state to structured logging
Signed-off-by: Jakub Guzik <jakubmguzik@gmail.com>
2021-03-03 01:18:37 +01:00
Benjamin Elder
56e092e382 hack/update-bazel.sh 2021-02-28 15:17:29 -08:00
Hanamantagoud
549b615439 Number of sockets is assumed to be same as NUMA nodes 2021-02-26 16:22:50 +05:30
Nikhita Raghunath
c3c45b9b8c *: move balajismaniam to emeritus_approvers 2021-02-16 10:55:47 +05:30
Kubernetes Prow Robot
e6e079aac3 Merge pull request #97748 from heqg/collides-state
Fix variable 'state' collides with imported package name
2021-01-28 17:51:40 -08:00
Artyom Lukianov
38dc7509f8 cpu manager: specify the container CPU set during the creation
We can set the container cpuset.cpus during creation, so there is no
need to call update resources after the container is created.

An additional side effect of the change is that the runc process responsible
for creating the container will run with the same CPU affinity, because
runc runs on the cpuset provided in the config.json arg.

This helps prevent undesirable interrupts on isolated CPUs.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-01-20 17:53:33 +02:00
Artyom Lukianov
60678a24ca Update CPU manager GetCPUs method to return pointer to CPUSet 2021-01-20 13:21:57 +02:00
Kubernetes Prow Robot
04b6b7c12b Merge pull request #97787 from heqg/expect-policy_static_test
fix typo of [expect] in pkg/kubelet/../policy_static_test.go
2021-01-07 00:53:45 -08:00
he.qingguo
d9368f53ad fix typo of [expect] in pkg/kubelet/../policy_static_test.go
Signed-off-by: he.qingguo <he.qingguo@zte.com.cn>
2021-01-07 12:20:03 +08:00
Kubernetes Prow Robot
e456b45a2a Merge pull request #97749 from heqg/errorf-wrap
The code in TestNonePolicyName does not need to wrap, so fix it.
2021-01-06 12:02:04 -08:00
he.qingguo
8826d12bb0 The code in TestNonePolicyName does not need to wrap, so fix it.
Signed-off-by: he.qingguo <he.qingguo@zte.com.cn>
2021-01-06 10:48:30 +08:00
he.qingguo
8249cd611d Fix variable 'state' collides with imported package name
Signed-off-by: he.qingguo <he.qingguo@zte.com.cn>
2021-01-06 10:31:47 +08:00
Kubernetes Prow Robot
10c1c3acf6 Merge pull request #96906 from Rajalakshmi-Girish/issue-96853
Fixes the unit tests to be more tolerant with error messages
2021-01-05 17:09:51 -08:00
Rajalakshmi-Girish
98948ad809 fixes the unit tests to be more tolerant with error messages 2020-12-24 04:47:46 +00:00
Kevin Klues
2fcbd2206d Fix bug in CPUManager with race on map access
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-12-21 19:11:53 +00:00
Krzysztof Wiatrzyk
b7714918db Run ./update-all.sh
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
sw.han
7ad65bf22d Add tests for GetPodTopologyHints() for cpumanager
* Add tests for getPodRequestedCPU()

Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
sw.han
35b1f28d0f Refactor topology hints tests for cpumanager
* Extract common tests cases that will be used for both GetTopologyHints()
and GetPodTopologyHints()
* Extract machineInfo as it will be used for both functions as well

Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
sw.han
27b7bcb41c Implement the cpumanager.GetPodTopologyHints() function
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
Krzysztof Wiatrzyk
6db58b2e92 Update logging to use a format util
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
sw.han
f5997fe537 Add GetPodTopologyHints() interface to Topology/CPU/Device Manager
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:54 +01:00
Alexey Perevalov
a8b8995ef2 Implement TopologyInfo and cpu_ids in podresources
It covers deviceplugin & cpumanager.

It has a drawback: cpuset and all other structs, including cadvisor's, keep
CPU IDs as int, whereas a protobuf-based interface is better served by a
fixed-size int.
This patch also introduces an additional interface, CPUsProvider, although
DeviceProvider might have been extended as well.

The checkpoint is not covered by unit tests.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-11 13:50:49 +03:00
Jordan Liggitt
7268b1d557 Deflake cpumanager checkpoint unit tests 2020-09-04 15:06:04 -04:00
Alexey Perevalov
a047e8aa1b move to cadvisor.MachineInfo
This patch removes GetNUMANodeInfo; cadvisor.MachineInfo will be used
instead. GetNUMANodeInfo was introduced because MachineInfo.Topology had
different meanings across architectures: on ARM it represented NUMA nodes,
but on x86 it represented sockets (since it was read from /proc/cpuinfo).
This is now unified, and MachineInfo.Topology represents NUMA nodes.

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-07-24 09:29:41 -04:00
Alexey Perevalov
e33ba9e974 Avoid using socket for hints
Sockets don't affect performance the way NUMA nodes do: a NUMA node
has a dedicated memory controller, whereas a socket is only a physical
extension point.
The socket is a CPU-specific concept, and it is strange to merge the bitmasks
of the device plugin and the cpu manager when only the cpu manager takes the
socket into account.

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-07-22 05:14:34 -04:00
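To illustrate why the commit above wants both hint providers to speak in NUMA nodes, here is a minimal sketch of merging affinity bitmasks with a bitwise AND. The `numaMask` type and helper names are hypothetical and are not the TopologyManager's real bitmask package; the sketch assumes node IDs below 64.

```go
package main

import "math/bits"

// numaMask is a hypothetical bitmask in which bit i set means
// "NUMA node i is acceptable". Node IDs are assumed to be below 64.
type numaMask uint64

// maskOf builds a mask from a list of NUMA node IDs.
func maskOf(nodes ...int) numaMask {
	var m numaMask
	for _, n := range nodes {
		m |= 1 << uint(n)
	}
	return m
}

// mergeAffinity intersects the masks from several hint providers.
// The result is only meaningful when every provider uses the same unit
// (NUMA nodes) for its bits; mixing socket IDs with NUMA node IDs would
// intersect unrelated bit positions.
func mergeAffinity(masks ...numaMask) numaMask {
	if len(masks) == 0 {
		return 0
	}
	merged := masks[0]
	for _, m := range masks[1:] {
		merged &= m
	}
	return merged
}

// nodeCount reports how many NUMA nodes are set in the mask.
func (m numaMask) nodeCount() int {
	return bits.OnesCount64(uint64(m))
}
```

For example, a CPU hint of `maskOf(0, 1)` merged with a device hint of `maskOf(1)` leaves only NUMA node 1 as a common candidate.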
Kevin Klues
00df26a985 Fix a bug whereby reusable CPUs and devices were not being honored
Previously, it was possible for reusable CPUs and reusable devices (i.e.
those previously consumed by init containers) to not be reused by
subsequent init containers or app containers if the TopologyManager was
enabled. This would happen because hint generation for the
TopologyManager was not considering the reusable devices when it made
its hint calculation.

As such, it would sometimes:
1) Generate a hint for a different NUMA node, causing the CPUs and
devices to be allocated from that node instead of the one where the
reusable devices live; or
2) End up thinking there were not enough CPUs or devices to allocate and
throw a TopologyAffinity admission error

This patch fixes this by ensuring that reusable CPUs and devices are
considered as part of TopologyHint generation. This functionality is
difficult to unit test since it spans multiple components, but an e2e
test will be added in a subsequent patch to test this functionality.
2020-07-20 11:41:13 +00:00
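The effect of the fix described above can be sketched with a simplified per-node capacity check. This is an assumption-laden illustration, not the real hint generator: `nodesThatFit` and its map-based inputs (NUMA node ID to CPU count) are hypothetical.

```go
package main

import "sort"

// nodesThatFit returns the NUMA node IDs able to satisfy a request for
// `request` CPUs when both free CPUs and CPUs reusable from earlier init
// containers are counted. Both maps go from NUMA node ID to a CPU count.
func nodesThatFit(request int, free, reusable map[int]int) []int {
	seen := make(map[int]bool)
	var nodes []int
	for _, counts := range []map[int]int{free, reusable} {
		for node := range counts {
			if seen[node] {
				continue
			}
			seen[node] = true
			// Missing map entries read as zero, so a node present in
			// only one of the maps is handled correctly.
			if free[node]+reusable[node] >= request {
				nodes = append(nodes, node)
			}
		}
	}
	sort.Ints(nodes)
	return nodes
}
```

With 1 free CPU on node 0, 4 free CPUs on node 1, and 2 reusable CPUs on node 0, a request for 3 CPUs fits on both nodes. Ignoring the reusable CPUs leaves only node 1, which corresponds to the first failure mode the commit describes.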
Seth Jennings
45d2b98aa8 add sjenning as kubelet approver 2020-06-19 13:00:55 -05:00
Davanum Srinivas
07d88617e5 Run hack/update-vendor.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:33 -04:00
Davanum Srinivas
442a69c3bd switch over k/k to use klog v2
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
Kubernetes Prow Robot
7fdc1275d9 Merge pull request #90377 from cbf123/container_cpuset_fixup_2
Fix exclusive CPU allocations being deleted at container restart
2020-04-27 13:40:04 -07:00