kubernetes

Author	SHA1	Message	Date
vinay kulkarni	01b96e7704	Rename ContainerStatus.ResourcesAllocated to ContainerStatus.AllocatedResources	2023-03-10 14:49:26 +00:00
Kubernetes Prow Robot	efe20f6c9b	Merge pull request #114114 from ffromani/full-pcpus-stricter-precheck-issue113537 node: cpumgr: stricter pre-check for the policy option full-pcpus-only	2023-03-02 09:04:56 -08:00
Francesco Romani	0e9b92090c	node: cpumgr: stricter precheck for full-pcpus-only In order to implement the `full-pcpus-only` cpumanager policy option, we leverage the implementation of the algorithm which picks CPUs. By design, CPUs are taken from the biggest chunk available (socket or NUMA zone) to physical cores, down to single cores. Leveraging this, if the requested CPU count is a multiple of the SMT level (commonly 2), we're guaranteed that only full physical cores will be taken. The hidden assumption here is this holds true by construction iff the user reserved CPUs (if any) considering full physical CPUs. IOW, if the user did intentionally or mistakenly reserve single threads which are not core siblings[1], then the simple check we implemented is not sufficient. An easy example can probably outline this better. With this setup: cores: [(0, 4), (1, 5), (2, 6), (3, 8)] (in parens: thread siblings). SMT level: 2 (each tuple is 2 elements) Reserved CPUs: 0,1 (explicit pick using `--reserved-cpus`) A container then requests 6 cpus. full-pcpus-only check: 6 % 2 == 0. Passed. The CPU allocator will first take full cores, (2,6) and (3,8), and will then pick the remaining single CPUs. The allocation will succeed, but it's incorrect. We can fix this case with a stricter precheck. We need to additionally consider all the core siblings of the reserved CPUs as unavailable when computing the free cpus, before starting the actual allocation. Doing so, we fall back to the intended behavior, and by construction all possible CPU allocations whose count is a multiple of the SMT level are now correct again. +++ [1] or thread siblings in the linux parlance, in any case: hyperthread siblings of the same physical core Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-03-02 16:00:58 +01:00
Chen Wang	7db339dba2	This commit contains the following: 1. Scheduler bug-fix + scheduler-focussed E2E tests 2. Add cgroup v2 support for in-place pod resize 3. Enable full E2E pod resize test for containerd>=1.6.9 and EventedPLEG related changes. Co-Authored-By: Vinay Kulkarni <vskibum@gmail.com>	2023-02-24 18:21:21 +00:00
Vinay Kulkarni	f2bd94a0de	In-place Pod Vertical Scaling - core implementation 1. Core Kubelet changes to implement In-place Pod Vertical Scaling. 2. E2E tests for In-place Pod Vertical Scaling. 3. Refactor kubelet code and add missing tests (Derek's kubelet review) 4. Add a new hash over container fields without Resources field to allow feature gate toggling without restarting containers not using the feature. 5. Fix corner-case where resize A->B->A gets ignored 6. Add cgroup v2 support to pod resize E2E test. KEP: /enhancements/keps/sig-node/1287-in-place-update-pod-resources Co-authored-by: Chen Wang <Chen.Wang1@ibm.com>	2023-02-24 18:21:21 +00:00
Ian K. Coolidge	f3829c4be3	cpuset: Rename 'NewCPUSet' to 'New'	2023-01-06 23:32:51 +00:00
Ian K. Coolidge	e5143d16c2	cpuset: Make 'ToSlice*' methods look like 'set' methods In 'set', conversions to slice are done also, but with different names: ToSliceNoSort() -> UnsortedList() ToSlice() -> List() Reimplement List() in terms of UnsortedList to save some duplication.	2023-01-06 23:32:51 +00:00
Ian K. Coolidge	824bd57ad6	cpuset: Convert Union arguments to variadic This allows Union to implement UnionAll easily.	2023-01-06 23:32:50 +00:00
Francesco Romani	5e12338a22	node: cpumgr: address `golint` complaints Add docstrings and trivial fixes. Signed-off-by: Francesco Romani <fromani@redhat.com>	2022-11-02 18:41:42 +01:00
Kubernetes Prow Robot	d0e86111ef	Merge pull request #112855 from fromanirh/cpumanager-metrics node: metrics: cpumanager: add metrics about pinning	2022-10-31 03:12:56 -07:00
Francesco Romani	47d3299781	node: metrics: cpumanager: add pinning metrics In order to improve the observability of the cpumanager, add and populate metrics to track if the combination of the kubelet configuration and podspec would trigger exclusive core allocation and pinning. We should avoid leaking any node/machine specific information (e.g. core ids, even though this is admittedly an extreme example); tracking these metrics seems to be a good first step, because it allows us to get feedback without exposing details. Signed-off-by: Francesco Romani <fromani@redhat.com>	2022-10-27 14:40:40 +02:00
Garrybest	d446f5f90e	fix GetAllocatableCPUs in cpumanager Signed-off-by: Garrybest <garrybest@foxmail.com>	2022-10-27 19:57:06 +08:00
Arpit Singh	d92fd8392d	Adding unit test for align-by-socket policy option Also addressed MR comments as part of same commit.	2022-08-02 11:02:07 -07:00
Arpit Singh	06f347f645	Adding validity checks for topology manager align-by-socket	2022-08-02 11:02:07 -07:00
Arpit Singh	35849bf7fb	KEP-3327: Add CPUManager policy option to align CPUs by Socket instead of by NUMA node	2022-08-02 11:02:07 -07:00
Davanum Srinivas	a9593d634c	Generate and format files - Run hack/update-codegen.sh - Run hack/update-generated-device-plugin.sh - Run hack/update-generated-protobuf.sh - Run hack/update-generated-runtime.sh - Run hack/update-generated-swagger-docs.sh - Run hack/update-openapi-spec.sh - Run hack/update-gofmt.sh Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2022-07-26 13:14:05 -04:00
Kubernetes Prow Robot	03ee86c09c	Merge pull request #104837 from eggiter/fix-release-reused-cpus fix(cpumanager): Do not release CPUs of init containers while they are being reused in app containers	2022-01-06 11:46:38 -08:00
Kevin Klues	70e0f47191	Support full-pcpus-only with the new NUMA distribution policy option Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-16 19:31:02 +00:00
Kevin Klues	462544d079	Split CPUManager takeByTopology() into two different algorithms The first implements the original algorithm which packs CPUs onto NUMA nodes if more than one NUMA node is required to satisfy the allocation. The second distributes CPUs across NUMA nodes if they can't all fit into one. The "distributing" algorithm is currently a noop and just returns an error of "unimplemented". A subsequent commit will add the logic to implement this algorithm according to KEP 2902: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-16 14:46:19 +00:00
eggiter	20d3bc32ac	fix(cpumanager): Do not release cpus of init containers while they are reused in app containers	2021-09-10 10:01:35 +08:00
Francesco Romani	23abdab2b7	smtalign: propagate policy options to policies Consume in the static policy the cpu manager policy options from the cpumanager instance. Validate in the none policy if any option is given, and fail if so - this is almost surely a configuration mistake. Add new cpumanager.Options type to hold the options and translate from user arguments to flags. Co-authored-by: Swati Sehgal <swsehgal@redhat.com> Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-07-08 23:15:37 +02:00
pacoxu	8d24c8d0ab	update structured log for cpumanager/cpu_manager.go	2021-03-16 09:40:53 +08:00
pacoxu	9e024e839b	update structured log for policy_static.go	2021-03-12 16:26:20 +08:00
Francesco Romani	6d33354e4c	node: podresources: implement GetAllocatableResources API Extend the podresources API implementing the GetAllocatableResources endpoint, as specified in the KEPs: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2043-pod-resource-concrete-assigments https://github.com/kubernetes/enhancements/pull/2404 Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-03-09 13:13:36 +01:00
sw.han	27b7bcb41c	Implement the cpumanager.GetPodTopologyHints() function Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>	2020-11-12 12:25:55 +01:00
Krzysztof Wiatrzyk	6db58b2e92	Update logging to use a format util Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>	2020-11-12 12:25:55 +01:00
sw.han	f5997fe537	Add GetPodTopologyHints() interface to Topology/CPU/Device Manager Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>	2020-11-12 12:25:54 +01:00
Alexey Perevalov	e33ba9e974	Avoid using socket for hints Sockets don't affect performance as NUMA nodes do, since a NUMA node has a dedicated memory controller, whereas a socket is a physical extension point. A socket is only a CPU-specific thing, and it's strange to merge the bitmasks of the device plugins and the CPU manager when the CPU manager takes the socket into account. Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>	2020-07-22 05:14:34 -04:00
Kevin Klues	00df26a985	Fix a bug whereby reusable CPUs and devices were not being honored Previously, it was possible for reusable CPUs and reusable devices (i.e. those previously consumed by init containers) to not be reused by subsequent init containers or app containers if the TopologyManager was enabled. This would happen because hint generation for the TopologyManager was not considering the reusable devices when it made its hint calculation. As such, it would sometimes: 1) Generate a hint for a different NUMA node, causing the CPUs and devices to be allocated from that node instead of the one where the reusable devices live; or 2) End up thinking there were not enough CPUs or devices to allocate and throw a TopologyAffinity admission error This patch fixes this by ensuring that reusable CPUs and devices are considered as part of TopologyHint generation. This functionality is difficult to unit test since it spans multiple components, but an e2e test will be added in a subsequent patch to test this functionality.	2020-07-20 11:41:13 +00:00
Davanum Srinivas	442a69c3bd	switch over k/k to use klog v2 Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2020-05-16 07:54:27 -04:00
Kevin Klues	751b9f3e13	Update strategy used to reuse CPUs from init containers in CPUManager With the old strategy, it was possible for an init container to end up running without some of its CPUs being exclusive if it requested more guaranteed CPUs than the sum of all guaranteed CPUs requested by app containers. Unfortunately, this case was not caught by our unit tests because they didn't validate the state of the defaultCPUSet to ensure there was no overlap with CPUs assigned to containers. This patch updates the strategy to reuse the CPUs assigned to init containers across into app containers, while avoiding this edge case. It also updates the unit tests to now catch this type of error in the future.	2020-04-23 20:27:43 +00:00
nolancon	467f66580b	CPU Manager - Add check to policy.Allocate() for init containers If the container allocated CPUs is an init container, release those CPUs back into the shared pool for re-allocation to the next container.	2020-02-27 07:24:33 +00:00
nolancon	709989efa2	CPU Manager - Rename policy.AddContainer() to policy.Allocate()	2020-02-27 07:24:33 +00:00
Kevin Klues	bc686ea27b	Update TopologyManager.GetTopologyHints() to take pointers Previously, this function was taking full Pod and Container objects unnecessarily. This commit updates this so that they will take pointers instead.	2020-02-03 17:13:28 +00:00
whypro	f4bd4e2e96	Return error instead of panic when cpu manager fails to start.	2019-12-19 21:56:23 +08:00
Kevin Klues	185e790f71	Update CPUManager policies to adhere to new state semantics	2019-12-11 23:02:51 +01:00
Kevin Klues	765aae93f8	Move containerMap out of static policy and into top-level CPUManager	2019-12-11 23:02:51 +01:00
Kevin Klues	1d995c98ef	Update CPUmanager containerMap to allow removal by containerRef	2019-12-11 23:02:47 +01:00
Kevin Klues	0639bd0942	Change CPUManager containerMap to key off of (podUID, containerName) Previously it keyed off of a pointer to the actual pod / container, which was unnecessary, and hard to work with (especially on the retrieval side).	2019-12-11 23:02:11 +01:00
Kevin Klues	3881e50cce	Update CPUmanager containerMap to also return a containerRef	2019-12-11 23:01:01 +01:00
Kevin Klues	347d5f57ac	Move CPUManager ContainerMap to its own package	2019-12-11 22:59:00 +01:00
Kubernetes Prow Robot	73b2c82b28	Merge pull request #83592 from jianzzha/opt-reserved-cpus added --reserved-cpus kubelet command option	2019-11-06 22:14:42 -08:00
Jianzhu Zhang	89dfd24483	added --reserved-cpus kubelet command option	2019-11-06 07:33:52 -05:00
Kevin Klues	9dc116eb08	Ensure CPUManager TopologyHints are regenerated after kubelet restart This patch also includes test to make sure the newly added logic works as expected.	2019-11-05 15:48:51 +00:00
Connor Doyle	389853894d	Delegate topology hint gen to CPU manager policy - The previous implementation depended on a fixed set of policies.	2019-09-27 22:29:02 -07:00
Connor Doyle	e35301c19f	Rename package socketmask to bitmask. - As discussed in reviews and other public channels, this abstraction is used to represent numa nodes, not sockets. - There is nothing inherently related to sockets in this package anyway.	2019-09-23 17:08:45 -07:00
Kevin Klues	e0e8b3e4fd	Update CPUManager topology helpers to accept multiple ids	2019-08-29 13:22:54 -05:00
Kevin Klues	f4dbd29cdb	Rename TopologyHint.SocketAffinity to TopologyHint.NUMANodeAffinity As part of this, update the logic to use the NUMA information instead of the Socket information when generating and consuming TopologyHints in the CPUManager.	2019-08-27 16:51:05 -05:00
Kevin Klues	8278d1134c	Consume TopologyHints in the CPUManager Co-Authored-By: Conor Nolan <conor.nolan@intel.com>	2019-08-14 06:22:56 +02:00
Kevin Klues	156b3f6af8	Generate TopologyHints from the CPUManager	2019-08-14 06:22:56 +02:00
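
The stricter precheck described in commit 0e9b92090c can be illustrated with a minimal Go sketch. It is not the kubelet code: the `core` type and the `availableCPUs` and `admits` helpers are hypothetical names invented for this illustration, and the topology is copied as written in the commit message. It only shows why excluding the siblings of reserved CPUs turns the incorrect 6-CPU admission into a rejection.

```go
package main

import (
	"fmt"
	"sort"
)

// core lists the hyperthread siblings that share one physical core.
// (Hypothetical type, used only for this illustration.)
type core []int

// availableCPUs returns the CPUs left for exclusive allocation.
// With strict=true, every sibling of a reserved CPU is excluded as well,
// so only whole physical cores remain in the result.
func availableCPUs(cores []core, reserved map[int]bool, strict bool) []int {
	var free []int
	for _, c := range cores {
		// A core is "touched" if at least one of its threads is reserved.
		touched := false
		for _, cpu := range c {
			if reserved[cpu] {
				touched = true
			}
		}
		for _, cpu := range c {
			if reserved[cpu] || (strict && touched) {
				continue
			}
			free = append(free, cpu)
		}
	}
	sort.Ints(free)
	return free
}

// admits models the simple check from the commit message: the request must be
// a multiple of the SMT level and must fit in the available CPUs.
func admits(request, smtLevel int, free []int) bool {
	return request%smtLevel == 0 && request <= len(free)
}

func main() {
	// Setup from the commit message: thread siblings per core, reserved CPUs 0,1.
	cores := []core{{0, 4}, {1, 5}, {2, 6}, {3, 8}}
	reserved := map[int]bool{0: true, 1: true}

	loose := availableCPUs(cores, reserved, false)
	strict := availableCPUs(cores, reserved, true)

	fmt.Println(loose, admits(6, 2, loose))   // [2 3 4 5 6 8] true
	fmt.Println(strict, admits(6, 2, strict)) // [2 3 6 8] false
}
```

Under the loose computation the request for 6 CPUs is admitted even though CPUs 4 and 5 are siblings of reserved threads; under the strict computation only the two untouched cores remain, so the same request is rejected while a request for 4 CPUs would still pass.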

61 Commits