kubernetes

Author	SHA1	Message	Date
Francesco Romani	077c0aa1be	node: graduate CPUManagerPolicyOptions to beta We graduate the `CPUManagerPolicyOptions` feature to beta in the 1.23 cycle, and we add new experimental feature gates to guard new options which are planned in the 1.23 and in the following cycles. We introduce additional feature gates called `CPUManagerPolicyAlphaOptions` and `CPUManagerPolicyBetaOptions`. The basic idea is to avoid the cumbersome process of adding a feature gate for each option, and to have feature gates which track the maturity level of _groups_ of options. Besides this change, the graduation process, and the process in general, for adding new policy options is still unchanged. The `full-pcpus-only` option added in the 1.22 cycle is intentionally moved into the beta policy options. For more details: - KEP: https://github.com/kubernetes/enhancements/pull/2933 - sig-arch discussion: https://groups.google.com/u/1/g/kubernetes-sig-architecture/c/Nxsc7pfe5rw Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-09-29 11:40:03 +02:00
Kubernetes Prow Robot	2541fcf256	Merge pull request #104123 from fromanirh/podresources-not-report-unhealthy-devices devicemanager: skip unhealthy devices in GetAllocatable	2021-09-23 05:39:21 -07:00
Francesco Romani	1b6efa5e21	devicemanager: skip unhealthy devs in GetAllocatable The GetAllocatableDevices, needed to support the podresources API, doesn't take into account the device health when computing its output. In this PR we address this gap and add unit tests along the way to prevent regressions. This gives us a good initial coverage; E2E tests to cover this case are much harder to write, because we would need to inject faults to trigger the unhealthy status. We will evaluate adding these tests in later PRs. Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-09-22 19:20:04 +02:00
Ricardo Pchevuzinske Katz	37d11bcdaf	Move node and networking related helpers from pkg/util to component helpers Signed-off-by: Ricardo Katz <rkatz@vmware.com>	2021-09-16 17:00:19 -03:00
KeZhang	a629ceeb58	Fix initContainersReusableMemory delete bug	2021-09-15 10:04:49 +08:00
eggiter	20d3bc32ac	fix(cpumanager): Do not release cpus of init containers while they are reused in app containers	2021-09-10 10:01:35 +08:00
Shiming Zhang	7706d3d281	pkg/kubelet/cm/memorymanager: Fix ErrorS key/value pair	2021-09-06 17:37:04 +08:00
Artyom Lukianov	9ea9798759	kubelet: memory manager: fix topology preferred topology hints calculation Prevent starting pods with resources satisfied by a single NUMA node on multiple NUMA nodes. The code returned before it updated the minimal amount of NUMA nodes that can satisfy the container requests. Signed-off-by: Artyom Lukianov <alukiano@redhat.com>	2021-08-31 17:46:59 +03:00
tiloso	2b86541313	Fix staticcheck failure in pkg/kubelet/cm/cpuset	2021-08-26 08:50:08 +02:00
Kubernetes Prow Robot	cbd0611d49	Merge pull request #104528 from kolyshkin/runc-1.0.2 vendor: bump runc to 1.0.2	2021-08-25 18:17:23 -07:00
Stephen Augustus	481cf6fbe7	generated: Run hack/update-gofmt.sh Signed-off-by: Stephen Augustus <foo@auggie.dev>	2021-08-24 15:47:49 -04:00
Alexey Perevalov	bb81101570	podresource: do not export NUMA topology if it's empty If a device plugin returns a device without topology, keep it internally as NUMA node -1. This helps at the podresources level to not export NUMA topology; otherwise topology is exported with NUMA node id 0, which is not accurate. It's impossible to uncover this bug just by tracing json.Marshal(resp) in the podresource client, because the NUMANodes field ID has the json property omitempty, so ID=0 is shown as an empty NUMANode. To reproduce it, it is better to iterate over devices and just trace dev.Topology.Nodes[0].ID. Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>	2021-08-24 15:38:21 +00:00
Kir Kolyshkin	c06a851042	pkg/kubelet/cm: use SkipFreezeOnSet This is a knob added by runc 1.0.2 specifically for kubernetes, which tells runc/libcontainer/cgroups/systemd v1 manager to not freeze the cgroup in Set(). We set this knob here because this code is only used for pods (rather than containers) management, and in this place we create or update the pod cgroup with no device limits set, so we can skip the freeze. If this knob is not set, libcontainer's cgroup v1 manager tries to figure out whether the freeze is needed or not, but it's a somewhat expensive check to perform, thus the knob is a shortcut. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-08-23 13:41:51 -07:00
Kubernetes Prow Robot	a9aad7e034	Merge pull request #103107 from pacoxu/fix-93300 ResourceConfigForPod: check initContainers as other QoS func	2021-08-17 11:41:37 -07:00
Artyom Lukianov	73a5cce3e6	device manager: do not clean admitted pods from the state Signed-off-by: Artyom Lukianov <alukiano@redhat.com>	2021-08-08 16:46:06 +03:00
Artyom Lukianov	93a237abd8	memory manager: do not clean admitted pods from the state Signed-off-by: Artyom Lukianov <alukiano@redhat.com>	2021-08-08 16:46:06 +03:00
Artyom Lukianov	66babd1a90	cpu manager: do not clean admitted pods from the state Signed-off-by: Artyom Lukianov <alukiano@redhat.com>	2021-08-08 16:46:06 +03:00
Wesley Williams	ff165c8823	Replace usage of Whitelist with Allowlist within Kubelet's sysctl package (#102298) * Change uses of whitelist to allowlist in kubelet sysctl * Rename whitelist files to allowlist in Kubelet sysctl * Further renames of whitelist to allowlist in Kubelet * Rename podsecuritypolicy uses of whitelist to allowlist * Update pkg/kubelet/kubelet.go Co-authored-by: Danielle <dani@builds.terrible.systems> Co-authored-by: Danielle <dani@builds.terrible.systems>	2021-08-04 18:59:35 -07:00
Kir Kolyshkin	e5b434e990	kubelet/cm: don't set Devices Since runc 1.0.0 it is now sufficient to have SkipDevices: true. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-07-16 12:45:35 -07:00
Francesco Romani	23abdab2b7	smtalign: propagate policy options to policies Consume in the static policy the cpu manager policy options from the cpumanager instance. Validate in the none policy if any option is given, and fail if so - this is almost surely a configuration mistake. Add new cpumanager.Options type to hold the options and translate from user arguments to flags. Co-authored-by: Swati Sehgal <swsehgal@redhat.com> Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-07-08 23:15:37 +02:00
Francesco Romani	6dcec345df	smtalign: cm: factor out admission response Introduce a new `admission` subpackage to factor out the responsibility to create `PodAdmitResult` objects. This enables resource managers to report specific errors in Allocate() and to bubble them up in the relevant fields of the `PodAdmitResult`. To demonstrate the approach we refactor TopologyAffinityError as a proper error. Co-authored-by: Kevin Klues <kklues@nvidia.com> Co-authored-by: Swati Sehgal <swsehgal@redhat.com> Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-07-08 23:15:37 +02:00
Francesco Romani	c5cb263dcf	smtalign: propagate policy options to cpumanager The CPUManagerPolicyOptions received from the kubelet config/command line args is propagated to the Container Manager. We defer the consumption of the options to a later patch(set). Co-authored-by: Swati Sehgal <swsehgal@redhat.com> Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-07-08 23:15:35 +02:00
Li Bo	c3d9b10ca8	feature: support Memory QoS for cgroups v2	2021-07-08 09:26:46 +08:00
Akihiro Suda	dbe0155139	kubelet/cm: ignore sysctl error when running in userns Errors during setting the following sysctl values are ignored: - vm.overcommit_memory - vm.panic_on_oom - kernel.panic - kernel.panic_on_oops - kernel.keys.root_maxkeys - kernel.keys.root_maxbytes Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2021-07-07 14:23:29 +09:00
Kubernetes Prow Robot	eae87bfe7e	Merge pull request #103483 from odinuge/revert-102508-runc-1.0 Revert "Update runc to 1.0.0"	2021-07-06 10:42:56 -07:00
Artyom Lukianov	bb6d5b1f95	memory manager: provide unittests for init containers re-use - provide tests for static policy allocation, when init containers requested memory bigger than the memory requested by app containers - provide tests for static policy allocation, when init containers requested memory smaller than the memory requested by app containers - provide tests to verify that init containers removed from the state file once the app container started Signed-off-by: Artyom Lukianov <alukiano@redhat.com>	2021-07-05 20:52:25 +03:00
Artyom Lukianov	960da7895c	memory manager: remove init containers once app container started Remove init containers from the state file once the app container has started. This releases the memory allocated for the init container and can increase the density of containers on the NUMA node in cases when the memory allocated for init containers is bigger than the memory allocated for app containers. Signed-off-by: Artyom Lukianov <alukiano@redhat.com>	2021-07-05 20:52:25 +03:00
Artyom Lukianov	b965502c49	memory manager: re-use the memory allocated for init containers The idea is that during calls to `Allocate` and `GetTopologyHints` we take into account the init containers' reusable memory, which means that we re-use the memory and update container memory blocks accordingly. For example, for a pod with two init containers that requested 1Gi and 2Gi, and an app container that requested 4Gi, we can re-use 2Gi of memory. Signed-off-by: Artyom Lukianov <alukiano@redhat.com>	2021-07-05 20:52:25 +03:00
Odin Ugedal	61d88af9e4	Revert "Update runc to 1.0.0"	2021-07-05 14:03:04 +02:00
Kir Kolyshkin	ab5b77944e	kubelet/cm: don't set Devices Since runc 1.0.0 it is now sufficient to have SkipDevices: true. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-06-30 16:17:35 -07:00
pacoxu	f2eec0a816	ResourceConfigForPod: check initContainers as other QoS func Signed-off-by: pacoxu <paco.xu@daocloud.io>	2021-06-28 19:22:42 +08:00
Kubernetes Prow Robot	07358f1663	Merge pull request #103146 from tech-geek29/fix-95380 Change log level to Debug	2021-06-25 07:44:45 -07:00
Rishabh Jain	8f08db9164	Change log level to Debug	2021-06-24 14:23:06 +05:30
Kenta Tada	89a4d4b071	kubelet: modify the function of getCgroupSubsystemsV2 to use libcontainer API	2021-06-24 16:58:05 +09:00
Kubernetes Prow Robot	985ac8ae50	Merge pull request #101030 from cynepco3hahue/pod_resources_memory_interface Extend pod resource API response to return the information from memory manager	2021-06-22 06:35:58 -07:00
Artyom Lukianov	03830db82d	Implement all necessary methods to provide memory manager data under pod resources metrics Signed-off-by: Artyom Lukianov <alukiano@redhat.com>	2021-06-22 13:06:32 +03:00
Kubernetes Prow Robot	3bd29bc53d	Merge pull request #102829 from snowplayfire/update-devicemanager Add resource capacity to ListAndWatch grpc logging	2021-06-21 16:28:09 -07:00
jingxueli	45d18acbcc	add info for possible failed listAndWatch grpc call	2021-06-17 16:25:20 +08:00
Kubernetes Prow Robot	85f0931ab9	Merge pull request #102772 from saintube/patch-1 cleanup: fix kubelet cpuset typo	2021-06-14 19:00:13 -07:00
Francesco Romani	369416b763	cm: handle nil cpumanager avoiding segfault If the cpumanager feature gate is disabled, the corresponding field of the containerManager will be nil. A couple of functions don't check for this occurrence and happily dereference the pointer unconditionally, leading to possible segfaults. The relevant functions were introduced to support the podresources API, so to trigger this segfault all the following are needed: - cpumanager feature gate has to be disabled explicitly - any podresources API must be called Worth pointing out that when the new functions were introduced (around kubernetes 1.20) the default feature gate for cpumanager was already set to true, hence this bug is expected to be triggered rarely. Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-06-10 16:22:43 +02:00
Frame	9255f2ccf3	Fix kubelet cpuset typo	2021-06-10 18:17:04 +08:00
Kubernetes Prow Robot	1795a98eeb	Merge pull request #102221 from kikimo/add-hint-to-fake-topology-manager Add hint to fake topology manager.	2021-06-02 03:40:05 -07:00
kikimo	86d68effc2	clean code	2021-06-02 09:07:53 +08:00
Kubernetes Prow Robot	7c7a0865cd	Merge pull request #102218 from kolyshkin/cgroup-cleanups pkg/kubelet/cm: cgroup-related cleanups	2021-06-01 13:45:51 -07:00
kikimo	9d2135f703	reuse fake topology manager	2021-06-02 01:35:00 +08:00
kikimo	8b3162d67b	clean code	2021-06-02 01:17:04 +08:00
sanwishe	9e257ec194	Optimization logging format for pkg/kubelet Signed-off-by: sanwishe <jiang.mingzhi35@zte.com.cn>	2021-05-25 08:52:08 +08:00
Kubernetes Prow Robot	cf59c68e15	Merge pull request #102088 from wzshiming/fix/pod-devices-has-pod-lock Add the missing RLock	2021-05-24 15:16:20 -07:00
Kir Kolyshkin	f1aee7e049	kubelet/cm: GetResourceStats -> MemoryUsage Commit `cc50aa9dfb` introduced GetResourceStats, a method which collected all the statistics from various cgroup controllers, only to discard all of the info collected except a single value (memory usage). While one may argue that this method can potentially be used from other places, this did not happen since it was added 4+ years ago. Let's streamline this code and only collect what we need, i.e. memory usage. Rename the method accordingly. While at it, fix pkg/kubelet/cm/cgroup_manager_unsupported.go to not instantiate a new error every time a method is called. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-05-23 20:43:52 -07:00
kikimo	20c02357ca	Add hint to fake topology manager.	2021-05-22 15:29:08 +08:00

1418 Commits