kubernetes

Author	SHA1	Message	Date
Jordan Liggitt	886727a4c0	Revert "Add deviceManager in windows container manager" This reverts commit `056d73b1a1`.	2020-07-20 16:13:53 -04:00
Giuseppe Scrivano	ef935bd991	kubelet: clamp cpu shares to max allowed clamp the max cpu.shares to the maximum value allowed by the kernel. It is not an issue when using cgroupfs, as the kernel will anyway make sure the value is not out of range and automatically clamp it, systemd has an additional check that prevents the cgroup creation. Closes: https://github.com/kubernetes/kubernetes/issues/92855 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-07-20 17:18:03 +02:00
Kevin Klues	00df26a985	Fix a bug whereby reusable CPUs and devices were not being honored Previously, it was possible for reusable CPUs and reusable devices (i.e. those previously consumed by init containers) to not be reused by subsequent init containers or app containers if the TopologyManager was enabled. This would happen because hint generation for the TopologyManager was not considering the reusable devices when it made its hint calculation. As such, it would sometimes: 1) Generate a hint for a differnent NUMA node, causing the CPUs and devices to be allocated from that node instead of the one where the reusable devices live; or 2) End up thinking there were not enough CPUs or devices to allocate and throw a TopologyAffinity admission error This patch fixes this by ensuring that reusable CPUs and devices are considered as part of TopologyHint generation. This frunctionality is difficult to unit test since it spans multiple components, but an e2e test will be added in a subsequent patch to test this functionality.	2020-07-20 11:41:13 +00:00
Kevin Klues	74fe9364c3	Simplify logic in devicemanager TopologyHint generation	2020-07-20 11:41:13 +00:00
Kevin Klues	9f5f401d60	Add AnySet() to topologymanager bitmask API	2020-07-20 11:41:13 +00:00
Kubernetes Prow Robot	242f3d9dce	Merge pull request #80917 from aarnaud/windows-devicemanager Port deviceManager to windows container manager to enable GPU access	2020-07-17 21:04:50 -07:00
Giuseppe Scrivano	79be8be10e	kubelet, cgroupv2: make hugetlb optional make the hugetlb controller optional when cgroup v2 is used. Closes: https://github.com/kubernetes/kubernetes/issues/92933 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-07-13 09:40:55 +02:00
Kubernetes Prow Robot	63926cf8e7	Merge pull request #92862 from giuseppe/cgroup-fix-leaks vendor: update github.com/opencontainers/runc	2020-07-11 20:57:11 -07:00
Giuseppe Scrivano	0d2a493a8f	kubelet: skip setting the devices cgroup use the new libcontainer feature of skipping setting the devices cgroup. This is necessary on cgroup v2 to avoid leaking a eBPF program every time the cgroup is re-configured. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-07-09 09:37:46 +02:00
Anthony ARNAUD	056d73b1a1	Add deviceManager in windows container manager	2020-07-08 18:22:16 +02:00
Kevin Klues	26cb650655	Remove unnecessary union after call to GetPreferredAllocation() There is no need to try and allocate already-allocated devices again.	2020-07-07 06:35:57 +00:00
Kevin Klues	67ecc11c44	Harden callGetPreferredAllocationIfAvailable() return value Previously, we didn't check the contents of the result after calling out to the plugin endpoint. This could have resulted in errors if the plugin returned either 'nil' or an empty result. This patch fixes this.	2020-07-07 06:35:57 +00:00
Kevin Klues	d87365494a	Fix bug in call to callGetPreferredAllocationIfAvailable() Previously, we were passing the variable 'devices' to this function, when we should have been passing 'allocated'. This bug crept in due to a variable name change that didn't propogate its way through the entire function. The tests added in the previous commit would have caught this.	2020-07-07 06:35:57 +00:00
Kevin Klues	d551ab1e78	Add tests to check paramaters passed to GetPreferredAllocation() These tests uncovered some small bugs that will be fixed in a subsequent set of commits.	2020-07-07 06:35:57 +00:00
Kevin Klues	5bd0db0b1f	Add new test cases for GetPreferredAllocation() in allocation path	2020-07-03 13:01:32 +00:00
Kevin Klues	83f18d9975	Remove unnecessary field from TestTopologyAlignedAllocation() test cases	2020-07-03 13:01:32 +00:00
Kevin Klues	bb08fd1135	Add a simple endpoint test for GetPreferredAllocation() More extensive tests that exercise the allocation logic are to follow.	2020-07-03 13:01:32 +00:00
Kevin Klues	cbd405d85c	Update existing tests in support of GetPreferredallocation()	2020-07-03 13:01:32 +00:00
Kevin Klues	a780ccff5b	Updates logic in devicesToAllocate() to call GetPreferredAllocation()	2020-07-02 22:07:27 +00:00
Kevin Klues	bb56a09133	Add callGetPreferredAllocationIfAvailable() function in devicemanager This function mimics what is already done for the conditional call to PreStartContainer() via the callPreStartContainerIfNeeded() function.	2020-07-02 22:07:27 +00:00
Kevin Klues	abf87c99c6	Add GetPreferredAllocation() as a supported device plugin endpoint	2020-07-02 15:15:50 +00:00
Kevin Klues	32c047a52e	Update device plugin stub with new GetPreferredAllocation() call	2020-07-02 15:15:48 +00:00
Kevin Klues	c45f1317eb	Fix some whitespacing and comments in devicemanager	2020-07-02 15:15:44 +00:00
Giuseppe Scrivano	e94aebf4cb	pkg/kubelet: adapt to new libcontainer API Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-06-24 18:39:51 +02:00
Li Zhijian	02eaa4f354	cleanup tempfiles in unit test Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>	2020-06-23 11:47:18 +08:00
Kubernetes Prow Robot	86ad0df820	Merge pull request #92203 from sjenning/add-sjenning-node-approver Add sjenning as kubelet approver	2020-06-19 21:52:02 -07:00
Seth Jennings	45d2b98aa8	add sjenning as kubelet approver	2020-06-19 13:00:55 -05:00
kadisi	a75323c76b	fix unexpected append mutations about pkg/kubelet package Signed-off-by: kadisi <iamkadisi@163.com> Co-authored-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com>	2020-06-03 13:36:57 +08:00
Kubernetes Prow Robot	0e37bcce2c	Merge pull request #88385 from tallclair/node-reviews Remove tallclair from some OWNERS files	2020-05-24 20:23:11 -07:00
Kubernetes Prow Robot	b170451caa	Merge pull request #90183 from dims/update-kubernetes-to-klog-v2 Update kubernetes to klog v2	2020-05-16 18:59:51 -07:00
Kubernetes Prow Robot	f011430e85	Merge pull request #84599 from mrobson/log-destroy Errors from cgroup destroy are swallowed. Log error at warning level.	2020-05-16 18:59:36 -07:00
Davanum Srinivas	07d88617e5	Run hack/update-vendor.sh Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2020-05-16 07:54:33 -04:00
Davanum Srinivas	442a69c3bd	switch over k/k to use klog v2 Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2020-05-16 07:54:27 -04:00
caiweidong	5ed8fb690c	Use klog to replace log to keep them in consistence	2020-05-12 13:49:10 +08:00
Tim Allclair	029a144ae9	Remove tallclair from some OWNERS files	2020-05-11 11:44:38 -07:00
Kubernetes Prow Robot	7fdc1275d9	Merge pull request #90377 from cbf123/container_cpuset_fixup_2 Fix exclusive CPU allocations being deleted at container restart	2020-04-27 13:40:04 -07:00
Chris Friesen	ab5870d808	Fix exclusive CPU allocations being deleted at container restart The expectation is that exclusive CPU allocations happen at pod creation time. When a container restarts, it should not have its exclusive CPU allocations removed, and it should not need to re-allocate CPUs. There are a few places in the current code that look for containers that have exited and call CpuManager.RemoveContainer() to clean up the container. This will end up deleting any exclusive CPU allocations for that container, and if the container restarts within the same pod it will end up using the default cpuset rather than what should be exclusive CPUs. Removing those calls and adding resource cleanup at allocation time should get rid of the problem. Signed-off-by: Chris Friesen <chris.friesen@windriver.com>	2020-04-27 11:36:54 -06:00
Kevin Klues	751b9f3e13	Update strategy used to reuse CPUs from init containers in CPUManager With the old strategy, it was possible for an init container to end up running without some of its CPUs being exclusive if it requested more guaranteed CPUs than the sum of all guaranteed CPUs requested by app containers. Unfortunately, this case was not caught by our unit tests because they didn't validate the state of the defaultCPUSet to ensure there was no overlap with CPUs assigned to containers. This patch updates the strategy to reuse the CPUs assigned to init containers across into app containers, while avoiding this edge case. It also updates the unit tests to now catch this type of error in the future.	2020-04-23 20:27:43 +00:00
Kubernetes Prow Robot	d92fdebd85	Merge pull request #89897 from giuseppe/test-e2e-node kubelet: fix e2e-node cgroups test on cgroup v2	2020-04-20 15:54:12 -07:00
Kubernetes Prow Robot	d0183703cb	Merge pull request #90059 from ahg-g/ahg-nodeinfo2 Cleanup obsolete NodeInfo methods	2020-04-14 17:32:04 -07:00
Abdullah Gharaibeh	d6522e0e74	rename framework pkg with schedulerframework for all instances under pkg/kubelet	2020-04-14 14:24:07 -04:00
Kubernetes Prow Robot	105c0c6951	Merge pull request #88970 from mysunshine92/correct-NodeAllocatableRoot fix function NodeAllocatableRoot	2020-04-14 11:04:13 -07:00
Abdullah Gharaibeh	bed9b2f23b	Cleanup obsolete NodeInfo methods	2020-04-12 18:13:46 -04:00
Giuseppe Scrivano	26d94ad628	kubelet: do not configure the device cgroup Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-04-09 16:18:06 +02:00
Giuseppe Scrivano	a9772b2290	kubelet: adapt cgroup_manager to cgroup v2 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-04-09 16:18:04 +02:00
Giuseppe Scrivano	6d16fee229	kubelet: cpu hard capping is supported on cgroup v2 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-04-09 16:18:03 +02:00
Francesco Romani	623587ec8b	cpumanager: test: add missing helper add back the missing AssertStateEqual helper; it is needed by some tests we still want to run. Signed-off-by: Francesco Romani <fromani@redhat.com>	2020-04-07 16:59:07 +02:00
Francesco Romani	be0fe3df9b	cpumanager: drop old custom file backend The cpumanager file-based state backend was obsoleted since few releases, aving the cpumanager moved to the checkpointmanager common infrastructure. The old test checking compatibility to/from the old format is also no longer needed, because the checkpoint format is stable (see https://github.com/kubernetes/kubernetes/tree/master/pkg/kubelet/checkpointmanager). Signed-off-by: Francesco Romani <fromani@redhat.com>	2020-04-07 13:24:48 +02:00
Kubernetes Prow Robot	b030be376b	Merge pull request #89581 from Wenfeng-GAO/simplify simplify code in topologymanager	2020-04-02 23:07:46 -07:00
Wenfeng-GAO	1aebbee7da	simplify code in topologymanager	2020-03-28 00:04:51 +08:00
Giuseppe Scrivano	c4429d8bd4	kubelet: add tests for cgroup v2 conversions follow-up for https://github.com/kubernetes/kubernetes/pull/85218 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-03-27 13:50:57 +01:00
Kubernetes Prow Robot	34c8b26c9f	Merge pull request #85218 from giuseppe/cgroupv2 kubelet: add initial support for cgroupv2	2020-03-26 14:10:23 -07:00
Kubernetes Prow Robot	4488fd4749	Merge pull request #89053 from bg-chun/move_package migration of re-usable package from pkg/kubelet/cm/cpumanager to pkg/kubelet/cm	2020-03-26 11:14:09 -07:00
yameiwang	6783f991c3	fix function NodeAllocatableRoot	2020-03-26 18:48:05 +08:00
Byonggon Chun	a3047672d0	move pkg/kubelet/cm/cpumanager/containermap to pkg/kubelet/cm/containermap for reusing containerMap is used in CPU Manager to store all containers information in the node. containerMap provides a mapping from (pod, container) -> containerID for all containers a pod It is reusable in another component in pkg/kubelet/cm which needs to track changes of all containers in the node. Signed-off-by: Byonggon Chun <bg.chun@samsung.com>	2020-03-14 02:38:51 +09:00
Giuseppe Scrivano	bb5ed1b797	kubelet: add initial support for cgroupv2 do a conversion from the cgroups v1 limits to cgroups v2. e.g. cpu.shares on cgroups v1 has a range of [2-262144] while the equivalent on cgroups v2 is cpu.weight that uses a range [1-10000]. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-03-12 08:50:19 +01:00
Adelina Tuvenie	a9f834d17d	Implement noopWindowsResourceAllocator On Windows, the podAdmitHandler returned by the GetAllocateResourcesPodAdmitHandler() func and registered by the Kubelet is nil. We implement a noopWindowsResourceAllocator that would admit any pod for Windows in order to be consistent with the original implementation.	2020-03-10 21:32:23 +01:00
Kubernetes Prow Robot	cd0057c16a	Merge pull request #88876 from nolancon/none-policy-fix Topology Manager none policy bug fix	2020-03-05 21:40:33 -08:00
Kubernetes Prow Robot	ce01a9bad0	Merge pull request #88857 from nolancon/test-fix Check for nil cpuManager in container manager	2020-03-05 20:05:14 -08:00
Kubernetes Prow Robot	48541a0b16	Merge pull request #87650 from nolancon/beta-feature-gate Update TopologyManager Feature Gate	2020-03-05 20:03:04 -08:00
nolancon	0551d408ac	Bug fix for TM none policy	2020-03-05 14:25:48 +00:00
nolancon	4baa1d967d	Check for nil cpuManager	2020-03-05 07:54:33 +00:00
Kubernetes Prow Robot	ac32644d6e	Merge pull request #87759 from klueska/upstream-move-cpu-allocation-to-pod-admit Guarantee aligned resources across containers	2020-03-04 20:12:37 -08:00
nolancon	e8538d9b76	Add mutex to Topology Manager Add/RemoveContainer This was exposed as a potential bug during e2e test debugging of this PR.	2020-03-02 04:07:21 +00:00
Kevin Klues	2327934a86	Rename GetTopologyPodAmitHandler() as GetAllocateResourcesPodAdmitHandler(). It is named as such to reflect its new function. Also remove the Topology Manager feature gate check at higher level kubelet.go, as it is now done in GetAllocateResourcesPodAdmitHandler().	2020-02-27 07:52:43 +00:00
nolancon	a9c6129577	Device Manager - Update unit tests - Pass container to Allocate(). - Loop through containers to call Allocate() on container by container basis.	2020-02-27 07:24:34 +00:00
nolancon	cb9fdc49db	Device Manager - Refactor allocatePodResources - allocatePodResources logic altered to allow for container by container device allocation. - New type PodReusableDevices - New field in devicemanager devicesToReuse	2020-02-27 07:24:34 +00:00
nolancon	0a9bd0334d	CPU Manager - Updates to unit tests: - Where previously we called manager.AddContainer(), we now call both manager.Allocate() and manager.AddContainer(). - Some test cases now have two expected errors. One each from Allocate() and AddContainer(). Existing outcomes are unchanged.	2020-02-27 07:24:34 +00:00
nolancon	467f66580b	CPU Manager - Add check to policy.Allocate() for init conatiners If container allocated CPUs is an init container, release those CPUs back into the shared pool for re-allocation to next container.	2020-02-27 07:24:33 +00:00
nolancon	709989efa2	CPU Manager - Rename policy.AddContainer() to policy.Allocate()	2020-02-27 07:24:33 +00:00
Kevin Klues	0d68bffd03	Change GetTopologyPodAdmitHandler() to be more general GetTopologyPodAdmitHandler() now returns a lifecycle.PodAdmitHandler type instead of the TopologyManager directly. The handler it returns is generally responsible for attempting to allocate any resources that require a pod admission check. When the TopologyManager feature gate is on, this comes directly from the TopologyManager. When it is off, we simply attempt the allocations ourselves and fail the admission on an unexpected error. The higher level kubelet.go feature gate check will be removed in an upcoming PR.	2020-02-27 07:24:26 +00:00
Oleg Chunikhin	b651178849	fix incorrect configuration of kubepods.slice unit by kubelet (issue #88197 )	2020-02-17 13:22:45 -05:00
Kubernetes Prow Robot	1a0f923a65	Merge pull request #87712 from alena1108/jan30kubelet Ineffassign fixes for pkg/controller and kubelet	2020-02-14 14:29:27 -08:00
Kevin Klues	0b168f0243	Change devicemanager to implement HintProvider.Allocate() This change will not work on its own. Higher level code needs to make sure and call Allocate() before AddContainer is called. This is already being done in cases when the TopologyManager feature gate is enabled (in the PodAdmitHandler of the TopologyManager). However, we need to make sure we add proper logic to call it in cases when the TopologyManager feature gate is disabled.	2020-02-10 03:27:47 +00:00
Kevin Klues	91f91858a5	Change CPUManager to implement HintProvider.Allocate() This change will not work on its own. Higher level code needs to make sure and call Allocate() before AddContainer is called. This is already being done in cases when the TopologyManager feature gate is enabled (in the PodAdmitHandler of the TopologyManager). However, we need to make sure we add proper logic to call it in cases when the TopologyManager feature gate is disabled.	2020-02-10 03:27:47 +00:00
Kevin Klues	9e4ee5ecc3	Add Allocate() call to TopologyManager's HintProvider interface Having this interface allows us to perform a tight loop of: for each container { containerHints = {} for each provider { containerHints[provider] = provider.GatherHints(container) } containerHints.MergeAndPublish() for each provider { provider.Allocate(container) } } With this in place we can now be sure that the hints gathered in one iteration of the loop always consider the allocations made in the previous.	2020-02-10 03:27:47 +00:00
Kevin Klues	a3f099ea4d	Split devicemanager Allocate into two functions Instead of having a single call for Allocate(), we now split this into two functions Allocate() and UpdatePluginResources(). The semantics split across them: // Allocate configures and assigns devices to a pod. From the requested // device resources, Allocate will communicate with the owning device // plugin to allow setup procedures to take place, and for the device // plugin to provide runtime settings to use the device (environment // variables, mount points and device files). Allocate(pod v1.Pod) error // UpdatePluginResources updates node resources based on devices already // allocated to pods. The node object is provided for the device manager to // update the node capacity to reflect the currently available devices. UpdatePluginResources( node schedulernodeinfo.NodeInfo, attrs *lifecycle.PodAdmitAttributes) error As we move to a model in which the TopologyManager is able to ensure aligned allocations from the CPUManager, devicemanger, and any other TopologManager HintProviders in the same synchronous loop, we will need to be able to call Allocate() independently from an UpdatePluginResources(). This commit makes that possible.	2020-02-10 03:27:47 +00:00
Takeaki Matsumoto	785fac6826	Make updateAllocatedDevices() as a public method and call it in podresources api	2020-02-07 13:26:56 +09:00
Kevin Klues	d5addb4090	Cleanup logging and creation logic of TopologyManager in prep for beta	2020-02-03 17:13:29 +00:00
Kevin Klues	bc686ea27b	Update TopologyManager.GetTopologyHints() to take pointers Previously, this function was taking full Pod and Container objects unnecessarily. This commit updates this so that they will take pointers instead.	2020-02-03 17:13:28 +00:00
Kevin Klues	adaa58b6cb	Update TopologyManager.Policy.Merge() to return a simple bool Previously, the verious Merge() policies of the TopologyManager all eturned their own lifecycle.PodAdmitResult result. However, for consistency in any failed admits, this is better handled in the top-level Topology manager, with each policy only returning a boolean about whether or not they would like to admit the pod or not. This commit changes the semantics to match this logic.	2020-02-03 17:13:28 +00:00
Kevin Klues	95a3ac447f	Fix bug in TopologManager RemoveContainer() Previously, we unconditionally removed all topology hints from a pod whenever just one container was being removed. This commit makes it so we only remove the hints for the single container being removed, and then conditionally remove the pod from the podTopologyHints[podUID] when no containers left in it.	2020-02-03 17:13:14 +00:00
Alena Prokharchyk	6c3093f970	Ineffassign fixes for pkg/controller and kubelet	2020-01-30 14:35:10 -08:00
sewon.oh	463442aa29	Update container hugepage limit when creating the container Unit test for updating container hugepage limit Add warning message about ignoring case. Update error handling about hugepage size requirements Signed-off-by: sewon.oh <sewon.oh@samsung.com>	2020-01-28 09:35:02 +09:00
Kubernetes Prow Robot	98f63eee1b	Merge pull request #87460 from nolancon/policies_refactor Refactor Topology Manager policies to reduce code duplication	2020-01-24 21:11:15 -08:00
nolancon	4d76b1c8de	Add mergeFilteredHints: - Move remaining logic from mergeProvidersHints to generic top level mergeFilteredHints function. - Add numaNodes as parameter in order to make generic. - Move single NUMA node specific check to single-numa-node Merge function.	2020-01-22 09:07:41 +00:00
nolancon	fc300e0e7d	Move filterSingleNumaHints call to top level Merge	2020-01-22 08:39:22 +00:00
nolancon	45660fd3a2	Add filterProvidersHints function: - Move initial 'filtering' functionality to generic function filterProvidersHints level policy.go. - Call new function from top level Merge function. - Rename some variables/parameters to reflect changes.	2020-01-22 08:35:28 +00:00
nolancon	df9b2595f3	Update filterHints to filterSingleNumaHints: - Change function name - Remove policy parameter (unnecessary) - Update unit test to reflect change	2020-01-22 07:15:00 +00:00
Kubernetes Prow Robot	9822016bf8	Merge pull request #87397 from klueska/upstream-cpu-manager-set-initial-containers Initialize CPUManager containerMap to set of initial containers	2020-01-20 17:39:50 -08:00
Kubernetes Prow Robot	e6b5194ec1	Merge pull request #84300 from klueska/upstream-cpu-manager-reconcile-on-container-state Update logic in `CPUManager` `reconcileState()`	2020-01-20 12:27:37 -08:00
Kevin Klues	bd9d8fa42f	Initialize CPUManager containerMap to set of initial containers A recent change made it so that the CPUManager receives a list of initial containers that exist on the system at startup. This list can be non-empty, for example, after a kubelet retart. This commit ensures that the CPUManagers containerMap structure is initialized with the containers from this list.	2020-01-20 20:42:29 +01:00
Kubernetes Prow Robot	37ee6425ef	Merge pull request #87255 from klueska/upstream-remove-redundant-active-pods-check Remove check for empty activePods list in CPUManager removeStaleState	2020-01-20 09:05:50 -08:00
Kubernetes Prow Robot	23fa359d6c	Merge pull request #84705 from whypro/cpumanager-panic-master Return error instead of panic when cpu manager fails on startup.	2020-01-20 07:25:37 -08:00
Kevin Klues	7be9b0fe55	Update comments and error messages in the CPUManager	2020-01-20 15:31:01 +01:00
Kevin Klues	f2acbf6607	Base CPUManager state reconciliation on container state, not pod state	2020-01-20 13:57:30 +00:00
Kevin Klues	f6cf9b8ce9	Move CPUManager Pod Status logic before container loop	2020-01-20 13:57:30 +00:00
Kubernetes Prow Robot	50f9ea7999	Merge pull request #85798 from nolancon/merge-policy-rebase Updated - topologymanager: Add Merge method to Policy	2020-01-17 05:14:56 -08:00
Kubernetes Prow Robot	9701baea0f	Merge pull request #87283 from klueska/update-printing-for-tm-bitmask Update bitmask printing to print in groups of 2 instead of all 64 bits	2020-01-16 12:04:32 -08:00
Kevin Klues	708278098a	Update bitmask printing to print in groups of 2 instead of all 64 bits	2020-01-16 17:28:52 +01:00

1 2 3 4 5 ...

840 Commits