containerMap is used by the CPU Manager to store information about all
containers on the node.
containerMap provides a mapping from (pod, container) -> containerID for
all containers in a pod.
It is reusable by other components in pkg/kubelet/cm that need to track
changes to all containers on the node.
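For illustration, a minimal sketch of the idea (method names and the key
encoding here are assumptions, not necessarily the exact API in
pkg/kubelet/cm/containermap):
```
package containermap

import "fmt"

// ContainerMap tracks (podUID, containerName) -> containerID for every
// container on the node.
type ContainerMap map[string]string

func key(podUID, containerName string) string {
	return fmt.Sprintf("%s/%s", podUID, containerName)
}

// Add records (or overwrites) the containerID for a (pod, container) pair.
func (cm ContainerMap) Add(podUID, containerName, containerID string) {
	cm[key(podUID, containerName)] = containerID
}

// GetContainerID returns the containerID for a (pod, container) pair.
func (cm ContainerMap) GetContainerID(podUID, containerName string) (string, bool) {
	id, ok := cm[key(podUID, containerName)]
	return id, ok
}

// Remove deletes the entry for a (pod, container) pair, if present.
func (cm ContainerMap) Remove(podUID, containerName string) {
	delete(cm, key(podUID, containerName))
}
```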
Signed-off-by: Byonggon Chun <bg.chun@samsung.com>
Do a conversion from the cgroups v1 limits to cgroups v2.
e.g. cpu.shares on cgroups v1 has a range of [2-262144], while the
equivalent on cgroups v2 is cpu.weight, which uses a range of [1-10000].
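A minimal sketch of such a conversion, assuming a simple linear
interpolation between the two ranges (the exact formula used by the
kubelet/runtime may differ):
```
// convertCPUSharesToCPUWeight maps a cgroups v1 cpu.shares value in
// [2, 262144] onto the cgroups v2 cpu.weight range [1, 10000] using
// linear interpolation: 2 -> 1 and 262144 -> 10000.
func convertCPUSharesToCPUWeight(shares uint64) uint64 {
	if shares == 0 {
		return 0 // no shares configured, nothing to convert
	}
	if shares < 2 {
		shares = 2 // cgroups v1 clamps cpu.shares to a minimum of 2
	}
	if shares > 262144 {
		shares = 262144
	}
	return 1 + ((shares-2)*9999)/262142
}
```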
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
On Windows, the podAdmitHandler returned by the GetAllocateResourcesPodAdmitHandler() func
and registered by the Kubelet is nil.
We implement a noopWindowsResourceAllocator that admits any pod on Windows
in order to be consistent with the original implementation.
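A sketch of what such an always-admit handler can look like, assuming the
lifecycle.PodAdmitHandler interface from pkg/kubelet/lifecycle (treat this
as illustrative rather than the exact Windows implementation):
```
package cm

import "k8s.io/kubernetes/pkg/kubelet/lifecycle"

// noopWindowsResourceAllocator admits every pod unconditionally so that
// the Kubelet always registers a non-nil PodAdmitHandler on Windows.
type noopWindowsResourceAllocator struct{}

func (a *noopWindowsResourceAllocator) Admit(attrs *lifecycle.PodAdmitAttributes) lifecycle.PodAdmitResult {
	return lifecycle.PodAdmitResult{Admit: true}
}
```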
GetAllocateResourcesPodAdmitHandler(). It is named as such to reflect its
new function. Also remove the Topology Manager feature gate check at the
higher level in kubelet.go, as it is now done in
GetAllocateResourcesPodAdmitHandler().
- allocatePodResources logic altered to allow for container-by-container
device allocation.
- New type PodReusableDevices
- New devicesToReuse field in devicemanager
- Where previously we called manager.AddContainer(), we now call both
manager.Allocate() and manager.AddContainer() (see the sketch after this
list).
- Some test cases now have two expected errors, one each
from Allocate() and AddContainer(). Existing outcomes are unchanged.
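A rough sketch of the resulting per-container flow (interface and
signatures are simplified for illustration, not the exact devicemanager
API):
```
package example

import v1 "k8s.io/api/core/v1"

// deviceManager captures just the two calls described above; the real
// manager interface is larger and its signatures may differ.
type deviceManager interface {
	Allocate(pod *v1.Pod, container *v1.Container) error
	AddContainer(pod *v1.Pod, container *v1.Container) error
}

// allocatePodResources allocates devices container by container, then
// registers each container with the manager, surfacing errors from both
// calls.
func allocatePodResources(mgr deviceManager, pod *v1.Pod) error {
	for i := range pod.Spec.Containers {
		container := &pod.Spec.Containers[i]
		if err := mgr.Allocate(pod, container); err != nil {
			return err
		}
		if err := mgr.AddContainer(pod, container); err != nil {
			return err
		}
	}
	return nil
}
```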
GetTopologyPodAdmitHandler() now returns a lifecycle.PodAdmitHandler
type instead of the TopologyManager directly. The handler it returns
is generally responsible for attempting to allocate any resources that
require a pod admission check. When the TopologyManager feature gate
is on, this comes directly from the TopologyManager. When it is off,
we simply attempt the allocations ourselves and fail the admission
on an unexpected error. The higher level kubelet.go feature gate
check will be removed in an upcoming PR.
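A sketch of the selection logic described above (the receiver and field
names are simplified stand-ins, not the exact kubelet code):
```
package cm

import (
	utilfeature "k8s.io/apiserver/pkg/util/feature"
	"k8s.io/kubernetes/pkg/features"
	"k8s.io/kubernetes/pkg/kubelet/lifecycle"
)

// containerManagerSketch stands in for the kubelet container manager.
// Both fields are assumed to implement lifecycle.PodAdmitHandler.
type containerManagerSketch struct {
	topologyManager   lifecycle.PodAdmitHandler
	resourceAllocator lifecycle.PodAdmitHandler
}

func (cm *containerManagerSketch) GetAllocateResourcesPodAdmitHandler() lifecycle.PodAdmitHandler {
	if utilfeature.DefaultFeatureGate.Enabled(features.TopologyManager) {
		// The TopologyManager performs the allocations as part of its
		// own admission check.
		return cm.topologyManager
	}
	// Otherwise fall back to a handler that attempts the allocations
	// itself and fails admission on an unexpected error.
	return cm.resourceAllocator
}
```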
This change will not work on its own. Higher level code needs to make
sure to call Allocate() before AddContainer() is called. This is already
being done when the TopologyManager feature gate is enabled (in
the PodAdmitHandler of the TopologyManager). However, we need to make
sure we add the proper logic to call it when the TopologyManager
feature gate is disabled.
Having this interface allows us to perform a tight loop of:
for each container {
    containerHints = {}
    for each provider {
        containerHints[provider] = provider.GatherHints(container)
    }
    containerHints.MergeAndPublish()
    for each provider {
        provider.Allocate(container)
    }
}
With this in place we can now be sure that the hints gathered in one
iteration of the loop always consider the allocations made in the
previous.
Instead of having a single call to Allocate(), we now split this into two
functions: Allocate() and UpdatePluginResources().
The semantics are split across them as follows:
// Allocate configures and assigns devices to a pod. From the requested
// device resources, Allocate will communicate with the owning device
// plugin to allow setup procedures to take place, and for the device
// plugin to provide runtime settings to use the device (environment
// variables, mount points and device files).
Allocate(pod *v1.Pod) error
// UpdatePluginResources updates node resources based on devices already
// allocated to pods. The node object is provided for the device manager to
// update the node capacity to reflect the currently available devices.
UpdatePluginResources(
    node *schedulernodeinfo.NodeInfo,
    attrs *lifecycle.PodAdmitAttributes) error
As we move to a model in which the TopologyManager is able to ensure
aligned allocations from the CPUManager, devicemanager, and any
other TopologyManager HintProviders in the same synchronous loop, we will
need to be able to call Allocate() independently from
UpdatePluginResources(). This commit makes that possible.
Previously, the various Merge() policies of the TopologyManager all
returned their own lifecycle.PodAdmitResult result. However, for
consistency in any failed admits, this is better handled in the
top-level TopologyManager, with each policy only returning a boolean
indicating whether or not it would like to admit the pod. This
commit changes the semantics to match this logic.
Previously, we unconditionally removed *all* topology hints from a pod
whenever just one container was being removed. This commit makes it so
we only remove the hints for the single container being removed, and
then conditionally remove the pod from podTopologyHints[podUID] when
no containers are left in it.
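Illustrative sketch of the resulting removal logic (field and type names
are assumptions, not necessarily the real TopologyManager internals):
```
// TopologyHint stands in for the real hint type.
type TopologyHint struct{}

type manager struct {
	// podTopologyHints maps podUID -> containerName -> merged hint.
	podTopologyHints map[string]map[string]TopologyHint
}

// RemoveContainer drops only the hints for the given container, and
// removes the pod entry entirely once no containers are left in it.
func (m *manager) RemoveContainer(podUID, containerName string) {
	if hints, ok := m.podTopologyHints[podUID]; ok {
		delete(hints, containerName)
		if len(hints) == 0 {
			delete(m.podTopologyHints, podUID)
		}
	}
}
```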
Unit test for updating container hugepage limit
Add warning message about ignoring case.
Update error handling for hugepage size requirements
Signed-off-by: sewon.oh <sewon.oh@samsung.com>
- Move remaining logic from mergeProvidersHints to generic top level
mergeFilteredHints function.
- Add numaNodes as parameter in order to make generic.
- Move single NUMA node specific check to single-numa-node Merge
function.
- Move initial 'filtering' functionality to a generic
filterProvidersHints function in top level policy.go.
- Call new function from top level Merge function.
- Rename some variables/parameters to reflect changes.
A recent change made it so that the CPUManager receives a list of
initial containers that exist on the system at startup. This list can be
non-empty, for example, after a kubelet restart.
This commit ensures that the CPUManager's containerMap structure is
initialized with the containers from this list.
The logic has been updated to match the logic of the best-effort policy
except in two places:
1) The hint filtering function has been updated to allow "don't care"
hints encoded with a `nil` affinity mask to pass through the filter, in
addition to hints that have just a single NUMA bit set (see the sketch
after this list).
2) After calculating the `bestHint` we transform "don't care" affinities
encoded as having all NUMA bits set in their affinity masks into "don't
care" affinities encoded as `nil`.
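A sketch of the filter described in (1), using minimal stand-in types
(the real policy operates on the TopologyManager's hint and bit mask
types):
```
// numaMask is a minimal stand-in for the real NUMA affinity bit mask.
type numaMask interface {
	Count() int // number of NUMA nodes set in the mask
}

// TopologyHint is a minimal stand-in for the real hint type.
type TopologyHint struct {
	NUMANodeAffinity numaMask // nil means "don't care"
	Preferred        bool
}

// filterSingleNumaHints keeps, per resource, only the hints that are
// either "don't care" (nil affinity) or restricted to a single NUMA node.
func filterSingleNumaHints(allResourcesHints [][]TopologyHint) [][]TopologyHint {
	filtered := make([][]TopologyHint, 0, len(allResourcesHints))
	for _, resourceHints := range allResourcesHints {
		var keep []TopologyHint
		for _, hint := range resourceHints {
			if hint.NUMANodeAffinity == nil || hint.NUMANodeAffinity.Count() == 1 {
				keep = append(keep, hint)
			}
		}
		filtered = append(filtered, keep)
	}
	return filtered
}
```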
- Initialize best Hint to TopologyHint{}
- Update checks.
- Move generic unit test case into policy-specific tests and update the
expected outcome to reflect changes.
- Restructure function
- Remove bug fix for catching {nil true} - To be fixed in later commit
- Restore unit tests to original state for testing filterHints
This is to keep consistency with the other policies.
This change may be made across all policies in a future PR, but it is
removed from the scope of this PR for now.
- Best Effort Policy: Return hint with nil affinity as opposed to
defaultAffinity when provider has no preference for NUMA affinity or no
possible NUMA affinities.
- Single NUMA Node Policy: Remove defaultHint from mergeProvidersHints.
Instead return appropriate TopologyHint where required.
- Update unit tests to reflect changes. Some test cases moved into
individual policy test functions due to differing returned affinities
per policy.
- Remove getHintMatch method.
- Replace with simplified versions of the mergePermutation and
iterateAllProviderTopologyHints methods, as used in best-effort (a sketch
of the merge idea follows this list).
- Remove getHintMatch unit tests.
- Update filterHints test to reflect changes in previous commit.
- Some common test cases achieve differing expected results based on
policy due to independent merge strategies. These cases are moved into
individual policy based test functions.
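For intuition, a sketch of the permutation-merge idea (affinities are
modeled here as plain uint64 bit masks; the real code uses the
TopologyManager's bit mask type, and function/field names are
illustrative):
```
// hint models a TopologyHint with the NUMA affinity as a uint64 bit
// mask; affinity == 0 stands in for the nil "don't care" affinity.
type hint struct {
	affinity  uint64
	preferred bool
}

// mergePermutation merges one permutation of per-resource hints into a
// single candidate: the merged affinity is the bitwise AND of all
// concrete affinities, and the result is preferred only if every input
// hint is preferred.
func mergePermutation(defaultAffinity uint64, permutation []hint) hint {
	merged := defaultAffinity
	preferred := true
	for _, h := range permutation {
		if h.affinity != 0 { // "don't care" does not narrow the result
			merged &= h.affinity
		}
		preferred = preferred && h.preferred
	}
	return hint{affinity: merged, preferred: preferred}
}
```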
- Only append valid preferred-true hints to filtered
- Return true if allResourceHints consists only of
nil-affinity/preferred-true hints ({nil true}), and update the
defaultHint preference accordingly.
Explanation taken from original commit:
- Change the current method of finding the best hint.
Instead of going over all permutations, sort the hints and find
the narrowest hint common to all resources.
- Break out early when merging to a preferred hint is not possible
- Remove need to pass policy and numaNodes as arguments
- Remove PolicySingleNUMANode special case check in policy_best_effort
- Add mergeProviderHints base to policy_single_numa_node for upcoming
commit
This check is redundant since we protect this call with a call to
`m.sourcesReady.AllReady()` earlier on. Moreover, having this check in
place means that we will leave some stale state around in cases where
there are actually no active pods in the system and this loop hasn't
cleaned them up yet. This can happen, for example, if a pod exits while
the kubelet is down for some reason. We see this exact case being
triggered in our e2e tests, where a test has been failing since October
when this change was first introduced.
This change is to prevent problems when we remove the V1->V2 migration
code in the future. Without this, the checksums of all checkpoints would
be hashed with the name CPUManagerCheckpointV2 embedded inside of them,
which is undesirable. We want the checkpoints to be hashed with the name
CPUManagerCheckpoint instead.
The updated CPUManager from PR #84462 implements logic to migrate the
CPUManager checkpoint file from an old format to a new one. To do so, it
defines the following types:
```
type CPUManagerCheckpoint = CPUManagerCheckpointV2
type CPUManagerCheckpointV1 struct { ... }
type CPUManagerCheckpointV2 struct { ... }
```
This replaces the old definition of just:
```
type CPUManagerCheckpoint struct { ... }
```
Code was put in place to ensure proper migration from checkpoints in V1
format to checkpoints in V2 format. However (and this is a big however),
all of the unit tests were performed on V1 checkpoints that were
generated using the type name `CPUManagerCheckpointV1` and not the
original type name of `CPUManagerCheckpoint`. As such, the checksum in
the checkpoint file uses the `CPUManagerCheckpointV1` type to calculate
its checksum and not the original type name of `CPUManagerCheckpoint`.
This causes problems in the real world since all pre-1.18 checkpoint
files will have been generated with the original type name of
`CPUManagerCheckpoint`. When verifying the checksum of the checkpoint
file across an upgrade to 1.18, the checksum is calculated assuming
a type name of `CPUManagerCheckpointV1` (which is incorrect) and the
file is seen to be corrupt.
This patch ensures that all V1 checksums are verified against a type
name of `CPUManagerCheckpoint` instead of `CPUManagerCheckpointV1`.
It also locks the algorithm used to calculate the checksum in place,
since it will never change in the future (for pre-1.18 checkpoint
files at least).
The information associated with these containers is used to migrate
the CPUManager state from its old format to its new one (i.e. keyed off of
podUID and containerName instead of containerID).
For now, we just pass 'nil' as the set of 'initialContainers' for
migrating from old state semantics to new ones. In a subsequent commit
we will pull this information from higher layers so that we can pass it
down at this stage properly.