kubernetes

Author	SHA1	Message	Date
Matthias Bertschy	9500ee9d9c	container_manager: use oomScoreAdj instead of default when set	2022-02-12 15:23:13 +01:00
Kubernetes Prow Robot	1659924a97	Merge pull request #108070 from jsafrane/remove-selinux Remove util/selinux package	2022-02-11 18:19:47 -08:00
Kubernetes Prow Robot	8580bbf7d7	Merge pull request #107594 from hakman/remove_container-runtime_logic Clean up logic for deprecated flag --container-runtime in kubelet	2022-02-11 12:57:47 -08:00
Jan Safranek	77aa06d0c8	Remove util/selinux package The package says: > the libcontainer SELinux package is only built for Linux, so it is > necessary to have a NOP wrapper which is built for non-Linux platforms This is not true, Kubernetes now imports github.com/opencontainers/selinux/go-selinux and it has proper multiplatform support (i.e. NOOP on non-Linux platforms). Removing the whole package and calling go-selinux directly.	2022-02-11 15:20:35 +01:00
Kevin Klues	155562dd2e	Fix bug in TopologyManager with merging hints when NUM_NUMA > 2 Before this fix, hint permutations such as: permutation: [{11 true} {0101 true}] Could result in merged hints of: mergedHint: {01 true} This was possible because both hints in the permutation container a "preferred" allocation (i.e. the full set of NUMA nodes set in the affinity bitmask are required to satisfy the allocation). With this in place, the simplified logic we had simply kept the merged hint as preferred as well. However, what we really want is to ensure that the merged hint is only preferred if true alignment of all resources is possible (i.e. if all hints in the permutation are preferred AND their affinities are exactly equal). The only exception to this is if no topology information is provided by a given hint provider. In this case, we assume alignment doesn't matter and only consider the resources that actually have hints provided for them. This changes the semantics of permutations of the form: permutation: [{111 true} {011 true}] To now result in the merged hint of: mergedHint: {011 false} Instead of: mergedHint: {011 true} This is arguably how it should always have been though (because a hint should not be preferred if true alignment isn't possible), and two tests have had to change to accomodate these new semantics. This commit changes the merge function to implement the updated logic, adds a test to verify it is functioning correctly, and updates the two tests mentioned above to adjust to the new semantics. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2022-02-10 22:07:51 +00:00
Ciprian Hacman	0819451ea6	Clean up logic for deprecated flag --container-runtime in kubelet Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>	2022-02-10 13:26:59 +02:00
Danielle Lancashire	c198062da4	cm: Remove legacy docker references Dockershim and built-in Docker support are gone. Cleans up dead code references to them.	2022-02-08 16:25:04 +01:00
Kubernetes Prow Robot	3bd422dc76	Merge pull request #107293 from dims/jan-1-owners-cleanup Cleanup OWNERS files - Jan 2021 Week 1	2022-01-13 10:30:30 -08:00
Kubernetes Prow Robot	cadbe8dfb5	Merge pull request #107250 from cndoit18/use-errors cleanup(kubelet): use errors.Is(err, os.ErrProcessDone)	2022-01-11 10:49:01 -08:00
Davanum Srinivas	9682b7248f	OWNERS cleanup - Jan 2021 Week 1 Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2022-01-10 08:14:29 -05:00
Kubernetes Prow Robot	03ee86c09c	Merge pull request #104837 from eggiter/fix-release-reused-cpus fix(cpumanager): Do not release CPUs of init containers while they are being reused in app containers	2022-01-06 11:46:38 -08:00
cndoit18	601d02b90f	refactor(kubelet): use errors.Is(err, os.ErrProcessDone) use errors.Is(err, os.ErrProcessDone) here and remove "process already finished" string comparison. Signed-off-by: cndoit18 <cndoit18@outlook.com>	2021-12-29 18:10:06 +08:00
yxxhero	a90b149be0	add more message for no PodSandbox container Signed-off-by: yxxhero <aiopsclub@163.com>	2021-12-18 09:52:03 +08:00
Davanum Srinivas	497e9c1971	Cleanup OWNERS files (No Activity in the last year) Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2021-12-15 10:34:02 -05:00
Kubernetes Prow Robot	1d66302c42	Merge pull request #106458 from dims/lint-yaml-in-owners-files Lint/Beautify yaml in OWNERS files	2021-12-10 06:39:12 -08:00
Davanum Srinivas	9405e9b55e	Check in OWNERS modified by update-yamlfmt.sh Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2021-12-09 21:31:26 -05:00
Kevin Klues	f8511877e2	Add regression test for CPUManager distribute NUMA algorithm We witnessed this exact allocation attempt in a live cluster and witnessed the algorithm fail with an accounting error. This test was added to verify that this case is now handled by the updates to the algorithm and that we don't regress from it in the future. "test" description="ensure previous failure encountered on live machine has been fixed (1/1)" "combo remainderSet balance" combo=[2 4 6] remainderSet=[2 4 6] distribution=9 remainder=1 available=[14 2 4 4 0 3 4 1] balance=4.031 "combo remainderSet balance" combo=[2 4 6] remainderSet=[2 4] distribution=9 remainder=1 available=[0 3 4 1 14 2 4 4] balance=4.031 "combo remainderSet balance" combo=[2 4 6] remainderSet=[2 6] distribution=9 remainder=1 available=[1 14 2 4 4 0 3 4] balance=4.031 "combo remainderSet balance" combo=[2 4 6] remainderSet=[4 6] distribution=9 remainder=1 available=[1 3 4 0 14 2 4 4] balance=4.031 "combo remainderSet balance" combo=[2 4 6] remainderSet=[2] distribution=9 remainder=1 available=[4 0 3 4 1 14 2 4] balance=4.031 "combo remainderSet balance" combo=[2 4 6] remainderSet=[4] distribution=9 remainder=1 available=[3 4 0 14 2 4 4 1] balance=4.031 "combo remainderSet balance" combo=[2 4 6] remainderSet=[6] distribution=9 remainder=1 available=[1 13 2 4 4 1 3 4] balance=3.606 "bestCombo found" distribution=9 bestCombo=[2 4 6] bestRemainder=[6] Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 20:49:58 +00:00
Kevin Klues	e284c74d93	Add unit test for CPUManager distribute NUMA algorithm verifying fixes Before Change: "test" description="ensure bestRemainder chosen with NUMA nodes that have enough CPUs to satisfy the request" "combo remainderSet balance" combo=[0 1 2 3] remainderSet=[0 1] distribution=8 remainder=2 available=[-1 -1 0 6] balance=2.915 "combo remainderSet balance" combo=[0 1 2 3] remainderSet=[0 2] distribution=8 remainder=2 available=[-1 0 -1 6] balance=2.915 "combo remainderSet balance" combo=[0 1 2 3] remainderSet=[0 3] distribution=8 remainder=2 available=[5 -1 0 0] balance=2.345 "combo remainderSet balance" combo=[0 1 2 3] remainderSet=[1 2] distribution=8 remainder=2 available=[0 -1 -1 6] balance=2.915 "combo remainderSet balance" combo=[0 1 2 3] remainderSet=[1 3] distribution=8 remainder=2 available=[0 -1 0 5] balance=2.345 "combo remainderSet balance" combo=[0 1 2 3] remainderSet=[2 3] distribution=8 remainder=2 available=[0 0 -1 5] balance=2.345 "bestCombo found" distribution=8 bestCombo=[0 1 2 3] bestRemainder=[0 3] --- FAIL: TestTakeByTopologyNUMADistributed (0.01s) --- FAIL: TestTakeByTopologyNUMADistributed/ensure_bestRemainder_chosen_with_NUMA_nodes_that_have_enough_CPUs_to_satisfy_the_request (0.00s) cpu_assignment_test.go:867: unexpected error [accounting error, not enough CPUs allocated, remaining: 1] After Change: "test" description="ensure bestRemainder chosen with NUMA nodes that have enough CPUs to satisfy the request" "combo remainderSet balance" combo=[0 1 2 3] remainderSet=[3] distribution=8 remainder=2 available=[0 0 0 4] balance=1.732 "bestCombo found" distribution=8 bestCombo=[0 1 2 3] bestRemainder=[3] SUCCESS Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 20:45:37 +00:00
Kevin Klues	031f11513d	Fix accounting bug in CPUManager distribute NUMA policy Without this fix, the algorithm may decide to allocate "remainder" CPUs from a NUMA node that has no more CPUs to allocate. Moreover, it was only considering allocation of remainder CPUs from NUMA nodes such that each NUMA node in the remainderSet could only allocate 1 (i.e. 'cpuGroupSize') more CPUs. With these two issues in play, one could end up with an accounting error where not enough CPUs were allocated by the time the algorithm runs to completion. The updated algorithm will now omit any NUMA nodes that have 0 CPUs left from the set of NUMA nodes considered for allocating remainder CPUs. Additionally, we now consider all combinations of nodes from the remainder set of size 1..len(remainderSet). This allows us to find a better solution if allocating CPUs from a smaller set leads to a more balanced allocation. Finally, we loop through all NUMA nodes 1-by-1 in the remainderSet until all rmeainer CPUs have been accounted for and allocated. This ensure that we will not hit an accounting error later on because we explicitly remove CPUs from the remainder set until there are none left. A follow-on commit adds a set of unit tests that will fail before these changes, but succeeds after them. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 19:18:11 +00:00
Kevin Klues	5317a2e2ac	Fix error handling in CPUManager distribute NUMA tests Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 16:51:31 +00:00
Kevin Klues	dc4430b663	Add a sum() helper to the CPUManager cpuassignment logic Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 16:51:29 +00:00
Kevin Klues	cfacc22459	Allow the map.Values() function in the CPUManager to take a set of keys Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 16:51:28 +00:00
Kevin Klues	a160d9a8cd	Fix CPUManager algo to calculate min NUMA nodes needed for distribution Previously the algorithm was too restrictive because it tried to calculate the minimum based on the number of available NUMA nodes and the number of available CPUs on those NUMA nodes. Since there was no (easy) way to tell how many CPUs an individual NUMA node happened to have, the average across them was used. Using this value however, could result in thinking you need more NUMA nodes to possibly satisfy a request than you actually do. By using the total number of NUMA nodes and CPUs per NUMA node, we can get the true minimum number of nodes required to satisfy a request. For a given "current" allocation this may not be the true minimum, but its better to start with fewer and move up than to start with too many and miss out on a better option. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 16:51:26 +00:00
Kevin Klues	209cd20548	Fix unit tests following bug fix in CPUManager for map functions (2/2) Now that the algorithm for balancing CPU distributions across NUMA nodes is correct, this test actually behaves differently for the "packed" vs. "distributed" allocation algorithms (as it should). In the "packed" case we need to ensure that CPUs are allocated such that they are packed onto cores. Since one CPU is already allocated from a core on NUMA node 0, we want the next CPU to be its hyperthreaded pair (even though the first available CPU id is on Socket 1). In the "distributed" case, however, we want to ensure CPUs are allocated such that we have an balanced distribution of CPUs across all NUMA nodes. This points to allocating from Socket 1 if the only other CPU allocated has been done on Socket 0. To allow CPUs allocations to be packed onto full cores, one can allocate them from the "distributed" algorithm with a 'cpuGroupSize' equal to the number of hypthreads per core (in this case 2). We added an explicit test case for this, demonstrating that we get the same result as the "packed" algorithm does, even though the "distributed" algorithm is in use. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 16:51:24 +00:00
Kevin Klues	67f719cb1d	Fix unit tests following bug fix in CPUManager for map functions (1/2) This fixes two related tests to better test our "balanced" distribution algorithm. The first test originally provided an input with the following number of CPUs available on each NUMA node: Node 0: 16 Node 1: 20 Node 2: 20 Node 3: 20 It then attempted to distribute 48 CPUs across them with an expectation that each of the first 3 NUMA nodes would have 16 CPUs taken from them (leaving Node 0 with no more CPUs in the end). This would have resulted in the following amount of CPUs on each node: Node 0: 0 Node 1: 4 Node 2: 4 Node 3: 20 Which results in a standard deviation of 7.6811 However, a more balanced solution would actually be to pull 16 CPUs from NUMA nodes 1, 2, and 3, and leave 0 untouched, i.e.: Node 0: 16 Node 1: 4 Node 2: 4 Node 3: 4 Which results in a standard deviation of 5.1961524227066 To fix this test we changed the original number of available CPUs to start with 4 less CPUs on NUMA node 3, and 2 more CPUs on NUMA node 0, i.e.: Node 0: 18 Node 1: 20 Node 2: 20 Node 3: 16 So that we end up with a result of: Node 0: 2 Node 1: 4 Node 2: 4 Node 3: 16 Which pulls the CPUs from where we want and results in a standard deviation of 5.5452 For the second test, we simply reverse the number of CPUs available for Nodes 0 and 3 as: Node 0: 16 Node 1: 20 Node 2: 20 Node 3: 18 Which forces the allocation to happen just as it did for the first test, except now on NUMA nodes 1, 2, and 3 instead of NUMA nodes 0,1, and 2. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 16:51:23 +00:00
Kevin Klues	4008ea0b4c	Fix bug in CPUManager map.Keys() and map.Values() implementations Previously these would return lists that were too long because we appended to pre-initialized lists with a specific size. Since the primary place these functions are used is in the mean and standard deviation calculations for the NUMA distribution algorithm, it meant that the results of these calculations were often incorrect. As a result, some of the unit tests we have are actually incorrect (because the results we expect do not actually produce the best balanced distribution of CPUs across all NUMA nodes for the input provided). These tests will be patched up in subsequent commits. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 16:51:21 +00:00
Kevin Klues	446c58e0e7	Ensure we balance across all NUMA nodes in NUMA distribution algo Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 16:51:19 +00:00
Kevin Klues	c8559bc43e	Short-circuit CPUManager distribute NUMA algo for unusable cpuGroupSize Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 16:51:16 +00:00
Kevin Klues	b28c1392d7	Round the CPUManager mean and stddev calculations to the nearest 1000th Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 16:51:13 +00:00
Sascha Grunert	de37b9d293	Make CRI `v1` the default and allow a fallback to `v1alpha2` This patch makes the CRI `v1` API the new project-wide default version. To allow backwards compatibility, a fallback to `v1alpha2` has been added as well. This fallback can either used by automatically determined by the kubelet. Signed-off-by: Sascha Grunert <sgrunert@redhat.com>	2021-11-17 11:05:05 -08:00
Antonio Ojea	d126b14838	migrate nolint coments to golangci-lint	2021-11-17 13:58:53 +01:00
Neha Lohia	fa1b6765d5	move pkg/util/node to component-helpers/node/util (#105347 ) Signed-off-by: Neha Lohia <nehapithadiya444@gmail.com>	2021-11-12 07:52:27 -08:00
Kubernetes Prow Robot	3ca3daac76	Merge pull request #103415 from tiloso/staticcheck-kubelet Fix staticcheck failure in pkg/kubelet/cm/cpuset	2021-11-11 15:15:13 -08:00
Kubernetes Prow Robot	359b722c19	Merge pull request #102882 from fromanirh/device-manager-checkpoints devicemanager: checkpoint: support pre-1.20 data	2021-11-02 16:56:57 -07:00
Kubernetes Prow Robot	08bf54678e	Merge pull request #101909 from nolancon/cpu-mgr-testing Additional cases for reconcileState testing	2021-10-30 00:01:17 -07:00
Francesco Romani	2f426fdba6	devicemanager: checkpoint: support pre-1.20 data The commit `a8b8995ef2` changed the content of the data kubelet writes in the checkpoint. Unfortunately, the checkpoint restore code was not updated, so if we upgrade kubelet from pre-1.20 to 1.20+, the device manager cannot anymore restore its state correctly. The only trace of this misbehaviour is this line in the kubelet logs: ``` W0615 07:31:49.744770 4852 manager.go:244] Continue after failing to read checkpoint file. Device allocation info may NOT be up-to-date. Err: json: cannot unmarshal array into Go struct field PodDevicesEntry.Data.PodDeviceEntries.DeviceIDs of type checkpoint.DevicesPerNUMA ``` If we hit this bug, the device allocation info is indeed NOT up-to-date up until the device plugins register themselves again. This can take up to few minutes, depending on the specific device plugin. While the device manager state is inconsistent: 1. the kubelet will NOT update the device availability to zero, so the scheduler will send pods towards the inconsistent kubelet. 2. at pod admission time, the device manager allocation will not trigger, so pods will be admitted without devices actually being allocated to them. To fix these issues, we add support to the device manager to read pre-1.20 checkpoint data. We retroactively call this format "v1". Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-10-26 09:54:11 +02:00
Kevin Klues	86f9c266bc	Add optimizations to reduce iterations in distributed NUMA algorithm Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-18 08:53:25 +00:00
Kevin Klues	70e0f47191	Support full-pcpus-only with the new NUMA distribution policy option Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-16 19:31:02 +00:00
Kevin Klues	d54445a84d	Generalize the NUMA distribution algorithm to take cpuGroupSize This parameter ensures that CPUs are always allocated in groups of size 'cpuGroupSize'. This is important, for example, to ensure that all CPUs (i.e. hyperthreads) from the same core are handed out together. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-16 19:31:02 +00:00
Kevin Klues	1436e33642	Add more extensive testing for NUMA distribution algorithm in CPUManager Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-16 19:31:02 +00:00
Kevin Klues	cf3afb8602	Add 2 distinguishing test cases between the 2 takeByTopology algorithms Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-16 19:31:02 +00:00
Kevin Klues	eb78e2406b	Add a new TestTakeByTopologyNUMADistributed() test to the CPUManager As part of this, pull out all of the existing "TakeByTopology" tests and have them be called by the original TestTakeByTopologyNUMAPacked() as well as the new TestTakeByTopologyNUMADistributed() test. In a subsequent commit, we will add some tests that should differ between these two algorithms. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-16 19:31:02 +00:00
Kevin Klues	876dd9b078	Added algorithm to CPUManager to distribute CPUs across NUMA nodes Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-16 19:31:02 +00:00
Kevin Klues	462544d079	Split CPUManager takeByTopology() into two different algorithms The first implements the original algorithm which packs CPUs onto NUMA nodes if more than one NUMA node is required to satisfy the allocation. The second disitributes CPUs across NUMA nodes if they can't all fit into one. The "distributing" algorithm is currently a noop and just returns an error of "unimplemented". A subsequent commit will add the logic to implement this algorithm according to KEP 2902: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-16 14:46:19 +00:00
Kevin Klues	0e7928edce	Add new CPUManager policy option for "distribute-cpus-across-numa" This commit only adds the option to the policy options framework. A subsequent commit will add the logic to utilize it. The KEP describing this new option can be found here: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-16 14:46:19 +00:00
Francesco Romani	4bae656835	cpumanager: test NUMA node support for CPU assign (2) This batch of tests adds a fake topology on which each numa node has multiple sockets. We didn't find yet a real HW topology in the wild like this, but we need one to fully exercise the code. So, until we find a HW topology, we add a fake one flipping the NUMA/socket config of the existing xeon dual gold 6320. Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-10-15 10:29:21 +00:00
Francesco Romani	547996f3f6	cpumanager: test NUMA node support for CPU assign (1) This batch of tests adds a real topology on which each physical socket has multiple NUMA zones. Taken by a real dual xeon 6320 gold. Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-10-15 10:29:21 +00:00
Francesco Romani	f6ccc4426a	cpumanager: test: use proper subtests The exisiting unit tests where performing subtests without actually using the full features of the testing package (https://pkg.go.dev/testing#hdr-Subtests_and_Sub_benchmarks) Update them with fairly minimal changes. The patch is deceptively large because we need to move the code inside a new block. Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-10-15 10:29:21 +00:00
Francesco Romani	15caa134b2	cpumanager: topology: use rich cmp package User the `cmp.Diff` package in the unit tests, moving away from `reflect.DeepEqual`. This gives us a clearer picture of the differences when the tests fail. Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-10-15 10:29:21 +00:00
Kevin Klues	aff54a0914	Abstract out whether NUMA or Sockets come first in the memory hierarchy This allows us to get rid of the check for determining which one is higher all throughout the code. Now we just check once and instantiate an interface of the appropriate type that makes sure the ordering in the hierarchy is preserved through the appropriate calls. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-15 10:29:15 +00:00

1 2 3 4 5 ...

1075 Commits