Commit Graph

43590 Commits

Author SHA1 Message Date
Jordan Liggitt
0b90b6ec5e Add field paths to expected unknown/duplicate errors 2021-12-13 09:38:13 -05:00
Kubernetes Prow Robot
ba200841fd Merge pull request #106366 from cyclinder/evictions_number_stable
adding evictions_total metric and marking evictions_number deprecated
2021-12-12 23:19:59 -08:00
cyclinder
b88b51c6e5 adding evictions_total metric and marking evictions_number deprecated
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-12-13 10:36:02 +08:00
Kubernetes Prow Robot
0cae5f5006 Merge pull request #106744 from BinacsLee/binacs/fix-race-condition-in-scheduler-eventhandler
scheduler: fix race condition during cache refresh
2021-12-11 00:31:59 -08:00
Kubernetes Prow Robot
030c3fbd58 Merge pull request #106936 from sbangari/windowsserviceflappingfix
Skip creating HNS loadbalancer with empty endpoints
2021-12-10 22:41:57 -08:00
Kubernetes Prow Robot
1d66302c42 Merge pull request #106458 from dims/lint-yaml-in-owners-files
Lint/Beautify yaml in OWNERS files
2021-12-10 06:39:12 -08:00
BinacsLee
1027b8de40 scheduler: fix race condition during cache refresh 2021-12-10 20:46:12 +08:00
Kubernetes Prow Robot
1b0d83f1d6 Merge pull request #106599 from klueska/fix-numa-bug
Fix Bugs in CPUManager distribute NUMA policy option
2021-12-10 04:41:12 -08:00
Sravanth Bangari
26be8d6890 Skip creating HNS loadbalancer with empty endpoints 2021-12-09 20:03:21 -08:00
Kubernetes Prow Robot
15e5f2a19a Merge pull request #106291 from sbs2001/fix_invalid_comment
Remove invalid comment in legacyregistry
2021-12-09 19:03:10 -08:00
Davanum Srinivas
9405e9b55e Check in OWNERS modified by update-yamlfmt.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-12-09 21:31:26 -05:00
David Porter
95264a418d kubelet: set failed phase during graceful shutdown
Revert to previous behavior in 1.21/1.20 of setting pod phase to failed
during graceful node shutdown.

Setting pods to failed phase will ensure that external controllers that
manage pods, like deployments, will create new pods to replace those that
are shut down. Many customers have taken a dependency on this behavior,
and it was a breaking change in 1.22, so this change reverts back to the
previous behavior.

Signed-off-by: David Porter <david@porter.me>
2021-12-09 13:17:40 -08:00
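
As context for the revert described above, here is a minimal sketch of the restored behavior, using hypothetical stand-in types rather than the real kubelet nodeshutdown manager or the v1.PodPhase API:

```go
package main

import "fmt"

// Phase is a hypothetical stand-in for v1.PodPhase, for illustration only.
type Phase string

const (
	PhaseRunning Phase = "Running"
	PhaseFailed  Phase = "Failed"
)

type Pod struct {
	Name   string
	Phase  Phase
	Reason string
}

// markPodsFailedOnShutdown reflects the reverted 1.20/1.21 behavior: pods
// killed during graceful node shutdown are moved to the Failed phase so
// that controllers (e.g. Deployments) replace them elsewhere.
func markPodsFailedOnShutdown(pods []*Pod) {
	for _, p := range pods {
		p.Phase = PhaseFailed
		p.Reason = "Terminated" // node is shutting down
	}
}

func main() {
	pods := []*Pod{{Name: "web-0", Phase: PhaseRunning}}
	markPodsFailedOnShutdown(pods)
	fmt.Printf("%s: %s (%s)\n", pods[0].Name, pods[0].Phase, pods[0].Reason)
}
```
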
Kubernetes Prow Robot
cdf3ad823a Merge pull request #97252 from dims/drop-dockershim
Completely remove in-tree dockershim from kubelet
2021-12-08 12:51:46 -08:00
Kubernetes Prow Robot
2daa3415ec Merge pull request #106838 from mengjiao-liu/remove-NamespaceDefaultLabelName-feature-gate
Remove feature gate NamespaceDefaultLabelName
2021-12-08 08:53:46 -08:00
Kubernetes Prow Robot
f356ae4ad9 Merge pull request #101719 from SergeyKanzhelev/removeReallyCrashForTesting
Remove ReallyCrashForTesting and clean up some references to Handle…
2021-12-07 23:39:45 -08:00
Kubernetes Prow Robot
b9977a7b17 Merge pull request #106851 from BinacsLee/binacs/cleanup-scheduler-profile
cleanup: return frameworkruntime.NewFramework directly
2021-12-07 19:28:52 -08:00
Kubernetes Prow Robot
d7f8234b6d Merge pull request #106747 from ahg-g/ahg-test
Added an integration test for NodeResourcesFit scoring
2021-12-07 19:28:06 -08:00
Kubernetes Prow Robot
022d49dcbc Merge pull request #106740 from wojtek-t/update_kubemark_clients
Update kubemark to use EndpointSlices and proper user-agents
2021-12-07 19:27:59 -08:00
Kubernetes Prow Robot
d16a5e5feb Merge pull request #106673 from qmloong/qmloong/master
refactor: use utilerrors instead of join error msg
2021-12-07 18:27:22 -08:00
Kubernetes Prow Robot
68b53cf940 Merge pull request #106581 from knabben/win-kernel-kproxy-metrics
Registering kube-proxy metrics on windows kernel mode
2021-12-07 18:26:09 -08:00
Kubernetes Prow Robot
75109026d0 Merge pull request #106447 from hyschumi/fix-noderesources
cleanup duplicated method `makeNodeWithExtendedResource` in noderesources unit test
2021-12-07 17:27:10 -08:00
Kubernetes Prow Robot
b8c1b38261 Merge pull request #106406 from cyclinder/remove_DeleteChain_TODO
kube-proxy remove todo: call iptables -S first when delete chain
2021-12-07 17:26:56 -08:00
Kubernetes Prow Robot
39b45fb040 Merge pull request #106381 from dims/update-dims-as-approver
Update `dims` as approver for some top level dirs
2021-12-07 17:26:48 -08:00
Kubernetes Prow Robot
12901b95c9 Merge pull request #106344 from ikeeip/fix_import_formatting
Fix golang imports in k8s.io/pkg/controller/volume/persistentvolume package
2021-12-07 17:26:40 -08:00
Kubernetes Prow Robot
a90f31f85a Merge pull request #106179 from vivek-koppuru/fix-secret-format
Fix string output format for secret validations
2021-12-07 17:26:10 -08:00
Kubernetes Prow Robot
b685b3982d Merge pull request #105360 from shuheiktgw/refactor_kubelet_config_validation_tests
Refactor kubelet config validation tests
2021-12-07 17:25:43 -08:00
Kubernetes Prow Robot
8174b0923c Merge pull request #105127 from astraw99/fix-dup-kubeClient
Fix duplicate CSI kube client
2021-12-07 17:25:30 -08:00
Davanum Srinivas
bc78dff42e update files to drop dockershim
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-12-07 15:15:13 -05:00
Davanum Srinivas
83265c9171 drop files deleted from pkg/kubelet/dockershim
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-12-07 15:15:13 -05:00
BinacsLee
ab413849cc cleanup: return frameworkruntime.NewFramework directly 2021-12-07 23:29:56 +08:00
Mengjiao Liu
f3c37c2c82 Remove feature gate NamespaceDefaultLabelName 2021-12-07 16:51:17 +08:00
Sascha Grunert
a063a2ba3e Revert dockershim CRI v1 changes
We should not touch the dockershim ahead of removal and therefore
default to `v1alpha2` CRI instead of `v1`.

Partially reverts changes from https://github.com/kubernetes/kubernetes/pull/106501

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2021-12-03 18:37:11 +01:00
Sergey Kanzhelev
1918ecad04 update the grpc field name for consistency 2021-12-01 18:16:08 +00:00
Abdullah Gharaibeh
33a04dc5f5 Added an integration test for NodeResourcesFit scoring 2021-11-30 12:13:30 -05:00
Wojciech Tyczyński
243f4faa6d Update kubemark to use EndpointSlices and proper user-agents 2021-11-30 11:38:08 +01:00
Sergey Kanzhelev
a11453efbc remove ReallyCrashForTesting and clean up some references to HandleCrash behavior 2021-11-29 20:00:10 +00:00
menglong.qi
ea31d7b813 refactor: use utilerrors instead of join error msg 2021-11-28 17:16:17 +08:00
Kevin Klues
f8511877e2 Add regression test for CPUManager distribute NUMA algorithm
We witnessed this exact allocation attempt in a live cluster and watched the
algorithm fail with an accounting error. This test was added to verify that
this case is now handled by the updates to the algorithm and that we don't
regress from it in the future.

"test" description="ensure previous failure encountered on live machine has been fixed (1/1)"
"combo remainderSet balance" combo=[2 4 6] remainderSet=[2 4 6] distribution=9 remainder=1 available=[14 2 4 4 0 3 4 1] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[2 4] distribution=9 remainder=1 available=[0 3 4 1 14 2 4 4] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[2 6] distribution=9 remainder=1 available=[1 14 2 4 4 0 3 4] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[4 6] distribution=9 remainder=1 available=[1 3 4 0 14 2 4 4] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[2] distribution=9 remainder=1 available=[4 0 3 4 1 14 2 4] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[4] distribution=9 remainder=1 available=[3 4 0 14 2 4 4 1] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[6] distribution=9 remainder=1 available=[1 13 2 4 4 1 3 4] balance=3.606
"bestCombo found" distribution=9 bestCombo=[2 4 6] bestRemainder=[6]

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 20:49:58 +00:00
Kevin Klues
e284c74d93 Add unit test for CPUManager distribute NUMA algorithm verifying fixes
Before Change:
"test" description="ensure bestRemainder chosen with NUMA nodes that have enough CPUs to satisfy the request"
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[0 1] distribution=8 remainder=2 available=[-1 -1 0 6] balance=2.915
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[0 2] distribution=8 remainder=2 available=[-1 0 -1 6] balance=2.915
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[0 3] distribution=8 remainder=2 available=[5 -1 0 0] balance=2.345
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[1 2] distribution=8 remainder=2 available=[0 -1 -1 6] balance=2.915
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[1 3] distribution=8 remainder=2 available=[0 -1 0 5] balance=2.345
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[2 3] distribution=8 remainder=2 available=[0 0 -1 5] balance=2.345
"bestCombo found" distribution=8 bestCombo=[0 1 2 3] bestRemainder=[0 3]

--- FAIL: TestTakeByTopologyNUMADistributed (0.01s)
    --- FAIL: TestTakeByTopologyNUMADistributed/ensure_bestRemainder_chosen_with_NUMA_nodes_that_have_enough_CPUs_to_satisfy_the_request (0.00s)
        cpu_assignment_test.go:867: unexpected error [accounting error, not enough CPUs allocated, remaining: 1]

After Change:
"test" description="ensure bestRemainder chosen with NUMA nodes that have enough CPUs to satisfy the request"
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[3] distribution=8 remainder=2 available=[0 0 0 4] balance=1.732
"bestCombo found" distribution=8 bestCombo=[0 1 2 3] bestRemainder=[3]

SUCCESS

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 20:45:37 +00:00
Kevin Klues
031f11513d Fix accounting bug in CPUManager distribute NUMA policy
Without this fix, the algorithm may decide to allocate "remainder" CPUs from a
NUMA node that has no more CPUs to allocate. Moreover, it only considered
allocating remainder CPUs from NUMA nodes such that each NUMA node in the
remainderSet could contribute just one more group of 'cpuGroupSize' CPUs. With these
two issues in play, one could end up with an accounting error where not enough
CPUs were allocated by the time the algorithm runs to completion.

The updated algorithm will now omit any NUMA nodes that have 0 CPUs left from
the set of NUMA nodes considered for allocating remainder CPUs. Additionally,
we now consider *all* combinations of nodes from the remainder set of size
1..len(remainderSet). This allows us to find a better solution if allocating
CPUs from a smaller set leads to a more balanced allocation. Finally, we loop
through all NUMA nodes 1-by-1 in the remainderSet until all remainder CPUs have
been accounted for and allocated. This ensures that we will not hit an
accounting error later on, because we explicitly remove CPUs from the remainder
set until there are none left.

A follow-on commit adds a set of unit tests that will fail before these
changes, but succeed after them.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 19:18:11 +00:00
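
The remainder-allocation loop described in this commit can be sketched as follows; the types and names are illustrative, not the actual cpu_assignment.go code, and the balance scoring over combinations of size 1..len(remainderSet) is elided:

```go
package main

import "fmt"

// distributeRemainder hands out remainder CPUs one NUMA node at a time until
// every remainder CPU is accounted for. Nodes with no CPUs left are skipped,
// which is the omission that previously caused the accounting error.
func distributeRemainder(available map[int]int, remainderSet []int, remainder int) map[int]int {
	alloc := map[int]int{}
	for remainder > 0 {
		progress := false
		for _, node := range remainderSet {
			if remainder == 0 {
				break
			}
			if available[node] == 0 {
				continue // omit exhausted nodes from the remainder set
			}
			available[node]--
			alloc[node]++
			remainder--
			progress = true
		}
		if !progress {
			break // nothing left anywhere; caller must treat this as an error
		}
	}
	return alloc
}

func main() {
	available := map[int]int{2: 1, 4: 0, 6: 2}
	// node 4 has nothing left, so the 3 remainder CPUs come from nodes 2 and 6
	fmt.Println(distributeRemainder(available, []int{2, 4, 6}, 3)) // map[2:1 6:2]
}
```
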
Kevin Klues
5317a2e2ac Fix error handling in CPUManager distribute NUMA tests
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:31 +00:00
Kevin Klues
dc4430b663 Add a sum() helper to the CPUManager cpuassignment logic
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:29 +00:00
Kevin Klues
cfacc22459 Allow the map.Values() function in the CPUManager to take a set of keys
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:28 +00:00
Kevin Klues
a160d9a8cd Fix CPUManager algo to calculate min NUMA nodes needed for distribution
Previously the algorithm was too restrictive because it tried to calculate the
minimum based on the number of *available* NUMA nodes and the number of
*available* CPUs on those NUMA nodes. Since there was no (easy) way to tell how
many CPUs an individual NUMA node happened to have, the average across them was
used. Using this value, however, could result in thinking you need more NUMA
nodes to satisfy a request than you actually do.

By using the *total* number of NUMA nodes and CPUs per NUMA node, we can get
the true minimum number of nodes required to satisfy a request. For a given
"current" allocation this may not be the true minimum, but its better to start
with fewer and move up than to start with too many and miss out on a better
option.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:26 +00:00
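
The "total topology" minimum described above amounts to a ceiling division; a sketch under that assumption (names are illustrative):

```go
package main

import "fmt"

// minNUMANodesNeeded returns the smallest number of NUMA nodes that could
// possibly satisfy a CPU request, using the machine's *total* topology
// (CPUs per NUMA node) rather than what happens to be available right now.
func minNUMANodesNeeded(request, cpusPerNUMANode int) int {
	// ceiling division: e.g. 10 CPUs on 4-CPU NUMA nodes needs 3 nodes
	return (request + cpusPerNUMANode - 1) / cpusPerNUMANode
}

func main() {
	fmt.Println(minNUMANodesNeeded(10, 4)) // 3
}
```
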
Kevin Klues
209cd20548 Fix unit tests following bug fix in CPUManager for map functions (2/2)
Now that the algorithm for balancing CPU distributions across NUMA nodes is
correct, this test actually behaves differently for the "packed" vs.
"distributed" allocation algorithms (as it should).

In the "packed" case we need to ensure that CPUs are allocated such that they
are packed onto cores. Since one CPU is already allocated from a core on NUMA
node 0, we want the next CPU to be its hyperthreaded pair (even though the
first available CPU id is on Socket 1).

In the "distributed" case, however, we want to ensure CPUs are allocated such
that we have a balanced distribution of CPUs across all NUMA nodes. This
points to allocating from Socket 1 if the only other CPU allocated has been
done on Socket 0.

To allow CPU allocations to be packed onto full cores, one can allocate them
from the "distributed" algorithm with a 'cpuGroupSize' equal to the number of
hyperthreads per core (in this case 2). We added an explicit test case for this,
demonstrating that we get the same result as the "packed" algorithm does, even
though the "distributed" algorithm is in use.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:24 +00:00
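
A sketch of why a 'cpuGroupSize' equal to the hyperthreads per core yields packed full cores; the helper below is hypothetical and assumes each core lists at least cpuGroupSize sibling CPU ids:

```go
package main

import "fmt"

// takeByGroup draws CPUs from a node in sibling-sized groups (e.g. 2 for
// two-way hyperthreading), so whole cores are consumed and the "distributed"
// algorithm can mimic the "packed" result.
func takeByGroup(cores [][]int, want, cpuGroupSize int) []int {
	var taken []int
	for _, core := range cores {
		if len(taken) >= want {
			break
		}
		// only take full groups so no core is left half-allocated
		taken = append(taken, core[:cpuGroupSize]...)
	}
	return taken
}

func main() {
	// two cores on one NUMA node, each with a hyperthread pair
	cores := [][]int{{0, 8}, {1, 9}}
	fmt.Println(takeByGroup(cores, 2, 2)) // [0 8]: a full core, not 0 and 1
}
```
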
Kevin Klues
67f719cb1d Fix unit tests following bug fix in CPUManager for map functions (1/2)
This fixes two related tests to better test our "balanced" distribution algorithm.

The first test originally provided an input with the following number of CPUs
available on each NUMA node:

Node 0: 16
Node 1: 20
Node 2: 20
Node 3: 20

It then attempted to distribute 48 CPUs across them with an expectation that
each of the first 3 NUMA nodes would have 16 CPUs taken from them (leaving Node
0 with no more CPUs in the end).

This would have resulted in the following amount of CPUs on each node:

Node 0: 0
Node 1: 4
Node 2: 4
Node 3: 20

Which results in a standard deviation of 7.6811

However, a more balanced solution would actually be to pull 16 CPUs from NUMA
nodes 1, 2, and 3, and leave Node 0 untouched, i.e.:

Node 0: 16
Node 1: 4
Node 2: 4
Node 3: 4

Which results in a standard deviation of 5.1962

To fix this test we changed the original number of available CPUs to start with
4 fewer CPUs on NUMA node 3 and 2 more CPUs on NUMA node 0, i.e.:

Node 0: 18
Node 1: 20
Node 2: 20
Node 3: 16

So that we end up with a result of:

Node 0: 2
Node 1: 4
Node 2: 4
Node 3: 16

Which pulls the CPUs from where we want and results in a standard deviation of 5.5452

For the second test, we simply reverse the number of CPUs available for Nodes 0
and 3 as:

Node 0: 16
Node 1: 20
Node 2: 20
Node 3: 18

Which forces the allocation to happen just as it did for the first test, except
now on NUMA nodes 1, 2, and 3 instead of NUMA nodes 0, 1, and 2.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:23 +00:00
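
The standard deviations quoted in this commit message can be reproduced as a population standard deviation over the per-node leftovers; a quick self-contained check (the last value prints 5.5453 because the commit truncates rather than rounds):

```go
package main

import (
	"fmt"
	"math"
)

// stddev computes the population standard deviation, the "balance" metric
// used over per-NUMA-node leftover CPU counts in the commits above.
func stddev(xs []float64) float64 {
	var mean float64
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	var variance float64
	for _, x := range xs {
		variance += (x - mean) * (x - mean)
	}
	return math.Sqrt(variance / float64(len(xs)))
}

func main() {
	fmt.Printf("%.4f\n", stddev([]float64{0, 4, 4, 20})) // 7.6811
	fmt.Printf("%.4f\n", stddev([]float64{16, 4, 4, 4})) // 5.1962
	fmt.Printf("%.4f\n", stddev([]float64{2, 4, 4, 16})) // 5.5453
}
```
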
Kevin Klues
4008ea0b4c Fix bug in CPUManager map.Keys() and map.Values() implementations
Previously these would return lists that were too long because we appended to
pre-initialized lists with a specific size.

Since the primary place these functions are used is in the mean and standard
deviation calculations for the NUMA distribution algorithm, it meant that the
results of these calculations were often incorrect.

As a result, some of the unit tests we have are actually incorrect (because
the results they expect do not actually represent the most balanced
distribution of CPUs across all NUMA nodes for the input provided).

These tests will be patched up in subsequent commits.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:21 +00:00
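
The bug class described above is the classic Go mistake of appending to a slice pre-initialized with a length instead of a capacity; a minimal reproduction (illustrative, not the actual CPUManager code):

```go
package main

import "fmt"

// keysBuggy reproduces the bug: make([]int, len(m)) creates a slice that is
// already len(m) zeros long, so append places the keys *after* that prefix
// and the result is twice as long as intended.
func keysBuggy(m map[int]int) []int {
	ks := make([]int, len(m)) // already [0 0]
	for k := range m {
		ks = append(ks, k) // appends after the zeros
	}
	return ks // e.g. [0 0 1 2]
}

// keysFixed pre-allocates capacity only, so append fills from the start.
func keysFixed(m map[int]int) []int {
	ks := make([]int, 0, len(m))
	for k := range m {
		ks = append(ks, k)
	}
	return ks // e.g. [1 2]
}

func main() {
	m := map[int]int{1: 10, 2: 20}
	fmt.Println(len(keysBuggy(m)), len(keysFixed(m))) // 4 2
}
```
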
Kevin Klues
446c58e0e7 Ensure we balance across *all* NUMA nodes in NUMA distribution algo
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:19 +00:00
Kevin Klues
c8559bc43e Short-circuit CPUManager distribute NUMA algo for unusable cpuGroupSize
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:16 +00:00
Kevin Klues
b28c1392d7 Round the CPUManager mean and stddev calculations to the nearest 1000th
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:13 +00:00
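
Rounding to the nearest 1000th is the usual scale-round-unscale idiom; a minimal sketch of what this commit describes:

```go
package main

import (
	"fmt"
	"math"
)

// roundTo1000th rounds a float to three decimal places by scaling up,
// rounding to the nearest integer, and scaling back down.
func roundTo1000th(x float64) float64 {
	return math.Round(x*1000) / 1000
}

func main() {
	fmt.Println(roundTo1000th(4.0311288)) // 4.031, as in the balance logs above
}
```
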