Commit Graph

11283 Commits

Author SHA1 Message Date
MartinForReal
d529b7e10b add bootid support for Windows nodes.
Signed-off-by: MartinForReal <fanshangxiang@gmail.com>
2022-03-18 02:17:52 +00:00
Kubernetes Prow Robot
56062f7f4f Merge pull request #108010 from endocrimes/dani/eviction-flake
eviction: Deflake TestStart
2022-03-17 12:22:54 -07:00
Kubernetes Prow Robot
9e50a332d8 Merge pull request #108366 from smarterclayton/terminating_not_terminated
Delay writing a terminal phase until the pod is terminated
2022-03-17 08:29:21 -07:00
Kubernetes Prow Robot
a504daa048 Merge pull request #108441 from pacoxu/pod-overload-ga
mark PodOverhead as GA in v1.24; remove in v1.26
2022-03-17 06:33:22 -07:00
Kubernetes Prow Robot
ba1c42892f Merge pull request #100424 from yangjunmyfm192085/run-test30
Add test cases for kubelet_pods_test.go.
2022-03-17 00:41:19 -07:00
Kubernetes Prow Robot
5cb6fab8f6 Merge pull request #105585 from fengzixu/improvement-volume-health
add volume kubelet_volume_stats_health_abnormal to kubelet
2022-03-17 01:32:38 +00:00
fengzixu
7d675381f8 fix: panic when volumeHealthStatus is nil 2022-03-17 01:32:24 +00:00
Paco Xu
acd696266e mark PodOverhead as GA in v1.24; remove in v1.26 2022-03-17 09:30:14 +08:00
David Porter
c70f1955c4 test: Add E2E for job completions with cpu reservation
Create an E2E test that runs a job whose pod should succeed. The job
reserves a fixed amount of CPU and has a large number of completions and
parallelism. Used to reproduce github.com/kubernetes/kubernetes/issues/106884

Signed-off-by: David Porter <david@porter.me>
2022-03-16 13:15:03 -04:00
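
A minimal sketch of the kind of Job such a test might build, using the upstream batch/v1 and core/v1 API types; the job name, image, CPU request, and completion counts are illustrative assumptions, not the values from the actual test:

```go
package main

import (
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newCPUReservingJob builds a Job whose pods each reserve a fixed amount of
// CPU and exit successfully; high completions/parallelism exercise the pod
// lifecycle paths referenced in issue 106884. All values are illustrative.
func newCPUReservingJob() *batchv1.Job {
	completions := int32(100)
	parallelism := int32(10)
	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: "cpu-reserving-job"},
		Spec: batchv1.JobSpec{
			Completions: &completions,
			Parallelism: &parallelism,
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:    "work",
						Image:   "busybox:1.35",
						Command: []string{"sh", "-c", "exit 0"},
						Resources: corev1.ResourceRequirements{
							Requests: corev1.ResourceList{
								corev1.ResourceCPU: resource.MustParse("500m"),
							},
						},
					}},
				},
			},
		},
	}
}

func main() {
	job := newCPUReservingJob()
	fmt.Println("job:", job.Name, "completions:", *job.Spec.Completions)
}
```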
Clayton Coleman
69a3820214 kubelet: Delay writing a terminal phase until the pod is terminated
Other components must know when the Kubelet has released critical
resources for terminal pods. Do not set the phase in the apiserver
to terminal until all containers are stopped and cannot restart.

As a consequence of this change, the Kubelet must explicitly transition
a terminal pod to the terminating state in the pod worker, which is
handled by returning a new isTerminal boolean from syncPod.

Finally, if a pod with init containers hasn't been initialized yet, don't
default the statuses of its containers, or of init containers that have
not yet been attempted, to the unknown failure state.
2022-03-16 13:15:00 -04:00
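
A schematic sketch of the control flow described above: syncPod reports a new isTerminal boolean, and the pod worker uses it to explicitly begin termination before any terminal phase reaches the apiserver. The types and method names are simplified stand-ins, not the kubelet's actual pod worker:

```go
package main

import "fmt"

// Simplified stand-ins; the real kubelet pod worker is far more involved.
type pod struct{ name string }

type podWorker struct{}

// syncPod reconciles the pod and reports isTerminal=true only once all
// containers are stopped and cannot restart.
func (w *podWorker) syncPod(p *pod) (isTerminal bool, err error) {
	// ... reconcile containers, probes, volumes (elided) ...
	return true, nil
}

func (w *podWorker) managePod(p *pod) error {
	isTerminal, err := w.syncPod(p)
	if err != nil {
		return err
	}
	if isTerminal {
		// Explicitly transition to terminating; only after termination
		// completes is a terminal phase written back to the apiserver,
		// so other components know critical resources were released.
		fmt.Printf("pod %q is terminal; beginning termination\n", p.name)
	}
	return nil
}

func main() {
	w := &podWorker{}
	_ = w.managePod(&pod{name: "example"})
}
```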
Maciej Borsz
aa95513982 Revert "add volume kubelet_volume_stats_health_abnormal to kubelet" 2022-03-16 13:44:09 +01:00
Shiming Zhang
ced991cb00 Emit Metrics in the shutdown process 2022-03-16 10:14:55 +08:00
Kubernetes Prow Robot
096cd9df63 Merge pull request #108699 from xing-yang/update_owners
Update sig-storage owners files
2022-03-15 14:28:00 -07:00
Kubernetes Prow Robot
1a5abe5d1f Merge pull request #105585 from fengzixu/improvement-volume-health
add volume kubelet_volume_stats_health_abnormal to kubelet
2022-03-15 05:58:11 -07:00
Kubernetes Prow Robot
7858fc93e5 Merge pull request #108004 from equinix-ms/kubelet-include-oommetrics
kubelet: expose OOM metrics
2022-03-14 23:14:13 -07:00
xing-yang
aae1f2c476 Update sig-storage owners file 2022-03-14 18:57:52 +00:00
chymy
5374f6fad8 Fix comment typo
Signed-off-by: chymy <chang.min1@zte.com.cn>
2022-03-14 16:53:29 +08:00
chymy
7ed6fa7b2e Method call 'err.Error()' might lead to a nil pointer dereference for pkg/kubelet/cm/cpumanager/cpu_assignment_test.go
Signed-off-by: chymy <chang.min1@zte.com.cn>
2022-03-14 16:35:11 +08:00
Shiming Zhang
a1fadab4b0 Atomic write status file 2022-03-11 17:50:33 +08:00
Shiming Zhang
4aed18935e Add test for storage 2022-03-11 17:31:10 +08:00
Shiming Zhang
5eb3e88f6b Support metrics for node shutdown 2022-03-11 17:31:10 +08:00
Kubernetes Prow Robot
c227403973 Merge pull request #108568 from stevekuznetsov/skuznets/verbose-error
kubelet: cgroups: be verbose about validation
2022-03-10 11:59:07 -08:00
Steve Kuznetsov
8f2bc39f72 kubelet: cgroups: be verbose about validation
Previously, callers of `Exists()` could not know why the cgroup did or
did not exist. In one call-site in particular, the `kubelet` would
entirely fail to start if the cgroup validation did not succeed. In
these cases we MUST explain what went wrong and pass that information
clearly to the caller. Previously, some but not all of the reasons for
invalidation were logged at a low log-level instead, which led to
poor UX.

The original method was retained on the interface so as to make this
diff small.

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
2022-03-10 07:25:33 -08:00
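
A hedged sketch of the shape of this change: retain the boolean `Exists()` for existing callers while a companion method returns a descriptive error the caller can surface. The method name `validate` and the error text below are assumptions, not the actual kubelet API:

```go
package main

import (
	"errors"
	"fmt"
)

type cgroupManager struct{}

// validate reports *why* a cgroup is considered invalid, instead of
// logging the reason at low verbosity and returning only a boolean.
func (m *cgroupManager) validate(name string) error {
	// ... check each expected controller/path (elided) ...
	return errors.New("cgroup " + name + " is missing expected controllers: cpu")
}

// Exists is retained to keep the diff small; it simply discards the reason.
func (m *cgroupManager) Exists(name string) bool {
	return m.validate(name) == nil
}

func main() {
	m := &cgroupManager{}
	if err := m.validate("/kubepods"); err != nil {
		// Startup can now fail with a clear explanation for the operator.
		fmt.Println("invalid cgroup configuration:", err)
	}
}
```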
Kubernetes Prow Robot
98ada45442 Merge pull request #108402 from Shoothzj/fix-typo-in-watch_based_manager_test
Fix typo in watch_based_manager_test
2022-03-08 20:04:21 -08:00
Kir Kolyshkin
de5a69d847 pkg/kubelet/cm: fix potential nil dereference in enforceExistingCgroup
Move the rl == nil check to before we dereference it.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-08 17:05:46 -08:00
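
The bug class fits in a few lines; a minimal sketch (with stand-in types, not the real enforceExistingCgroup signature) of moving the nil check ahead of the dereference:

```go
package main

import "fmt"

type resourceList map[string]int64

func apply(rl resourceList) error { return nil } // stand-in for enforcement

// enforceExisting checks rl for nil *before* any dereference; previously
// the pointer was used first, so a nil rl could panic.
func enforceExisting(rl *resourceList) error {
	if rl == nil {
		return fmt.Errorf("cannot enforce limits: resource list is nil")
	}
	return apply(*rl)
}

func main() {
	fmt.Println(enforceExisting(nil)) // returns an error instead of panicking
}
```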
Kir Kolyshkin
9652d0cedc pkg/kubelet/cm: move common code to libctCgroupConfig
Instead of doing (almost) the same thing from the three different
methods (Create, Update, Destroy), move the functionality to
libctCgroupConfig, replacing updateSystemdCgroupInfo.

The needResources bool is needed because we do not need resources
during Destroy, so we skip the unneeded resource conversion.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-08 17:05:46 -08:00
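
A simplified sketch of the consolidation, with stand-in types for the kubelet's ResourceConfig and libcontainer's config structs: one helper serves Create, Update, and Destroy, and needResources lets Destroy opt out of the conversion it never needed:

```go
package main

import "fmt"

// Stand-in types; the real code converts the kubelet's ResourceConfig
// into libcontainer's cgroup config.
type resourceConfig struct{ cpuShares uint64 }

type libctConfig struct {
	name      string
	resources *resourceConfig
}

// libctCgroupConfig centralizes what Create, Update, and Destroy used to
// each do themselves; needResources lets Destroy skip the conversion.
func libctCgroupConfig(name string, rc *resourceConfig, needResources bool) *libctConfig {
	c := &libctConfig{name: name}
	if needResources && rc != nil {
		c.resources = rc
	}
	return c
}

func main() {
	create := libctCgroupConfig("kubepods", &resourceConfig{cpuShares: 1024}, true)
	destroy := libctCgroupConfig("kubepods", nil, false) // no resources needed
	fmt.Println(create.resources != nil, destroy.resources == nil)
}
```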
Kir Kolyshkin
11b0d57c93 pkg/kubelet/cm/cgroup_manager: simplify setting hugetlb
Commit 79be8be10e made hugetlb settings optional if cgroup v2 is used and
hugetlb is not available, fixing issue 92933. Note that at that time this
was only needed for v2, because for v1 the resources were set one-by-one,
and only for supported resources.

Commit d312ef7eb6 switched the code to using Set from runc/libcontainer
cgroups manager, and expanded the check to cgroup v1 as well.

Move this check earlier, to inside m.toResources, so that instead of
converting all hugetlb resources from ResourceConfig to libcontainer's
Resources.HugetlbLimit and then setting it to nil, we can skip the
conversion entirely when hugetlb is not supported, avoiding work that
is not needed.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-08 17:05:46 -08:00
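
A sketch of the reordered check, again with simplified stand-in types: when hugetlb is unsupported, the toResources-style conversion returns early instead of building HugetlbLimit entries that would immediately be discarded:

```go
package main

import "fmt"

type resourceConfig struct {
	hugePageLimit map[string]uint64 // page size -> byte limit
}

type hugetlbLimit struct {
	pagesize string
	limit    uint64
}

// Stand-in for the real support probe (e.g. cgroup v2 without the
// hugetlb controller).
func hugetlbSupported() bool { return false }

// toResources skips the hugetlb conversion entirely when the controller
// is absent, instead of converting everything and then nil-ing it out.
func toResources(rc *resourceConfig) []hugetlbLimit {
	if !hugetlbSupported() {
		return nil // early exit: no wasted conversion work
	}
	out := make([]hugetlbLimit, 0, len(rc.hugePageLimit))
	for size, limit := range rc.hugePageLimit {
		out = append(out, hugetlbLimit{pagesize: size, limit: limit})
	}
	return out
}

func main() {
	rc := &resourceConfig{hugePageLimit: map[string]uint64{"2MB": 1 << 21}}
	fmt.Println(toResources(rc)) // empty when hugetlb is unsupported
}
```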
Kir Kolyshkin
59148e22d0 pkg/kubelet/cm: rm dup code
Commit ecd6361f added setting PidsLimit to Create and Update.

Commit bce9d5f2 added setting PidsLimit to m.toResources.

Now, PidsLimit is assigned twice.

Remove the duplicate.

Fixes: bce9d5f2
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-08 17:05:46 -08:00
Kir Kolyshkin
a673b64864 kubelet/cm: speed up cgroup creation
There's no need to call m.Update (which would create another instance of
the libcontainer cgroup manager, convert all the resources, and then set
them). All of this is already done here, except for Set().

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-08 17:05:46 -08:00
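
A rough sketch of the shortcut with hypothetical names: Create already holds a configured libcontainer manager and the converted resources, so calling Set on it directly avoids the second manager construction and resource conversion that an Update call would repeat:

```go
package main

import "fmt"

// Hypothetical stand-ins for the libcontainer cgroup manager plumbing.
type resources struct{ cpuShares uint64 }

type libctManager struct{ applied bool }

// Apply creates the cgroup itself.
func (m *libctManager) Apply() error { m.applied = true; return nil }

// Set writes the (already converted) settings into the cgroup.
func (m *libctManager) Set(r *resources) error {
	fmt.Println("set cpu.shares =", r.cpuShares)
	return nil
}

// create builds the manager and converted resources once, then calls Set
// directly; previously it called an Update helper that re-created a
// second manager and re-converted the same resources.
func create(r *resources) error {
	mgr := &libctManager{}
	if err := mgr.Apply(); err != nil {
		return err
	}
	return mgr.Set(r) // reuse what we already have
}

func main() {
	_ = create(&resources{cpuShares: 1024})
}
```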
Kubernetes Prow Robot
29ed12e76b Merge pull request #108527 from ddebroy/instrumentedgc1
Pass instrumented runtime service to containerGC
2022-03-08 10:24:49 -08:00
Deep Debroy
023d6fb8f4 Pass instrumented runtime service to containergc
Signed-off-by: Deep Debroy <ddebroy@gmail.com>
2022-03-08 14:33:37 +00:00
Tim Allclair
e1069c6495 Don't follow redirects with spdy 2022-03-04 16:08:58 -08:00
Kubernetes Prow Robot
5d6ef39406 Merge pull request #96004 from serathius/datapolicy-kubelet-pkg
Add datapolicy tags to pkg/kubelet/
2022-03-04 15:34:51 -08:00
Kubernetes Prow Robot
422001df8b Merge pull request #108154 from klueska/fix-topology-manager
Update TopologyManager algorithm for selecting "best" non-preferred hint
2022-03-02 04:13:13 -08:00
Kubernetes Prow Robot
604ab4fc6c Merge pull request #108340 from ArangoGutierrez/misspelled/1
Fix typo in pkg/kubelet/pluginmanager/cache/actual_state_of_world
2022-03-01 15:45:55 -08:00
Kubernetes Prow Robot
5d6a793221 Merge pull request #96828 from panjf2000/opt-epoll-eventfd
kubelet/eviction: eliminate redundant allocations when handling eventfd
2022-03-01 13:59:54 -08:00
Kubernetes Prow Robot
0e8e307567 Merge pull request #106570 from odinuge/fix-cpu-shares-on-big-systems
Fix cpu share issues on systems with large amounts of cpu
2022-03-01 10:15:55 -08:00
Kevin Klues
e370b7335c Add extensive unit testing for TopologyManager hint generation algorithm
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-03-01 17:30:24 +00:00
Kevin Klues
99c57828ce Update TopologyManager algorithm for selecting "best" non-preferred hint
For the 'single-numa' and 'restricted' TopologyManager policies, pods are only
admitted if all of their containers have perfect alignment across the set of
resources they are requesting. The best-effort policy, on the other hand, will
prefer allocations that have perfect alignment, but fall back to a non-preferred
alignment if perfect alignment can't be achieved.

The existing algorithm for choosing the best hint from the set of
"non-preferred" hints is fairly naive and often results in choosing a
sub-optimal hint. It works fine in cases where all resources would end up
coming from a single NUMA node (even if it is not the same NUMA node), but
breaks down as soon as multiple NUMA nodes are required for the "best"
alignment. We will never be able to achieve perfect alignment with these
non-preferred hints, but we should try to do something more intelligent than
simply choosing the hint with the narrowest mask.

In an ideal world, we would have the TopologyManager return a set of
"resource-relative" hints (as opposed to a common hint for all resources as is
done today). Each resource-relative hint would indicate how many other
resources could be aligned to it on a given NUMA node, and a hint provider
would use this information to allocate its resources in the most aligned way
possible. There are likely some edge cases to consider here, but such an
algorithm would allow us to do partial-perfect-alignment of "some" resources,
even if all resources could not be perfectly aligned.

Unfortunately, supporting something like this would require a major redesign to
how the TopologyManager interacts with its hint providers (as well as how those
hint providers make decisions based on the hints they get back).

That said, we can still do better than the naive algorithm we have today, and
this patch provides a mechanism to do so.

We start by looking at the set of hints passed into the TopologyManager for
each resource and generate a list of the minimum number of NUMA nodes required
to satisfy an allocation for a given resource. Each entry in this list then
contains the 'minNUMAAffinity.Count()' for a given resource. Once we have this
list, we find the *maximum* 'minNUMAAffinity.Count()' from the list and mark
that as the 'bestNonPreferredAffinityCount' that we would like to have
associated with whatever "bestHint" we ultimately generate. The intuition being
that we would like to (at the very least) get alignment for those resources
that *require* multiple NUMA nodes to satisfy their allocation. If we can't
quite get there, then we should try to come as close to it as possible.

Once we have this 'bestNonPreferredAffinityCount', the algorithm proceeds as
follows:

If the mergedHint and bestHint are both non-preferred, then try to find a hint
whose affinity count is as close to (but not higher than) the
bestNonPreferredAffinityCount as possible. To do this we need to consider the
following cases and react accordingly:

  1. bestHint.NUMANodeAffinity.Count() >  bestNonPreferredAffinityCount
  2. bestHint.NUMANodeAffinity.Count() == bestNonPreferredAffinityCount
  3. bestHint.NUMANodeAffinity.Count() <  bestNonPreferredAffinityCount

For case (1), the current bestHint is larger than the
bestNonPreferredAffinityCount, so updating to any narrower mergedHint is
preferred over staying where we are.

For case (2), the current bestHint is equal to the
bestNonPreferredAffinityCount, so we would like to stick with what we have
*unless* the current mergedHint is also equal to bestNonPreferredAffinityCount
and it is narrower.

For case (3), the current bestHint is less than bestNonPreferredAffinityCount,
so we would like to creep back up to bestNonPreferredAffinityCount as close as
we can. There are three cases to consider here:

  3a. mergedHint.NUMANodeAffinity.Count() >  bestNonPreferredAffinityCount
  3b. mergedHint.NUMANodeAffinity.Count() == bestNonPreferredAffinityCount
  3c. mergedHint.NUMANodeAffinity.Count() <  bestNonPreferredAffinityCount

For case (3a), we just want to stick with the current bestHint because choosing
a new hint that is greater than bestNonPreferredAffinityCount would be
counter-productive.

For case (3b), we want to immediately update bestHint to the current
mergedHint, making it now equal to bestNonPreferredAffinityCount.

For case (3c), we know that *both* the current bestHint and the current
mergedHint are less than bestNonPreferredAffinityCount, so we want to choose
one that brings us back up as close to bestNonPreferredAffinityCount as
possible. There are three cases to consider here:

  3ca. mergedHint.NUMANodeAffinity.Count() >  bestHint.NUMANodeAffinity.Count()
  3cb. mergedHint.NUMANodeAffinity.Count() <  bestHint.NUMANodeAffinity.Count()
  3cc. mergedHint.NUMANodeAffinity.Count() == bestHint.NUMANodeAffinity.Count()

For case (3ca), we want to immediately update bestHint to mergedHint because
that will bring us closer to the (higher) value of
bestNonPreferredAffinityCount.

For case (3cb), we want to stick with the current bestHint because choosing the
current mergedHint would strictly move us further away from the
bestNonPreferredAffinityCount.

Finally, for case (3cc), we know that the current bestHint and the current
mergedHint are equal, so we simply choose the narrower of the two.

This patch implements this algorithm for the case where we must choose from a
set of non-preferred hints and provides a set of unit-tests to verify its
correctness.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-03-01 14:38:26 +00:00
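
The case analysis above maps almost mechanically onto code. Below is a self-contained sketch of just the non-preferred comparison, not the actual TopologyManager implementation; a hint here carries only its NUMA-node count, so the "narrower mask" tie-breaks in cases (2) and (3cc) collapse to keeping the current bestHint:

```go
package main

import "fmt"

// hint is a simplified stand-in for a TopologyHint: count is the number
// of NUMA nodes set in the affinity mask.
type hint struct {
	count int
}

// compareNonPreferred returns whichever of best or merged comes closer to
// the target bestNonPreferredAffinityCount, following cases (1)-(3cc) above.
func compareNonPreferred(best, merged hint, target int) hint {
	switch {
	case best.count > target: // case (1): any narrower hint is an improvement
		if merged.count < best.count {
			return merged
		}
		return best
	case best.count == target: // case (2): already at the target; keep best
		return best
	default: // case (3): best.count < target; creep back up toward target
		switch {
		case merged.count > target: // (3a): overshooting is counter-productive
			return best
		case merged.count == target: // (3b): exact match, take it immediately
			return merged
		case merged.count > best.count: // (3ca): closer to target, take it
			return merged
		default: // (3cb)/(3cc): merged is farther away or equal; keep best
			return best
		}
	}
}

func main() {
	// Suppose the resources' minNUMAAffinity counts were {1, 2}, so the
	// target bestNonPreferredAffinityCount is max(1, 2) = 2.
	best := hint{count: 4}
	for _, merged := range []hint{{count: 3}, {count: 1}, {count: 2}} {
		best = compareNonPreferred(best, merged, 2)
	}
	fmt.Println(best.count) // 2: settles at the target count
}
```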
ZhangJian He
d09947a5b5 Fix typo in watch_based_manager_test 2022-03-01 10:21:17 +08:00
Kubernetes Prow Robot
bef9d807a0 Merge pull request #108325 from pacoxu/donotReturnErrWhenPauseLose
do not return err when PodSandbox does not exist
2022-02-28 18:15:46 -08:00
Kubernetes Prow Robot
e9ba9dc4e4 Merge pull request #107201 from pacoxu/add-metrics-volume-stats-cal
add VolumeStatCalDuration metric for fsquota monitoring benchmark
2022-02-28 16:07:46 -08:00
Kevin Klues
f8601cb5a3 Refactor TopologyManager to be more explicit about bestHint calculation
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-02-28 20:30:01 +00:00
houjun
df099ed923 Fix error logging statement to make it easier to understand 2022-02-26 15:25:56 +08:00
Carlos Eduardo Arango Gutierrez
bbb8ef1d10 Fix typo in pkg/kubelet/pluginmanager/cache/actual_state_of_world
Signed-off-by: Carlos Eduardo Arango Gutierrez <carangog@redhat.com>
2022-02-24 16:20:24 -05:00
Kubernetes Prow Robot
06e107081e Merge pull request #104732 from mengjiao-liu/remove-flag-experimental-check-node-capabilities-before-mount
kubelet: Remove the deprecated flag `--experimental-check-node-capabilities-before-mount`
2022-02-24 07:56:30 -08:00
jonyhy96
60cd896602 fix: pod worker test
Signed-off-by: jonyhy96 <hy352144278@gmail.com>
2022-02-24 16:35:33 +08:00
chenyw1990
e26df3594c do not return err when PodSandbox does not exist
Co-authored-by: pacoxu <paco.xu@daocloud.io>
2022-02-24 14:58:39 +08:00
Kubernetes Prow Robot
08c31088c1 Merge pull request #106858 from cmssczy/add_RegisterWithTaints_validation_test
add kubelet config validation test for RegisterWithTaints
2022-02-23 12:51:58 -08:00
Kubernetes Prow Robot
eacbf87bfe Merge pull request #108156 from jsafrane/rename-selinuxsupport
Rename SupportsSELinux to SELinuxRelabel
2022-02-22 20:12:20 -08:00