kubernetes

Author	SHA1	Message	Date
Kubernetes Prow Robot	422001df8b	Merge pull request #108154 from klueska/fix-topology-manager Update TopologyManager algorithm for selecting "best" non-preferred hint	2022-03-02 04:13:13 -08:00
Kubernetes Prow Robot	604ab4fc6c	Merge pull request #108340 from ArangoGutierrez/misspelled/1 Fix typo in pkg/kubelet/pluginmanager/cache/actual_state_of_world	2022-03-01 15:45:55 -08:00
Kubernetes Prow Robot	5d6a793221	Merge pull request #96828 from panjf2000/opt-epoll-eventfd kubelet/eviction: eliminate redundant allocations when handling eventfd	2022-03-01 13:59:54 -08:00
Kubernetes Prow Robot	0e8e307567	Merge pull request #106570 from odinuge/fix-cpu-shares-on-big-systems Fix cpu share issues on systems with large amounts of cpu	2022-03-01 10:15:55 -08:00
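PR #106570 appears here only as a title. To show why share values can go wrong on very large machines, the sketch below converts a CPU request in millicores to cgroup v1 `cpu.shares` and clamps the result to a valid range. The constants (1024 shares per CPU and an accepted range of 2 to 262144) and the helper name `milliCPUToShares` are assumptions made for this sketch. They are not taken from the PR's diff.

```go
package main

import "fmt"

// Assumed constants for this sketch: cgroup v1 cpu.shares accepts values in
// [2, 262144], and one CPU (1000 millicores) maps to 1024 shares.
const (
	minShares     = 2
	maxShares     = 262144
	sharesPerCPU  = 1024
	milliCPUToCPU = 1000
)

// milliCPUToShares (hypothetical helper) converts millicores to cpu.shares,
// clamping to the accepted range so that very large values, such as a
// node-level cgroup on a machine with hundreds of CPUs, stay valid.
func milliCPUToShares(milliCPU int64) uint64 {
	if milliCPU == 0 {
		return minShares
	}
	shares := milliCPU * sharesPerCPU / milliCPUToCPU
	if shares < minShares {
		return minShares
	}
	if shares > maxShares {
		return maxShares
	}
	return uint64(shares)
}

func main() {
	fmt.Println(milliCPUToShares(500))    // 512
	fmt.Println(milliCPUToShares(384000)) // 262144 (393216 before clamping)
}
```

With these constants, any request above 256 CPUs (256000m) exceeds the upper bound unless it is clamped.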
Kevin Klues	e370b7335c	Add extensive unit testing for TopologyManager hint generation algorithm Signed-off-by: Kevin Klues <kklues@nvidia.com>	2022-03-01 17:30:24 +00:00
Kevin Klues	99c57828ce	Update TopologyManager algorithm for selecting "best" non-preferred hint

For the 'single-numa' and 'restricted' TopologyManager policies, pods are only admitted if all of their containers have perfect alignment across the set of resources they are requesting. The best-effort policy, on the other hand, will prefer allocations that have perfect alignment, but fall back to a non-preferred alignment if perfect alignment can't be achieved.

The existing algorithm of how to choose the best hint from the set of "non-preferred" hints is fairly naive and often results in choosing a sub-optimal hint. It works fine in cases where all resources would end up coming from a single NUMA node (even if it's not the same NUMA node), but breaks down as soon as multiple NUMA nodes are required for the "best" alignment. We will never be able to achieve perfect alignment with these non-preferred hints, but we should try to do something more intelligent than simply choosing the hint with the narrowest mask.

In an ideal world, we would have the TopologyManager return a set of "resources-relative" hints (as opposed to a common hint for all resources as is done today). Each resource-relative hint would indicate how many other resources could be aligned to it on a given NUMA node, and a hint provider would use this information to allocate its resources in the most aligned way possible. There are likely some edge cases to consider here, but such an algorithm would allow us to do partial-perfect-alignment of "some" resources, even if all resources could not be perfectly aligned.

Unfortunately, supporting something like this would require a major redesign to how the TopologyManager interacts with its hint providers (as well as how those hint providers make decisions based on the hints they get back). That said, we can still do better than the naive algorithm we have today, and this patch provides a mechanism to do so.

We start by looking at the set of hints passed into the TopologyManager for each resource and generate a list of the minimum number of NUMA nodes required to satisfy an allocation for a given resource. Each entry in this list then contains the 'minNUMAAffinity.Count()' for a given resource. Once we have this list, we find the maximum 'minNUMAAffinity.Count()' from the list and mark that as the 'bestNonPreferredAffinityCount' that we would like to have associated with whatever "bestHint" we ultimately generate. The intuition is that we would like to (at the very least) get alignment for those resources that require multiple NUMA nodes to satisfy their allocation. If we can't quite get there, then we should try to come as close to it as possible.

Once we have this 'bestNonPreferredAffinityCount', the algorithm proceeds as follows: if the mergedHint and bestHint are both non-preferred, then try to find a hint whose affinity count is as close to (but not higher than) the bestNonPreferredAffinityCount as possible. To do this we need to consider the following cases and react accordingly:

1. bestHint.NUMANodeAffinity.Count() > bestNonPreferredAffinityCount
2. bestHint.NUMANodeAffinity.Count() == bestNonPreferredAffinityCount
3. bestHint.NUMANodeAffinity.Count() < bestNonPreferredAffinityCount

For case (1), the current bestHint is larger than the bestNonPreferredAffinityCount, so updating to any narrower mergedHint is preferred over staying where we are.

For case (2), the current bestHint is equal to the bestNonPreferredAffinityCount, so we would like to stick with what we have unless the current mergedHint is also equal to bestNonPreferredAffinityCount and it is narrower.

For case (3), the current bestHint is less than bestNonPreferredAffinityCount, so we would like to creep back up to bestNonPreferredAffinityCount as close as we can. There are three cases to consider here:

3a. mergedHint.NUMANodeAffinity.Count() > bestNonPreferredAffinityCount
3b. mergedHint.NUMANodeAffinity.Count() == bestNonPreferredAffinityCount
3c. mergedHint.NUMANodeAffinity.Count() < bestNonPreferredAffinityCount

For case (3a), we just want to stick with the current bestHint because choosing a new hint that is greater than bestNonPreferredAffinityCount would be counter-productive.

For case (3b), we want to immediately update bestHint to the current mergedHint, making it now equal to bestNonPreferredAffinityCount.

For case (3c), we know that both the current bestHint and the current mergedHint are less than bestNonPreferredAffinityCount, so we want to choose one that brings us back up as close to bestNonPreferredAffinityCount as possible. There are three cases to consider here:

3ca. mergedHint.NUMANodeAffinity.Count() > bestHint.NUMANodeAffinity.Count()
3cb. mergedHint.NUMANodeAffinity.Count() < bestHint.NUMANodeAffinity.Count()
3cc. mergedHint.NUMANodeAffinity.Count() == bestHint.NUMANodeAffinity.Count()

For case (3ca), we want to immediately update bestHint to mergedHint because that will bring us closer to the (higher) value of bestNonPreferredAffinityCount.

For case (3cb), we want to stick with the current bestHint because choosing the current mergedHint would strictly move us further away from the bestNonPreferredAffinityCount.

Finally, for case (3cc), we know that the current bestHint and the current mergedHint have equal affinity counts, so we simply choose the narrower of the two.

This patch implements this algorithm for the case where we must choose from a set of non-preferred hints and provides a set of unit-tests to verify its correctness.

Signed-off-by: Kevin Klues <kklues@nvidia.com>	2022-03-01 14:38:26 +00:00
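The case analysis in this commit message can be condensed into a short Go sketch. It uses a hypothetical `hint` type, a `uint64` bitmask where bit i stands for NUMA node i plus a preferred flag, instead of the real TopologyHint and bitmask packages. It also assumes that "narrower" means fewer NUMA nodes, with ties broken by the lower mask value. `bestNonPreferredAffinityCount` mirrors the commit's term, and `pickNonPreferred` and `narrower` are illustrative names only.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hint is a hypothetical, simplified TopologyHint: bit i of mask => NUMA node i.
type hint struct {
	mask      uint64
	preferred bool
}

func (h hint) count() int { return bits.OnesCount64(h.mask) }

// narrower reports whether a spans fewer NUMA nodes than b; equal counts are
// broken by the lower mask value (an assumption made for this sketch).
func narrower(a, b hint) bool {
	if a.count() != b.count() {
		return a.count() < b.count()
	}
	return a.mask < b.mask
}

// bestNonPreferredAffinityCount is the maximum, over resources, of the minimum
// number of NUMA nodes any hint for that resource requires.
func bestNonPreferredAffinityCount(perResource [][]hint) int {
	best := 0
	for _, hints := range perResource {
		minCount := -1
		for _, h := range hints {
			if minCount == -1 || h.count() < minCount {
				minCount = h.count()
			}
		}
		if minCount > best {
			best = minCount
		}
	}
	return best
}

// pickNonPreferred applies cases (1) to (3cc) when both hints are non-preferred.
func pickNonPreferred(target int, best, merged hint) hint {
	b, m := best.count(), merged.count()
	switch {
	case b > target: // (1): any narrower merged hint is an improvement
		if narrower(merged, best) {
			return merged
		}
		return best
	case b == target: // (2): switch only to an equally sized, narrower hint
		if m == target && narrower(merged, best) {
			return merged
		}
		return best
	default: // (3): b < target
		switch {
		case m > target: // (3a)
			return best
		case m == target: // (3b)
			return merged
		case m > b: // (3ca)
			return merged
		case m < b: // (3cb)
			return best
		default: // (3cc): equal counts, keep the narrower one
			if narrower(merged, best) {
				return merged
			}
			return best
		}
	}
}

func main() {
	// Resource A needs at least 2 NUMA nodes, resource B at least 1.
	target := bestNonPreferredAffinityCount([][]hint{
		{{0b0011, true}, {0b1111, false}},
		{{0b0001, true}, {0b0100, true}},
	})
	best := hint{0b0001, false}
	for _, merged := range []hint{{0b0010, false}, {0b0110, false}, {0b1111, false}} {
		best = pickNonPreferred(target, best, merged)
	}
	fmt.Printf("target=%d best=%04b\n", target, best.mask) // target=2 best=0110
}
```

In `main`, the target is 2 because resource A cannot be satisfied with fewer than two NUMA nodes. `0001` survives `0010` through case (3cc), is replaced by `0110` through case (3b), and `0110` is then kept over `1111` through case (2).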
ZhangJian He	d09947a5b5	Fix typo in watch_based_manager_test	2022-03-01 10:21:17 +08:00
Kubernetes Prow Robot	bef9d807a0	Merge pull request #108325 from pacoxu/donotReturnErrWhenPauseLose do not return err when PodSandbox not exist	2022-02-28 18:15:46 -08:00
Kubernetes Prow Robot	e9ba9dc4e4	Merge pull request #107201 from pacoxu/add-metrics-volume-stats-cal add VolumeStatCalDuration metrics for fsquota monitoring benchmark	2022-02-28 16:07:46 -08:00
Kevin Klues	f8601cb5a3	Refactor TopologyManager to be more explicit about bestHint calculation Signed-off-by: Kevin Klues <kklues@nvidia.com>	2022-02-28 20:30:01 +00:00
houjun	df099ed923	Fix error logging statement to make it easier to understand	2022-02-26 15:25:56 +08:00
Carlos Eduardo Arango Gutierrez	bbb8ef1d10	Fix typo in pkg/kubelet/pluginmanager/cache/actual_state_of_world Signed-off-by: Carlos Eduardo Arango Gutierrez <carangog@redhat.com>	2022-02-24 16:20:24 -05:00
Kubernetes Prow Robot	06e107081e	Merge pull request #104732 from mengjiao-liu/remove-flag-experimental-check-node-capabilities-before-mount kubelet: Remove the deprecated flag `--experimental-check-node-capabilities-before-mount`	2022-02-24 07:56:30 -08:00
jonyhy96	60cd896602	fix: pod worker test Signed-off-by: jonyhy96 <hy352144278@gmail.com>	2022-02-24 16:35:33 +08:00
chenyw1990	e26df3594c	do not return err when PodSandbox not exist Co-authored-by: pacoxu <paco.xu@daocloud.io>	2022-02-24 14:58:39 +08:00
Kubernetes Prow Robot	08c31088c1	Merge pull request #106858 from cmssczy/add_RegisterWithTaints_validation_test add kubelet config validation test for RegisterWithTaints	2022-02-23 12:51:58 -08:00
Kubernetes Prow Robot	eacbf87bfe	Merge pull request #108156 from jsafrane/rename-selinuxsupport Rename SupportsSELinux to SELinuxRelabel	2022-02-22 20:12:20 -08:00
utkarsh348	eaee96efd3	Fixed race condition test manager shutdown	2022-02-18 11:20:02 +05:30
Kubernetes Prow Robot	2d2a7272fc	Merge pull request #107670 from 249043822/br-notfound Suppress container not found errors in container runtime getPodStatuses	2022-02-16 10:00:37 -08:00
Jan Safranek	525b8e5cd6	Rename SupportsSELinux to SELinuxRelabel The field in fact says that the container runtime should relabel a volume when running a container with it, it does not say that the volume supports SELinux. For example, NFS can support SELinux, but we don't want NFS volumes relabeled, because they can be shared among several Pods.	2022-02-16 10:54:08 +01:00
KeZhang	3946d99904	Ignore container notfound error while getPodstatuses	2022-02-16 08:55:19 +08:00
Peter Hunt	0b7629d2cc	kubelet/stats: add unit test for when container logs are found Signed-off-by: Peter Hunt <pehunt@redhat.com>	2022-02-15 16:34:54 -05:00
Peter Hunt	1c3357db76	kubelet/stats: take container log stats into account when checking ephemeral stats

This commit updates checkEphemeralStorage to be able to add container log stats, if applicable. It also makes the old check, used when container log stats aren't found, more accurate. Specifically, this check previously worked only because of a programming fluke: according to this block in pkg/kubelet/stats/helper.go:113

```
if result.Rootfs != nil {
	rootfsUsage := *cfs.BaseUsageBytes
	result.Rootfs.UsedBytes = &rootfsUsage
}
```

BaseUsageBytes should be the value added, not TotalUsageBytes. However, since in this case one also needs to account for the calculated log size, which is TotalUsageBytes - BaseUsageBytes, using the TotalUsageBytes value accidentally worked. Updating the case to use the correct value AND the log offset fixes this accident and brings the behavior more in line with what happens when calculating ephemeral storage.

Signed-off-by: Peter Hunt <pehunt@redhat.com>	2022-02-15 16:30:25 -05:00
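The arithmetic behind the "accidental" correctness is easy to check in code. `fsStats` and `ephemeralUsage` are hypothetical stand-ins that use plain `uint64` fields instead of the pointer fields in the cAdvisor structs. The sketch only shows that BaseUsageBytes plus the derived log size (TotalUsageBytes - BaseUsageBytes) adds up to TotalUsageBytes.

```go
package main

import "fmt"

// fsStats is a hypothetical stand-in for the cAdvisor filesystem fields
// named in the commit message.
type fsStats struct {
	BaseUsageBytes  uint64
	TotalUsageBytes uint64
}

// ephemeralUsage takes rootfs usage from BaseUsageBytes and derives the log
// size as TotalUsageBytes - BaseUsageBytes (guarding against underflow).
func ephemeralUsage(cfs fsStats) (rootfs, logs, total uint64) {
	rootfs = cfs.BaseUsageBytes
	if cfs.TotalUsageBytes > cfs.BaseUsageBytes {
		logs = cfs.TotalUsageBytes - cfs.BaseUsageBytes
	}
	return rootfs, logs, rootfs + logs
}

func main() {
	r, l, t := ephemeralUsage(fsStats{BaseUsageBytes: 300, TotalUsageBytes: 500})
	fmt.Println(r, l, t) // 300 200 500
}
```

The sum equals TotalUsageBytes. A check that added TotalUsageBytes directly therefore produced the right answer even though it used the wrong rootfs value.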
Kubernetes Prow Robot	efa5692c0b	Merge pull request #108045 from hakman/deprecate_pod-infra-container-image Mark pod-infra-container-image flag as deprecated	2022-02-15 13:17:19 -08:00
Peter Hunt	ab0f274a6f	kubelet/stats: update cadvisor stats provider with new log location In https://github.com/kubernetes/kubernetes/pull/74441, the namespace and name were added to the pod log location. However, the cAdvisor stats provider wasn't correspondingly updated. Since CRI-O uses the cAdvisor stats provider by default, despite being a CRI implementation, eviction based on ephemeral storage and container logs didn't work as expected, until now! Signed-off-by: Peter Hunt <pehunt@redhat.com>	2022-02-15 16:04:16 -05:00
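Here is a sketch of the path change the commit refers to. It assumes that the layout introduced by PR #74441 is `/var/log/pods/<namespace>_<pod-name>_<pod-uid>`, which is the directory a stats provider now has to look in. `podLogsDir` and the `podLogsRoot` constant are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// podLogsRoot is the conventional kubelet pod log root (assumed here).
const podLogsRoot = "/var/log/pods"

// podLogsDir (hypothetical helper) builds <root>/<namespace>_<name>_<uid>.
// path.Join is used so the output uses forward slashes, as on Linux nodes.
func podLogsDir(namespace, name, uid string) string {
	return path.Join(podLogsRoot, strings.Join([]string{namespace, name, uid}, "_"))
}

func main() {
	fmt.Println(podLogsDir("default", "web-0", "1234-abcd"))
	// /var/log/pods/default_web-0_1234-abcd
}
```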
Kubernetes Prow Robot	64e83a7e43	Merge pull request #107945 from saschagrunert/cri-verbose Add support for CRI `verbose` fields	2022-02-14 17:58:12 -08:00
Ciprian Hacman	57638ae7a1	Mark pod-infra-container-image flag as deprecated Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>	2022-02-14 09:11:51 +02:00
Matthias Bertschy	9500ee9d9c	container_manager: use oomScoreAdj instead of default when set	2022-02-12 15:23:13 +01:00
Kubernetes Prow Robot	1659924a97	Merge pull request #108070 from jsafrane/remove-selinux Remove util/selinux package	2022-02-11 18:19:47 -08:00
Kubernetes Prow Robot	8580bbf7d7	Merge pull request #107594 from hakman/remove_container-runtime_logic Clean up logic for deprecated flag --container-runtime in kubelet	2022-02-11 12:57:47 -08:00
Kubernetes Prow Robot	e24b5333e5	Merge pull request #108052 from klueska/fix-topology-manager Fix bug in TopologyManager with merging hints when NUM_NUMA > 2	2022-02-11 07:37:34 -08:00
Jan Safranek	77aa06d0c8	Remove util/selinux package

The package says:

> the libcontainer SELinux package is only built for Linux, so it is
> necessary to have a NOP wrapper which is built for non-Linux platforms

This is not true: Kubernetes now imports github.com/opencontainers/selinux/go-selinux and it has proper multiplatform support (i.e. NOOP on non-Linux platforms). Removing the whole package and calling go-selinux directly.	2022-02-11 15:20:35 +01:00
Kubernetes Prow Robot	7cfe0ca828	Merge pull request #107774 from calvin0327/fix-data-race fix: data race when hijack klog	2022-02-10 23:32:15 -08:00
Cheng Xing	b152fa9b6c	Remove verult from OWNERS files	2022-02-10 18:25:38 -08:00
Kevin Klues	155562dd2e	Fix bug in TopologyManager with merging hints when NUM_NUMA > 2

Before this fix, hint permutations such as:

permutation: [{11 true} {0101 true}]

could result in merged hints of:

mergedHint: {01 true}

This was possible because both hints in the permutation contain a "preferred" allocation (i.e. the full set of NUMA nodes set in the affinity bitmask are required to satisfy the allocation). With this in place, the simplified logic we had simply kept the merged hint as preferred as well.

However, what we really want is to ensure that the merged hint is only preferred if true alignment of all resources is possible (i.e. if all hints in the permutation are preferred AND their affinities are exactly equal).

The only exception to this is if no topology information is provided by a given hint provider. In this case, we assume alignment doesn't matter and only consider the resources that actually have hints provided for them.

This changes the semantics of permutations of the form:

permutation: [{111 true} {011 true}]

to now result in the merged hint of:

mergedHint: {011 false}

instead of:

mergedHint: {011 true}

This is arguably how it should always have been, though (because a hint should not be preferred if true alignment isn't possible), and two tests have had to change to accommodate these new semantics.

This commit changes the merge function to implement the updated logic, adds a test to verify it is functioning correctly, and updates the two tests mentioned above to adjust to the new semantics.

Signed-off-by: Kevin Klues <kklues@nvidia.com>	2022-02-10 22:07:51 +00:00
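The updated merge rule can be sketched in a few lines. The `hint` type, its `hasTopology` flag (which models a provider that supplied no topology information), and `mergePermutation` are hypothetical. Bit i of the mask stands for NUMA node i, so `0b0101` corresponds to the commit's `0101`. The merged affinity is the bitwise AND of all affinities, matching the commit's example where `11` and `0101` give `01`.

```go
package main

import "fmt"

// hint is a hypothetical simplified TopologyHint.
type hint struct {
	mask        uint64 // bit i set => NUMA node i
	preferred   bool
	hasTopology bool // false models a provider that gave no topology info
}

// mergePermutation ANDs the affinities of all hints that carry topology
// information. The result is preferred only if every such hint is preferred
// AND all of their affinities are exactly equal.
func mergePermutation(allNodes uint64, perm []hint) (uint64, bool) {
	merged := allNodes
	preferred := true
	seen := false
	var firstMask uint64
	for _, h := range perm {
		if !h.hasTopology {
			continue // alignment doesn't matter for this resource
		}
		merged &= h.mask
		if !h.preferred {
			preferred = false
		}
		if !seen {
			seen, firstMask = true, h.mask
		} else if h.mask != firstMask {
			preferred = false
		}
	}
	return merged, preferred
}

func main() {
	const all = 0b1111 // four NUMA nodes
	m, p := mergePermutation(all, []hint{{0b11, true, true}, {0b0101, true, true}})
	fmt.Printf("{%02b %v}\n", m, p) // {01 false}
	m, p = mergePermutation(all, []hint{{0b111, true, true}, {0b011, true, true}})
	fmt.Printf("{%03b %v}\n", m, p) // {011 false}
}
```

Both outputs match the new semantics described in the commit. Before the fix, each of them would have been marked preferred.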
Sascha Grunert	effbcd3a0a	Add support for CRI `verbose` fields The remote runtime implementation now supports the `verbose` fields, which are required for consumers like cri-tools to enable multi CRI version support. Signed-off-by: Sascha Grunert <sgrunert@redhat.com>	2022-02-10 17:12:26 +01:00
Ciprian Hacman	0819451ea6	Clean up logic for deprecated flag --container-runtime in kubelet Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>	2022-02-10 13:26:59 +02:00
Kubernetes Prow Robot	3b4a9cdfff	Merge pull request #108007 from endocrimes/dani/cm-remove-docker cm: Remove legacy docker references	2022-02-10 03:23:47 -08:00
Gunju Kim	eb4cd9ab4e	Check taint/toleration before accepting pods, except for static pods	2022-02-10 19:39:26 +09:00
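Below is a rough sketch of taint/toleration admission that exempts static pods. The trimmed-down `taint` and `toleration` types and the `admit` function are hypothetical. The matching rules follow the documented Kubernetes toleration semantics. Restricting the check to `NoExecute` taints is an assumption about which taints the kubelet filters on, and this log does not state it.

```go
package main

import "fmt"

// Hypothetical, trimmed-down versions of the core/v1 Taint and Toleration types.
type taint struct{ Key, Value, Effect string }

type toleration struct{ Key, Operator, Value, Effect string }

// tolerates applies the documented matching rules: an empty effect matches any
// effect, an empty key (valid only with "Exists") matches any key, "Exists"
// ignores the value, and "Equal" (the default operator) compares values.
func (t toleration) tolerates(tt taint) bool {
	if t.Effect != "" && t.Effect != tt.Effect {
		return false
	}
	if t.Key != "" && t.Key != tt.Key {
		return false
	}
	switch t.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return t.Value == tt.Value
	}
	return false
}

// admit (hypothetical) accepts static pods unconditionally; other pods must
// tolerate every NoExecute taint (an assumed filter for this sketch).
func admit(isStatic bool, tols []toleration, taints []taint) bool {
	if isStatic {
		return true
	}
	for _, tt := range taints {
		if tt.Effect != "NoExecute" {
			continue
		}
		ok := false
		for _, t := range tols {
			if t.tolerates(tt) {
				ok = true
				break
			}
		}
		if !ok {
			return false
		}
	}
	return true
}

func main() {
	taints := []taint{{Key: "maintenance", Value: "true", Effect: "NoExecute"}}
	fmt.Println(admit(false, nil, taints)) // false
	fmt.Println(admit(true, nil, taints))  // true (static pod)
	fmt.Println(admit(false, []toleration{{Key: "maintenance", Operator: "Exists"}}, taints)) // true
}
```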
Kubernetes Prow Robot	518a3c2f70	Merge pull request #107108 from linxiulei/fix_pid Read number of running processes from /proc/loadavg.	2022-02-10 01:15:47 -08:00
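The fourth field of `/proc/loadavg` is documented in proc(5) as the number of currently runnable kernel scheduling entities and the total number that exist, separated by a slash. This log does not say which of the two numbers the kubelet uses, so the hypothetical `parseLoadavgTasks` below returns both. It takes the file contents as a string so that it runs anywhere.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLoadavgTasks extracts the "runnable/total" task counts from the fourth
// field of /proc/loadavg content, e.g. "0.20 0.18 0.12 1/80 11206".
func parseLoadavgTasks(content string) (runnable, total int64, err error) {
	fields := strings.Fields(content)
	if len(fields) < 5 {
		return 0, 0, fmt.Errorf("unexpected /proc/loadavg format: %q", content)
	}
	parts := strings.Split(fields[3], "/")
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("unexpected task field: %q", fields[3])
	}
	if runnable, err = strconv.ParseInt(parts[0], 10, 64); err != nil {
		return 0, 0, err
	}
	if total, err = strconv.ParseInt(parts[1], 10, 64); err != nil {
		return 0, 0, err
	}
	return runnable, total, nil
}

func main() {
	r, t, err := parseLoadavgTasks("0.20 0.18 0.12 1/80 11206\n")
	fmt.Println(r, t, err) // 1 80 <nil>
}
```

On a live Linux node, the input would be the result of `os.ReadFile("/proc/loadavg")`.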
Kubernetes Prow Robot	40c2d04946	Merge pull request #107112 from linxiulei/fix_pidmax Consider threads-max when deciding MaxPID.	2022-02-09 20:49:45 -08:00
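On Linux, every thread consumes an ID from the same number space as processes. As a result, `kernel.threads-max` can cap the usable PIDs below `kernel.pid_max`. Here is a minimal sketch, assuming the effective maximum is simply the smaller of the two sysctl values. `maxPID` is a hypothetical helper that takes the raw contents of `/proc/sys/kernel/pid_max` and `/proc/sys/kernel/threads-max` as strings.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxPID returns the smaller of kernel.pid_max and kernel.threads-max,
// given the raw contents of the corresponding /proc/sys/kernel files.
func maxPID(pidMaxContent, threadsMaxContent string) (int64, error) {
	pidMax, err := strconv.ParseInt(strings.TrimSpace(pidMaxContent), 10, 64)
	if err != nil {
		return 0, err
	}
	threadsMax, err := strconv.ParseInt(strings.TrimSpace(threadsMaxContent), 10, 64)
	if err != nil {
		return 0, err
	}
	if threadsMax < pidMax {
		return threadsMax, nil
	}
	return pidMax, nil
}

func main() {
	n, err := maxPID("4194304\n", "126542\n")
	fmt.Println(n, err) // 126542 <nil>
}
```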
Kubernetes Prow Robot	0dcd6eaa0d	Merge pull request #103934 from boenn/tainttoleration De-duplicate predicate (known as filter now) logic shared in kubelet and scheduler	2022-02-09 16:53:46 -08:00
Kubernetes Prow Robot	8d01b02c60	Merge pull request #107096 from hakman/remove_non-masquerade-cidr Remove deprecated flag --non-masquerade-cidr in kubelet	2022-02-08 12:42:50 -08:00
Danielle Lancashire	3630328fd9	eviction: Deflake TestStart TestStart was previously flaky. In approx 100_000 local runs, it failed about 70% of the time, and has been mentioned as a flaky unit test in the past. This flake was due to a race condition between the logic as written and the Go scheduler. UpdateThreshold calls `notifier.Start(events)` in a new goroutine, which is not guaranteed to be called immediately. This meant that if `m.Start()` was called before `notifier.Start()`, the test would fail, as the notifier would not have been started before the 4 events were processed and the lock released. Here, we update the test to more closely match the intended application behaviour, and have events passed to the channel when `Start` is called on the notifier. This ensures that `Start` gets called and additionally validates that the correct channel is provided to the notifier. Stop was never called previously, as it only gets called on a subsequent call to UpdateThreshold. `AnyTimes()` hid that this did not occur.	2022-02-08 17:03:44 +01:00
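The fix pattern can be reduced to a sketch that uses only the standard library (no gomock). `fakeNotifier`, `updateThreshold`, and `receiveAll` are hypothetical stand-ins for the real test doubles. The fake emits events only from inside `Start`. Receiving them therefore blocks until the goroutine launched by `updateThreshold` has actually run, whatever the scheduler decides. The arrival of the events also shows that the expected channel was passed in.

```go
package main

import "fmt"

// fakeNotifier sends its events only once Start is actually invoked.
type fakeNotifier struct{ events int }

func (f *fakeNotifier) Start(ch chan<- struct{}) {
	for i := 0; i < f.events; i++ {
		ch <- struct{}{}
	}
}

// updateThreshold mimics the pattern described in the commit message: Start
// runs on a new goroutine with no guarantee about when it is scheduled.
func updateThreshold(n *fakeNotifier, ch chan<- struct{}) {
	go n.Start(ch)
}

// receiveAll waits for exactly `events` events. Each receive blocks until
// Start has run and sent on ch, so there is no ordering race to lose.
func receiveAll(events int) int {
	ch := make(chan struct{})
	updateThreshold(&fakeNotifier{events: events}, ch)
	got := 0
	for i := 0; i < events; i++ {
		<-ch
		got++
	}
	return got
}

func main() {
	fmt.Println(receiveAll(4)) // 4
}
```

If the wrong channel were handed to the notifier, `receiveAll` would block. In a real test, that would surface as a timeout rather than as an intermittent failure.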
Danielle Lancashire	c198062da4	cm: Remove legacy docker references Dockershim and built-in Docker support are gone. Cleans up dead code references to them.	2022-02-08 16:25:04 +01:00
Jorik Jonker	27b8f13763	kubelet: expose OOM metrics cAdvisor has code to expose OOM metrics since 0.40.0, but this was not included in Kubelet so far. This commit enables it. Signed-off-by: Jorik Jonker <jorik.jonker@eu.equinix.com>	2022-02-08 12:24:25 +01:00
Jordan Liggitt	3a132bd206	Fix kubelet cri round trip test	2022-02-05 17:59:29 -05:00
Kubernetes Prow Robot	469c4c4a30	Merge pull request #106715 from aojea/dual_hostnet_pods set secondary address on host-network pods	2022-02-04 12:17:30 -08:00
Antonio Ojea	bc8e7ac1a0	ignore CRI PodSandboxNetworkStatus for host network pods	2022-02-04 18:41:57 +01:00
Gunju Kim	3ce5c944a8	kubelet: Clean up a static pod that has been terminated before starting - Allow a podWorker to start if it is blocked by a pod that has been terminated before starting - When a pod can't start AND has already been terminated, exit cleanly - Add a unit test that exercises race conditions in pod workers	2022-02-02 16:05:32 -05:00

10700 Commits