kubernetes

Author	SHA1	Message	Date
Patrick Ohly	3948cb8d1b	component-base: move v/vmodule/log-flush-frequency into LoggingConfiguration These three options are the ones from logs.AddFlags which are not deprecated. Therefore it makes sense to make them available also via the configuration file support in the one command which currently supports that (kubelet). Long-term, all commands should use LoggingConfiguration, either with a configuration file (as in kubelet) or via flags (kube-scheduler, kube-apiserver, kube-controller-manager). Short-term, both approaches have to be supported. As the majority of the commands only use logs.AddFlags, that function by default continues to register the flags and only leaves that to Options.AddFlags when explicitly requested. A drive-by bug fix is done for log flushing: the periodic flushing called klog.Flush and therefore missed explicit flushing of the newer logr backend. This bug was never present in any Kubernetes release and therefore the fix is not submitted in a separate PR.	2021-11-03 07:41:46 +01:00
Kubernetes Prow Robot	aa0ea62489	Merge pull request #104903 from ikeeip/storageobjectinuseprotection_feature_ga_cleanup Remove StorageObjectInUseProtection feature gate logic	2021-11-02 20:22:57 -07:00
Kubernetes Prow Robot	359b722c19	Merge pull request #102882 from fromanirh/device-manager-checkpoints devicemanager: checkpoint: support pre-1.20 data	2021-11-02 16:56:57 -07:00
Konstantin Misyutin	808c8f42d5	Remove StorageObjectInUseProtection feature gate logic This feature has graduated to GA in v1.11 and will always be enabled, so there is no longer any need to check whether it is enabled. Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>	2021-11-03 00:13:50 +03:00
Kubernetes Prow Robot	08bf54678e	Merge pull request #101909 from nolancon/cpu-mgr-testing Additional cases for reconcileState testing	2021-10-30 00:01:17 -07:00
Tim Hockin	11a25bfeb6	De-share the Handler struct in core API (#105979 ) * De-share the Handler struct in core API An upcoming PR adds a handler that only applies on one of these paths. Having fields that don't work seems bad. This never should have been shared. Lifecycle hooks are like a "write" while probes are more like a "read". HTTPGet and TCPSocket don't really make sense as lifecycle hooks (but I can't take that back). When we add gRPC, it is EXPLICITLY a health check (defined by gRPC) not an arbitrary RPC - so a probe makes sense but a hook does not. In the future I can also see adding lifecycle hooks that don't make sense as probes. E.g. 'sleep' is a common lifecycle request. The only option is `exec`, which requires having a sleep binary in your image. * Run update scripts	2021-10-29 13:15:11 -07:00
Kubernetes Prow Robot	c592bd40f2	Merge pull request #105609 from pohly/generic-ephemeral-volume-ga generic ephemeral volume GA	2021-10-28 17:36:50 -07:00
Francesco Romani	2f426fdba6	devicemanager: checkpoint: support pre-1.20 data The commit `a8b8995ef2` changed the content of the data kubelet writes in the checkpoint. Unfortunately, the checkpoint restore code was not updated, so if we upgrade kubelet from pre-1.20 to 1.20+, the device manager can no longer restore its state correctly. The only trace of this misbehaviour is this line in the kubelet logs: ``` W0615 07:31:49.744770 4852 manager.go:244] Continue after failing to read checkpoint file. Device allocation info may NOT be up-to-date. Err: json: cannot unmarshal array into Go struct field PodDevicesEntry.Data.PodDeviceEntries.DeviceIDs of type checkpoint.DevicesPerNUMA ``` If we hit this bug, the device allocation info is indeed NOT up-to-date until the device plugins register themselves again. This can take up to a few minutes, depending on the specific device plugin. While the device manager state is inconsistent: 1. the kubelet will NOT update the device availability to zero, so the scheduler will send pods towards the inconsistent kubelet. 2. at pod admission time, the device manager allocation will not trigger, so pods will be admitted without devices actually being allocated to them. To fix these issues, we add support to the device manager to read pre-1.20 checkpoint data. We retroactively call this format "v1". Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-10-26 09:54:11 +02:00
Kubernetes Prow Robot	17da6a2345	Merge pull request #105699 from yuzhiquan/remove-format-pods Remove format.pods func, instead with klog.Kobjs	2021-10-25 15:53:30 -07:00
Eric Ernst	2c0fad1f52	kuberuntime: populate sandbox resources, overhead Populate the Resources and Overhead fields, which are now part of LinuxPodSandboxConfig. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-10-20 11:30:23 -07:00
Eric Ernst	ddcf815d12	kuberuntime: refactor linux resources for better reuse Separate the CPU/Memory req/limit -> linux resource conversion into its own function for better reuse. Elsewhere in kuberuntime pkg, we will want to leverage this requests/limits to Linux Resource type conversion. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-10-20 11:30:23 -07:00
Eric Ernst	b1361aed93	kuberuntime: augment linux container config unit test Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-10-20 11:30:23 -07:00
Eric Ernst	a73502a0be	kuberuntime: augment linux container config unit test Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-10-20 11:29:22 -07:00
Kubernetes Prow Robot	b2c4269992	Merge pull request #105631 from klueska/upstream-distribute-cpus-across-numa Add CPUManager policy option to distribute CPUs across NUMA nodes instead of packing them	2021-10-19 11:40:24 -07:00
Kubernetes Prow Robot	1af8a8c026	Merge pull request #105465 from marosset/remove-host-process-contianer-kubelet-annotations Stop passing WindowsHostProcessContainer annotations for CRI calls in kubelet	2021-10-18 15:50:02 -07:00
Kubernetes Prow Robot	e595d79dfc	Merge pull request #104574 from 249043822/br-repeat-package fix duplicate package import in pod_worker	2021-10-18 15:49:46 -07:00
Kubernetes Prow Robot	5889fb4fbc	Merge pull request #105652 from wzshiming/feat/structure-shutdown-config Refactor to use structure to pass parameters for GracefulNodeShutdown	2021-10-18 14:45:20 -07:00
Kevin Klues	86f9c266bc	Add optimizations to reduce iterations in distributed NUMA algorithm Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-18 08:53:25 +00:00
Kevin Klues	70e0f47191	Support full-pcpus-only with the new NUMA distribution policy option Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-16 19:31:02 +00:00
Kevin Klues	d54445a84d	Generalize the NUMA distribution algorithm to take cpuGroupSize This parameter ensures that CPUs are always allocated in groups of size 'cpuGroupSize'. This is important, for example, to ensure that all CPUs (i.e. hyperthreads) from the same core are handed out together. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-16 19:31:02 +00:00
Kevin Klues	1436e33642	Add more extensive testing for NUMA distribution algorithm in CPUManager Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-16 19:31:02 +00:00
Kevin Klues	cf3afb8602	Add 2 distinguishing test cases between the 2 takeByTopology algorithms Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-16 19:31:02 +00:00
Kevin Klues	eb78e2406b	Add a new TestTakeByTopologyNUMADistributed() test to the CPUManager As part of this, pull out all of the existing "TakeByTopology" tests and have them be called by the original TestTakeByTopologyNUMAPacked() as well as the new TestTakeByTopologyNUMADistributed() test. In a subsequent commit, we will add some tests that should differ between these two algorithms. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-16 19:31:02 +00:00
Kevin Klues	876dd9b078	Added algorithm to CPUManager to distribute CPUs across NUMA nodes Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-16 19:31:02 +00:00
Kevin Klues	462544d079	Split CPUManager takeByTopology() into two different algorithms The first implements the original algorithm which packs CPUs onto NUMA nodes if more than one NUMA node is required to satisfy the allocation. The second distributes CPUs across NUMA nodes if they can't all fit into one. The "distributing" algorithm is currently a noop and just returns an error of "unimplemented". A subsequent commit will add the logic to implement this algorithm according to KEP 2902: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-16 14:46:19 +00:00
Kevin Klues	0e7928edce	Add new CPUManager policy option for "distribute-cpus-across-numa" This commit only adds the option to the policy options framework. A subsequent commit will add the logic to utilize it. The KEP describing this new option can be found here: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-16 14:46:19 +00:00
yuzhiquanlong	27fe56e916	remove unused import	2021-10-15 18:40:31 +08:00
Francesco Romani	4bae656835	cpumanager: test NUMA node support for CPU assign (2) This batch of tests adds a fake topology on which each NUMA node has multiple sockets. We haven't yet found a real HW topology like this in the wild, but we need one to fully exercise the code. So, until we find a HW topology, we add a fake one flipping the NUMA/socket config of the existing xeon dual gold 6320. Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-10-15 10:29:21 +00:00
Francesco Romani	547996f3f6	cpumanager: test NUMA node support for CPU assign (1) This batch of tests adds a real topology on which each physical socket has multiple NUMA zones. Taken from a real dual xeon 6320 gold. Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-10-15 10:29:21 +00:00
Francesco Romani	f6ccc4426a	cpumanager: test: use proper subtests The existing unit tests were performing subtests without actually using the full features of the testing package (https://pkg.go.dev/testing#hdr-Subtests_and_Sub_benchmarks). Update them with fairly minimal changes. The patch is deceptively large because we need to move the code inside a new block. Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-10-15 10:29:21 +00:00
Francesco Romani	15caa134b2	cpumanager: topology: use rich cmp package Use `cmp.Diff` in the unit tests, moving away from `reflect.DeepEqual`. This gives us a clearer picture of the differences when the tests fail. Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-10-15 10:29:21 +00:00
Kevin Klues	aff54a0914	Abstract out whether NUMA or Sockets come first in the memory hierarchy This allows us to get rid of the check for determining which one is higher all throughout the code. Now we just check once and instantiate an interface of the appropriate type that makes sure the ordering in the hierarchy is preserved through the appropriate calls. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-15 10:29:15 +00:00
yuzhiquanlong	be9e1fda5e	remove format pods func, instead with klog.Kobjs	2021-10-15 18:26:02 +08:00
Kevin Klues	17c7e86c6d	Add NUMA support to the CPU assignment algorithm in the CPUManager Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-15 08:35:59 +00:00
Shiming Zhang	e47c78a354	Add log for creating node shutdown manager	2021-10-15 11:16:21 +08:00
Shiming Zhang	b468c24e85	Refactor to use structure to pass parameters	2021-10-15 11:16:21 +08:00
Kubernetes Prow Robot	a923852ba0	Merge pull request #105215 from rphillips/add_probe_shutdown kubelet: add probe termination to graceful shutdowns	2021-10-11 21:19:46 -07:00
Patrick Ohly	a8c930ef46	generic ephemeral volume: graduation to GA The feature gate gets locked to "true", with the goal to remove it in two releases. All code now can assume that the feature is enabled. Tests for "feature disabled" are no longer needed and get removed. Some code wasn't using the new helper functions yet. That gets changed while touching those lines.	2021-10-11 20:54:20 +02:00
nolancon	6bbb36df10	Additional cases for reconcileState testing	2021-10-11 16:17:21 +00:00
Kubernetes Prow Robot	dc9c571166	Merge pull request #105569 from pohly/generic-ephemeral-kubelet-volume-stats kubelet: also provide filesystem stats for generic ephemeral volumes	2021-10-11 07:52:39 -07:00
Kubernetes Prow Robot	1f2813368e	Merge pull request #105542 from pohly/generic-ephemeral-volume-util-kubelet kubelet: use generic ephemeral volume helper functions	2021-10-11 02:16:40 -07:00
Kubernetes Prow Robot	fb82a0d7eb	Merge pull request #104873 from pohly/json-output-stream JSON output streams	2021-10-10 17:04:37 -07:00
Patrick Ohly	b22263d835	component-base: configurable JSON output This implements the replacement of klog output to different files per level with optionally splitting JSON output into two streams: one for info messages on stdout, one for error messages on stderr. The info messages can get buffered to increase performance. Because stdout and stderr might be merged by the consumer, the info stream gets flushed before writing an error, to ensure that the order of messages is preserved. This also ensures that the following code pattern doesn't leak info messages: klog.ErrorS(err, ...) os.Exit(1) Commands explicitly have to flush before exiting via logs.FlushLogs. Most already do. But buffered info messages can still get lost during an unexpected program termination, therefore buffering is off by default. The new options get added to the v1alpha1 LoggingConfiguration with new command line flags. Because it is an alpha field, changing it inside the v1beta kubelet config should be okay as long as the fields are clearly marked as alpha.	2021-10-09 10:10:35 +02:00
Kubernetes Prow Robot	63f66e6c99	Merge pull request #105012 from fromanirh/cpumanager-policy-options-beta node: graduate CPUManagerPolicyOptions to beta	2021-10-08 07:32:59 -07:00
Kubernetes Prow Robot	2face135c7	Merge pull request #97415 from AlexeyPerevalov/ExcludeSharedPoolFromPodResources Return only isolated cpus in podresources interface	2021-10-08 05:58:58 -07:00
Patrick Ohly	b1ba381ef8	kubelet: also provide filesystem stats for generic ephemeral volumes When checking for a reference to a PVC, the code also needs to consider that a PVC might be referenced indirectly through an ephemeral volume source.	2021-10-08 12:11:52 +02:00
Kubernetes Prow Robot	dd650bd41f	Merge pull request #105527 from rphillips/fixes/filter_terminated_pods kubelet: set terminated podWorker status for terminated pods	2021-10-07 22:19:51 -07:00
Ryan Phillips	0166d446b9	kubelet: set terminated podWorker status for terminated pods	2021-10-07 16:18:59 -05:00
Patrick Ohly	844662e7fa	kubelet: use generic ephemeral volume helper functions The name concatenation and ownership check were originally considered small enough to not warrant dedicated functions, but the intent of the code is more readable with them.	2021-10-07 17:31:54 +02:00
Alexey Perevalov	5d9032007a	Return only isolated cpus in podresources interface Co-Authored-by: Swati Sehgal <swsehgal@redhat.com> Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>	2021-10-07 15:34:08 +01:00


9650 Commits