kubernetes

Author	SHA1	Message	Date
Kubernetes Prow Robot	54ec651ab5	Merge pull request #110741 from zhoumingcheng/master-unit-v1 add unit test coverage for pkg/kubelet/util/queue	2023-03-09 11:15:51 -08:00
Kubernetes Prow Robot	625b8be09e	Merge pull request #115371 from pacoxu/cgroup-v2-memory-tuning default memoryThrottlingFactor to 0.9 and optimize the memory.high formulas	2023-03-08 18:46:00 -08:00
Kubernetes Prow Robot	8d5c96fed2	Merge pull request #116093 from swatisehgal/topologymanager-ga-graduation node: topologymgr: Graduate Kubelet Topology Manager to GA	2023-03-08 16:56:06 -08:00
Paco Xu	f368413d65	sync default qps of kubelet change	2023-03-08 14:04:51 +08:00
Kubernetes Prow Robot	e390791e5f	Merge pull request #116341 from bobbypage/revert-114640-handle-device-mgr-recovery Revert "node: device-mgr: Handle recovery flow by checking if healthy devices exist"	2023-03-07 19:31:33 -08:00
Kubernetes Prow Robot	fe6a51ed4c	Merge pull request #116121 from wojtek-t/bump_qps_kubelet Bump default API QPS limits for Kubelet	2023-03-07 15:08:43 -08:00
Kubernetes Prow Robot	6bce018b36	Merge pull request #116271 from vinaykul/restart-free-pod-vertical-scaling-kubelet-panic-fix Fix nil pointer access panic in kubelet from uninitialized pod allocation checkpoint manager in standalone kubelet scenario	2023-03-07 12:38:45 -08:00
David Porter	9c20cee504	Revert "node: device-mgr: Handle recovery flow by checking if healthy devices exist"	2023-03-07 11:50:52 -08:00
Kubernetes Prow Robot	2c8f63f693	Merge pull request #115268 from jsafrane/split-reconstruction Split volume reconstruction refactoring from SELinuxMountReadWriteOncePod	2023-03-07 10:44:34 -08:00
Swati Sehgal	ae964a493f	node: topologymgr: remove comments with feature gate references Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-03-07 09:42:54 +00:00
vinay kulkarni	98e8f42f33	panic on pod resources alloc checkpoint failure	2023-03-07 05:59:34 +00:00
Kubernetes Prow Robot	8e659d43ec	Merge pull request #115925 from claudiubelu/skip-flaky-tests unit tests: Skip flaky tests on Windows	2023-03-06 21:56:29 -08:00
Kubernetes Prow Robot	44909771d9	Merge pull request #115965 from jsafrane/add-reconstruction-metrics Add volume reconstruction metrics	2023-03-06 14:56:16 -08:00
Claudiu Belu	5ba74c81ca	unit tests: Skip flaky tests on Windows Some of the unit tests are currently flaky on Windows. This commit skips them until they are resolved.	2023-03-06 20:46:05 +00:00
Jan Safranek	9ca548fcf0	Add metrics for force cleaned mounts after failed reconstruction Count nr. of force cleaned mounts + their failures after a volume fails reconstruction.	2023-03-06 17:48:59 +01:00
Kubernetes Prow Robot	d6e9cff212	Merge pull request #115838 from torredil/remove-aws Remove AWS legacy cloud provider + EBS in-tree storage plugin	2023-03-06 08:18:29 -08:00
Kubernetes Prow Robot	890d39f976	Merge pull request #114640 from swatisehgal/handle-device-mgr-recovery node: device-mgr: Handle recovery flow by checking if healthy devices exist	2023-03-06 07:10:28 -08:00
Kubernetes Prow Robot	68eea2468c	Merge pull request #114572 from huyinhou/fix-concurrent-map-access kubelet/deviceplugin: fix concurrent map iteration and map write	2023-03-06 06:06:29 -08:00
torredil	6aebda9b1e	Remove AWS legacy cloud provider + EBS in-tree storage plugin Signed-off-by: torredil <torredil@amazon.com>	2023-03-06 14:01:15 +00:00
Swati Sehgal	937d330393	node: topologymgr: Remove ResourceAllocator as TM is always enabled With Topology Manager enabled by default, we no longer need `resourceAllocator` as Topology Manager serves as the main PodAdmitHandler completely responsible for admission check based on hints received from the hintProviders and the subsequent allocation of the corresponding resources to a pod as can be seen here: https://github.com/kubernetes/kubernetes/blob/v1.26.0/pkg/kubelet/cm/topologymanager/scope.go#L150 With regard to DRA, the passing of `cm.draManager` into resourceAllocator seems redundant as no admission checks (and no allocation of resources handled by DRA) are taking place in the `Admit` method of resourceAllocator. DRA has a completely different model to the rest of the resource managers where a pod is only scheduled on a node once resources are reserved for it. Because of this, admission checks or waiting for resources to be provisioned after the pod has been scheduled on the node are not required. Before making the above change, it was verified that DRA Manager is instantiated in `NewContainerManager`: https://github.com/kubernetes/kubernetes/blob/v1.26.0/pkg/kubelet/cm/container_manager_linux.go#L318 Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-03-06 12:51:11 +00:00
Swati Sehgal	6a62f0236a	node: topologymgr: trivial internal variable renaming Since Topology manager is graduating to GA, we remove internal configuration variable names with `Experimental` prefix. There is no expected change in behavior, only trivial variable renaming. Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-03-06 12:51:11 +00:00
Swati Sehgal	d536a342b4	node: topologymgr: GA graduation implies Feature Gate is ON by default Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-03-06 12:51:05 +00:00
Swati Sehgal	5b2a3dbbdc	node: device-mgr: explicitly check if pre-allocated devices are healthy Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-03-06 11:52:23 +00:00
Swati Sehgal	a799ffb571	node: device-mgr: unit-tests: admission failure due to unhealthy devices Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-03-06 11:52:23 +00:00
Swati Sehgal	7ac399c205	node: device-mgr: Handle recovery by checking if healthy devices exist In case of node reboot/kubelet restart, the flow of events involves obtaining the state from the checkpoint file followed by setting the `healthyDevices`/`unhealthyDevices` to their zero values. This is done to allow the device plugin to re-register itself so that capacity can be updated appropriately. During the allocation phase, we need to check that the resources requested by the pod have been registered AND that healthy devices are present on the node to be allocated. We also need to move this check above the `needed==0` check, where `needed` is the required device count minus the devices already allocated to the container (obtained from the checkpoint file), because even in cases where no additional devices have to be allocated (as they were pre-allocated), we still need to make sure the devices that were previously allocated are healthy. Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-03-06 11:52:23 +00:00
Wojciech Tyczyński	280651abcc	Autogenerated	2023-03-06 12:08:34 +01:00
Wojciech Tyczyński	760acbbbe3	Bump QPS limits for Kubelet	2023-03-06 12:07:52 +01:00
Kubernetes Prow Robot	b8aaaf380a	Merge pull request #116083 from SataQiu/clean-20230227 kubelet: remove unused DockerID type	2023-03-06 02:22:58 -08:00
vinay kulkarni	b0dce923f1	Add Get interfaces for container's checkpointed ResourcesAllocated and Resize values, remove error logging for valid standalone kubelet scenario	2023-03-06 09:50:12 +00:00
huyinhou	88274d96fc	update code style Signed-off-by: huyinhou <huyinhou@bytedance.com>	2023-03-06 14:23:14 +08:00
vinay kulkarni	12435b26fc	Fix nil pointer access panic in kubelet from uninitialized pod allocation checkpoint manager in standalone kubelet scenario	2023-03-04 08:07:40 +00:00
Sergey Kanzhelev	04189b1fc4	rename ExperimentalPodPidsLimit to PodPidsLimit	2023-03-04 01:48:16 +00:00
Paco Xu	81c5a122c3	add pageSize to memory.high formula	2023-03-03 11:24:50 +08:00
Paco Xu	7dab6253e1	default memoryThrottlingFactor to 0.9 and optimize the memory.high calculation formulas	2023-03-03 11:24:40 +08:00
Sergey Kanzhelev	e360de48b2	GRPCContainerProbe is GA	2023-03-02 22:07:59 +00:00
Kubernetes Prow Robot	57fd02ca29	Merge pull request #116218 from pohly/test-lease-controller-leak update lease controller	2023-03-02 10:30:56 -08:00
Kubernetes Prow Robot	efe20f6c9b	Merge pull request #114114 from ffromani/full-pcpus-stricter-precheck-issue113537 node: cpumgr: stricter pre-check for the policy option full-pcpus-only	2023-03-02 09:04:56 -08:00
Francesco Romani	0e9b92090c	node: cpumgr: stricter precheck for full-pcpus-only In order to implement the `full-pcpus-only` cpumanager policy option, we leverage the implementation of the algorithm which picks CPUs. By design, CPUs are taken from the biggest chunk available (socket or NUMA zone) to physical cores, down to single cores. Leveraging this, if the requested CPU count is a multiple of the SMT level (commonly 2), we're guaranteed that only full physical cores will be taken. The hidden assumption here is that this holds true by construction iff the user reserved CPUs (if any) considering full physical CPUs. IOW, if the user intentionally or mistakenly reserved single threads which are not core siblings[1], then the simple check we implemented is not sufficient. An easy example can probably outline this better. With this setup: cores: [(0, 4), (1, 5), (2, 6), (3, 8)] (in parens: thread siblings). SMT level: 2 (each tuple is 2 elements) Reserved CPUs: 0,1 (explicit pick using `--reserved-cpus`) A container then requests 6 cpus. full-pcpus-only check: 6 % 2 == 0. Passed. The CPU allocator will first take full cores, (2,6) and (3,8), and will then pick the remaining single CPUs. The allocation will succeed, but it's incorrect. We can fix this case with a stricter precheck. We need to additionally consider all the core siblings of the reserved CPUs as unavailable when computing the free cpus, before starting the actual allocation. Doing so, we fall back to the intended behavior, and by construction all possible CPU allocations whose size is a multiple of the SMT level are now correct again. +++ [1] or thread siblings in the linux parlance, in any case: hyperthread siblings of the same physical core Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-03-02 16:00:58 +01:00
Patrick Ohly	dad95e1be6	update lease controller Passing in a context instead of a stop channel has several advantages: - ensures that client-go calls return as soon as the controller is asked to stop - contextual logging can be used By passing that context down to its own functions and checking it while waiting, the lease controller also doesn't get stuck in backoffEnsureLease anymore (https://github.com/kubernetes/kubernetes/issues/116196).	2023-03-02 15:06:00 +01:00
ruiwen-zhao	572e6e0ffb	Add MaxParallelImagePulls support Signed-off-by: ruiwen-zhao <ruiwen@google.com>	2023-03-02 03:57:59 +00:00
Kubernetes Prow Robot	53f3583c7f	Merge pull request #114785 from TommyStarK/kubelet/replace-deprecated-pointer-function kubelet: Replace deprecated pointer function	2023-03-01 18:04:55 -08:00
Patrick Ohly	961819a4d0	dependencies: update klog v2.90.1 This improves performance of the text formatting and ktesting. Because ktesting no longer buffers messages by default, one unit test needs to ask for that explicitly.	2023-03-01 19:03:50 +01:00
Kubernetes Prow Robot	6a25c528bb	Merge pull request #115891 from bart0sh/PR103-CRI-add-CDI-devices DRA: Pass CDI devices with a new CRI field	2023-02-28 14:53:28 -08:00
Kubernetes Prow Robot	18eea58ac2	Merge pull request #115359 from iancoolidge/devel-cpuset More code-review changes from k/utils cpuset review	2023-02-28 10:55:16 -08:00
Ed Bartosh	5a86895070	DRA: pass CDI devices through CRI CDIDevice field	2023-02-28 19:21:20 +02:00
SataQiu	ed2caf17e0	kubelet: remove unused DockerID type	2023-02-27 16:02:59 +08:00
Chen Wang	7db339dba2	This commit contains the following: 1. Scheduler bug-fix + scheduler-focussed E2E tests 2. Add cgroup v2 support for in-place pod resize 3. Enable full E2E pod resize test for containerd>=1.6.9 and EventedPLEG related changes. Co-Authored-By: Vinay Kulkarni <vskibum@gmail.com>	2023-02-24 18:21:21 +00:00
Vinay Kulkarni	f2bd94a0de	In-place Pod Vertical Scaling - core implementation 1. Core Kubelet changes to implement In-place Pod Vertical Scaling. 2. E2E tests for In-place Pod Vertical Scaling. 3. Refactor kubelet code and add missing tests (Derek's kubelet review) 4. Add a new hash over container fields without Resources field to allow feature gate toggling without restarting containers not using the feature. 5. Fix corner-case where resize A->B->A gets ignored 6. Add cgroup v2 support to pod resize E2E test. KEP: /enhancements/keps/sig-node/1287-in-place-update-pod-resources Co-authored-by: Chen Wang <Chen.Wang1@ibm.com>	2023-02-24 18:21:21 +00:00
Jan Safranek	bd73aee9db	Add volume reconstruction metrics Count nr. of volumes that kubelet tried to reconstruct + reconstruction errors.	2023-02-22 13:01:26 +01:00
Ian K. Coolidge	d4a1bf83c1	cpuset: Convert Fatalf to Errorf in tests Use of Fatalf is not appropriate in any of these cases: None of these failures are prerequisites.	2023-02-21 05:41:16 +00:00
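
The stricter `full-pcpus-only` precheck described in commit 0e9b92090c can be sketched roughly as follows. This is a minimal illustration and not the kubelet's implementation: the `core` type and the `freeCPUs` and `admit` helpers are hypothetical names introduced here, and the topology is the one quoted in the commit message.

```go
package main

import (
	"fmt"
	"sort"
)

// core lists the logical CPUs (thread siblings) of one physical core.
// Hypothetical type, used only for this sketch.
type core []int

// freeCPUs returns the CPUs considered available for allocation.
// With strict=false only the reserved CPUs themselves are excluded.
// With strict=true every sibling of a reserved CPU is excluded as well,
// which is the stricter precheck the commit message describes.
func freeCPUs(cores []core, reserved map[int]bool, strict bool) []int {
	var free []int
	for _, c := range cores {
		// A core is "touched" when at least one of its threads is reserved.
		touched := false
		for _, cpu := range c {
			if reserved[cpu] {
				touched = true
			}
		}
		for _, cpu := range c {
			if reserved[cpu] || (strict && touched) {
				continue
			}
			free = append(free, cpu)
		}
	}
	sort.Ints(free)
	return free
}

// admit applies the full-pcpus-only style check: the request must be a
// multiple of the SMT level and must fit into the free CPUs.
func admit(cores []core, reserved map[int]bool, request, smtLevel int, strict bool) error {
	if request%smtLevel != 0 {
		return fmt.Errorf("request %d is not a multiple of SMT level %d", request, smtLevel)
	}
	free := freeCPUs(cores, reserved, strict)
	if request > len(free) {
		return fmt.Errorf("request %d exceeds %d CPUs available as full cores", request, len(free))
	}
	return nil
}

func main() {
	// Topology and reservation copied from the commit message.
	cores := []core{{0, 4}, {1, 5}, {2, 6}, {3, 8}}
	reserved := map[int]bool{0: true, 1: true}

	fmt.Println("simple check:", admit(cores, reserved, 6, 2, false))
	fmt.Println("strict check:", admit(cores, reserved, 6, 2, true))
}
```

With the simple check the 6-CPU request is admitted, because six unreserved CPUs remain even though two of them (4 and 5) are siblings of reserved threads. With the strict check those siblings are treated as unavailable, only four CPUs remain, and the request is rejected before allocation starts.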

10507 Commits