
10491 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
6bce018b36
Merge pull request #116271 from vinaykul/restart-free-pod-vertical-scaling-kubelet-panic-fix
Fix nil pointer access panic in kubelet from uninitialized pod allocation checkpoint manager in standalone kubelet scenario
2023-03-07 12:38:45 -08:00
Kubernetes Prow Robot
2c8f63f693
Merge pull request #115268 from jsafrane/split-reconstruction
Split volume reconstruction refactoring from SELinuxMountReadWriteOncePod
2023-03-07 10:44:34 -08:00
vinay kulkarni
98e8f42f33 panic on pod resources alloc checkpoint failure 2023-03-07 05:59:34 +00:00
Kubernetes Prow Robot
8e659d43ec
Merge pull request #115925 from claudiubelu/skip-flaky-tests
unit tests: Skip flaky tests on Windows
2023-03-06 21:56:29 -08:00
Kubernetes Prow Robot
44909771d9
Merge pull request #115965 from jsafrane/add-reconstruction-metrics
Add volume reconstruction metrics
2023-03-06 14:56:16 -08:00
Claudiu Belu
5ba74c81ca unit tests: Skip flaky tests on Windows
Some of the unit tests are currently flaky on Windows. This commit
skips them until they are resolved.
2023-03-06 20:46:05 +00:00
Jan Safranek
9ca548fcf0 Add metrics for force cleaned mounts after failed reconstruction
Count nr. of force cleaned mounts + their failures after a volume fails
reconstruction.
2023-03-06 17:48:59 +01:00
Kubernetes Prow Robot
d6e9cff212
Merge pull request #115838 from torredil/remove-aws
Remove AWS legacy cloud provider + EBS in-tree storage plugin
2023-03-06 08:18:29 -08:00
Kubernetes Prow Robot
890d39f976
Merge pull request #114640 from swatisehgal/handle-device-mgr-recovery
node: device-mgr: Handle recovery flow by checking if healthy devices exist
2023-03-06 07:10:28 -08:00
Kubernetes Prow Robot
68eea2468c
Merge pull request #114572 from huyinhou/fix-concurrent-map-access
kubelet/deviceplugin: fix concurrent map iteration and map write
2023-03-06 06:06:29 -08:00
torredil
6aebda9b1e Remove AWS legacy cloud provider + EBS in-tree storage plugin
Signed-off-by: torredil <torredil@amazon.com>
2023-03-06 14:01:15 +00:00
Swati Sehgal
5b2a3dbbdc node: device-mgr: explicitly check if pre-allocated devices are healthy
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 11:52:23 +00:00
Swati Sehgal
a799ffb571 node: device-mgr: unit-tests: admission failure due to unhealthy devices
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 11:52:23 +00:00
Swati Sehgal
7ac399c205 node: device-mgr: Handle recovery by checking if healthy devices exist
In case of node reboot/kubelet restart, the flow of events involves
obtaining the state from the checkpoint file followed by setting
the `healthyDevices`/`unhealthyDevices` to their zero values. This is
done to allow the device plugin to re-register itself so that
capacity can be updated appropriately.

During the allocation phase, we need to check if the resources requested
by the pod have been registered AND healthy devices are present on
the node to be allocated.

Also, we need to move this check above `needed==0`, where needed is
required minus the devices already allocated to the container (obtained
from the checkpoint file), because even in cases where no additional devices
have to be allocated (as they were pre-allocated), we still need to
make sure the devices that were previously allocated are healthy.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 11:52:23 +00:00
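
The ordering described in commit 7ac399c205 above can be sketched in Go. This is only an illustration under stated assumptions, not the kubelet's code: `deviceState`, `devicesToAllocate`, `errNoHealthyDevices`, and the resource name `example.com/gpu` are hypothetical. The point it shows is that the health check runs before the `needed == 0` shortcut, so a container whose devices were restored from the checkpoint is still rejected while the plugin has not re-registered.

```go
package main

import (
	"errors"
	"fmt"
)

// deviceState is a hypothetical, simplified stand-in for the device manager's
// bookkeeping; the real kubelet types differ.
type deviceState struct {
	healthy   map[string]map[string]bool // resource -> set of healthy device IDs
	allocated map[string][]string        // resource -> device IDs restored from the checkpoint
}

var errNoHealthyDevices = errors.New("no healthy devices present")

// devicesToAllocate returns how many additional devices are still needed.
// The registration and health checks come first, before the shortcut that
// returns early when nothing more has to be allocated.
func (s *deviceState) devicesToAllocate(resource string, required int) (int, error) {
	healthy, registered := s.healthy[resource]
	if !registered || len(healthy) == 0 {
		return 0, fmt.Errorf("resource %q: %w", resource, errNoHealthyDevices)
	}
	// Devices that were pre-allocated must themselves be healthy.
	for _, id := range s.allocated[resource] {
		if !healthy[id] {
			return 0, fmt.Errorf("resource %q: pre-allocated device %q is not healthy", resource, id)
		}
	}
	needed := required - len(s.allocated[resource])
	if needed <= 0 {
		return 0, nil
	}
	return needed, nil
}

func main() {
	// State right after a kubelet restart: allocations come from the
	// checkpoint, but the healthy set is still empty.
	s := &deviceState{
		healthy:   map[string]map[string]bool{},
		allocated: map[string][]string{"example.com/gpu": {"dev0"}},
	}
	_, err := s.devicesToAllocate("example.com/gpu", 1)
	fmt.Println("before re-registration:", err)

	// The device plugin re-registers and reports dev0 as healthy.
	s.healthy["example.com/gpu"] = map[string]bool{"dev0": true}
	needed, err := s.devicesToAllocate("example.com/gpu", 1)
	fmt.Println("after re-registration:", needed, err)
}
```

If the checks were placed after the shortcut, the first call in `main` would return success even though no healthy device is known yet.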
Kubernetes Prow Robot
b8aaaf380a
Merge pull request #116083 from SataQiu/clean-20230227
kubelet: remove unused DockerID type
2023-03-06 02:22:58 -08:00
vinay kulkarni
b0dce923f1 Add Get interfaces for container's checkpointed ResourcesAllocated and Resize values, remove error logging for valid standalone kubelet scenario 2023-03-06 09:50:12 +00:00
huyinhou
88274d96fc update code style
Signed-off-by: huyinhou <huyinhou@bytedance.com>
2023-03-06 14:23:14 +08:00
vinay kulkarni
12435b26fc Fix nil pointer access panic in kubelet from uninitialized pod allocation checkpoint manager in standalone kubelet scenario 2023-03-04 08:07:40 +00:00
Sergey Kanzhelev
04189b1fc4 rename ExperimentalPodPidsLimit to PodPidsLimit 2023-03-04 01:48:16 +00:00
Sergey Kanzhelev
e360de48b2 GRPCContainerProbe is GA 2023-03-02 22:07:59 +00:00
Kubernetes Prow Robot
57fd02ca29
Merge pull request #116218 from pohly/test-lease-controller-leak
update lease controller
2023-03-02 10:30:56 -08:00
Kubernetes Prow Robot
efe20f6c9b
Merge pull request #114114 from ffromani/full-pcpus-stricter-precheck-issue113537
node: cpumgr: stricter pre-check for the policy option full-pcpus-only
2023-03-02 09:04:56 -08:00
Francesco Romani
0e9b92090c node: cpumgr: stricter precheck for full-pcpus-only
In order to implement the `full-pcpus-only` cpumanager policy option,
we leverage the implementation of the algorithm which picks CPUs.
By design, CPUs are taken from the biggest chunk available (socket
or NUMA zone) to physical cores, down to single cores.

Leveraging this, if the requested CPU count is a multiple of the SMT
level (commonly 2), we're guaranteed that only full physical cores
will be taken.

The hidden assumption here is that this holds true by construction iff
the user reserved CPUs (if any) considering full physical CPUs.
IOW, if the user intentionally or mistakenly reserved single threads
which are not core siblings[1], then the simple check we implemented
is not sufficient.

An easy example can probably outline this better. With this setup:

cores: [(0, 4), (1, 5), (2, 6), (3, 8)] (in parens: thread siblings).
SMT level: 2 (each tuple is 2 elements)
Reserved CPUs: 0,1 (explicit pick using `--reserved-cpus`)

A container then requests 6 cpus. full-pcpus-only check: 6 % 2 == 0. Passed.
The CPU allocator will take first full cores, (2,6) and (3,8), and will
then pick the remaining single CPUs. The allocation will succeed, but
it's incorrect.

We can fix this case with a stricter precheck.
We need to additionally consider all the core siblings of the reserved
CPUs as unavailable when computing the free CPUs, before starting the
actual allocation. Doing so, we fall back to the intended behavior, and
by construction all possible CPU allocations whose size is a multiple
of the SMT level are now correct again.

+++

[1] or thread siblings in Linux parlance; in any case:
hyperthread siblings of the same physical core

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-03-02 16:00:58 +01:00
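
The difference between the simple and the stricter precheck in commit 0e9b92090c above can be worked through in Go using the setup quoted in its message. This is a sketch under assumptions, not the cpumanager implementation: the `topology` type and the `freeCPUs` and `precheck` functions are hypothetical names, and the core list is copied verbatim from the commit message.

```go
package main

import "fmt"

// topology is a hypothetical, simplified CPU topology: each entry of cores
// lists the thread siblings of one physical core.
type topology struct {
	cores [][]int
}

// smtLevel assumes every core has the same number of threads.
func (t topology) smtLevel() int { return len(t.cores[0]) }

// freeCPUs counts the CPUs considered available. In strict mode, every
// sibling of a reserved CPU is treated as unavailable as well.
func (t topology) freeCPUs(reserved map[int]bool, strict bool) int {
	free := 0
	for _, core := range t.cores {
		coreHasReserved := false
		for _, cpu := range core {
			if reserved[cpu] {
				coreHasReserved = true
			}
		}
		for _, cpu := range core {
			if reserved[cpu] || (strict && coreHasReserved) {
				continue
			}
			free++
		}
	}
	return free
}

// precheck rejects requests that are not a multiple of the SMT level or that
// exceed the number of CPUs considered available.
func (t topology) precheck(request int, reserved map[int]bool, strict bool) error {
	if request%t.smtLevel() != 0 {
		return fmt.Errorf("request of %d CPUs is not a multiple of SMT level %d", request, t.smtLevel())
	}
	if free := t.freeCPUs(reserved, strict); request > free {
		return fmt.Errorf("request of %d CPUs exceeds the %d available CPUs", request, free)
	}
	return nil
}

func main() {
	// Setup as written in the commit message.
	topo := topology{cores: [][]int{{0, 4}, {1, 5}, {2, 6}, {3, 8}}}
	reserved := map[int]bool{0: true, 1: true}

	fmt.Println("simple precheck, 6 CPUs:", topo.precheck(6, reserved, false))
	fmt.Println("strict precheck, 6 CPUs:", topo.precheck(6, reserved, true))
	fmt.Println("strict precheck, 4 CPUs:", topo.precheck(4, reserved, true))
}
```

With CPUs 0 and 1 reserved, the simple count sees six free CPUs and lets the request for six through, while the strict count drops siblings 4 and 5 as well and leaves only the two untouched cores.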
Patrick Ohly
dad95e1be6 update lease controller
Passing in a context instead of a stop channel has several advantages:
- ensures that client-go calls return as soon as the controller is asked to stop
- contextual logging can be used

By passing that context down to its own functions and checking it while
waiting, the lease controller also doesn't get stuck in backoffEnsureLease
anymore (https://github.com/kubernetes/kubernetes/issues/116196).
2023-03-02 15:06:00 +01:00
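
The pattern described in commit dad95e1be6 above, checking the context while waiting so that a retry loop cannot get stuck, might look roughly like the following. This is a generic sketch, not the lease controller's code: `ensureWithBackoff`, `runDemo`, and the chosen durations are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ensureWithBackoff retries ensure with exponential backoff until it succeeds
// or ctx is done. The wait between attempts selects on ctx.Done(), so a stop
// request interrupts the loop even in the middle of a backoff delay.
func ensureWithBackoff(ctx context.Context, ensure func(context.Context) error, initial, maxDelay time.Duration) error {
	delay := initial
	for {
		if err := ensure(ctx); err == nil {
			return nil
		}
		timer := time.NewTimer(delay)
		select {
		case <-ctx.Done():
			timer.Stop()
			return ctx.Err()
		case <-timer.C:
		}
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
}

// runDemo calls ensureWithBackoff with an operation that never succeeds and a
// context that expires after a short timeout.
func runDemo() error {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	alwaysFail := func(context.Context) error { return errors.New("lease not available") }
	return ensureWithBackoff(ctx, alwaysFail, 10*time.Millisecond, 40*time.Millisecond)
}

func main() {
	fmt.Println("stopped with:", runDemo())
}
```

Passing the same context into `ensure` also lets any client call made inside it return as soon as the controller is asked to stop.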
ruiwen-zhao
572e6e0ffb Add MaxParallelImagePulls support
Signed-off-by: ruiwen-zhao <ruiwen@google.com>
2023-03-02 03:57:59 +00:00
Kubernetes Prow Robot
53f3583c7f
Merge pull request #114785 from TommyStarK/kubelet/replace-deprecated-pointer-function
kubelet: Replace deprecated pointer function
2023-03-01 18:04:55 -08:00
Patrick Ohly
961819a4d0 dependencies: update klog v2.90.1
This improves performance of the text formatting and ktesting.

Because ktesting no longer buffers messages by default, one unit
test needs to ask for that explicitly.
2023-03-01 19:03:50 +01:00
Kubernetes Prow Robot
6a25c528bb
Merge pull request #115891 from bart0sh/PR103-CRI-add-CDI-devices
DRA: Pass CDI devices with a new CRI field
2023-02-28 14:53:28 -08:00
Kubernetes Prow Robot
18eea58ac2
Merge pull request #115359 from iancoolidge/devel-cpuset
More code-review changes from k/utils cpuset review
2023-02-28 10:55:16 -08:00
Ed Bartosh
5a86895070 DRA: pass CDI devices through CRI CDIDevice field 2023-02-28 19:21:20 +02:00
SataQiu
ed2caf17e0 kubelet: remove unused DockerID type 2023-02-27 16:02:59 +08:00
Chen Wang
7db339dba2 This commit contains the following:
1. Scheduler bug-fix + scheduler-focussed E2E tests
2. Add cgroup v2 support for in-place pod resize
3. Enable full E2E pod resize test for containerd>=1.6.9 and EventedPLEG related changes.

Co-Authored-By: Vinay Kulkarni <vskibum@gmail.com>
2023-02-24 18:21:21 +00:00
Vinay Kulkarni
f2bd94a0de In-place Pod Vertical Scaling - core implementation
1. Core Kubelet changes to implement In-place Pod Vertical Scaling.
2. E2E tests for In-place Pod Vertical Scaling.
3. Refactor kubelet code and add missing tests (Derek's kubelet review)
4. Add a new hash over container fields without Resources field to allow feature gate toggling without restarting containers not using the feature.
5. Fix corner-case where resize A->B->A gets ignored
6. Add cgroup v2 support to pod resize E2E test.
KEP: /enhancements/keps/sig-node/1287-in-place-update-pod-resources

Co-authored-by: Chen Wang <Chen.Wang1@ibm.com>
2023-02-24 18:21:21 +00:00
Jan Safranek
bd73aee9db Add volume reconstruction metrics
Count nr. of volumes that kubelet tried to reconstruct + reconstruction
errors.
2023-02-22 13:01:26 +01:00
Ian K. Coolidge
d4a1bf83c1 cpuset: Convert Fatalf to Errorf in tests
Use of Fatalf is not appropriate in any of these cases:
None of these failures are prerequisites.
2023-02-21 05:41:16 +00:00
Ian K. Coolidge
b536851fc7 cpuset: Add a few more test cases
Feedback from https://github.com/kubernetes/utils/pull/267 and related
reviews.

* Equality when insertion order is different
* UnsortedList contents
* Not-Subset cases
* Clone coverage
2023-02-21 05:40:54 +00:00
Ian K. Coolidge
22d3f67850 cpuset: Fix Parse() error message for n-k s.t. k<n
This case is tested extensively in cpuset_test.go, but the error message
needs a small adjustment.
2023-02-21 04:51:14 +00:00
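
The case named in commit 22d3f67850 above, a range `n-k` with `k<n`, can be illustrated with a small Go parser. This is a hypothetical sketch and not the cpuset package's `Parse()`: the function name `parseRange` and the wording of the error messages are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRange is a hypothetical helper that expands a range such as "1-3"
// into [1 2 3] and rejects a range whose end is smaller than its start.
func parseRange(s string) ([]int, error) {
	lo, hi, found := strings.Cut(s, "-")
	if !found {
		return nil, fmt.Errorf("invalid range %q: expected the form n-k", s)
	}
	start, err := strconv.Atoi(lo)
	if err != nil {
		return nil, fmt.Errorf("invalid range %q: %w", s, err)
	}
	end, err := strconv.Atoi(hi)
	if err != nil {
		return nil, fmt.Errorf("invalid range %q: %w", s, err)
	}
	if start > end {
		return nil, fmt.Errorf("invalid range %q: %d > %d", s, start, end)
	}
	cpus := make([]int, 0, end-start+1)
	for cpu := start; cpu <= end; cpu++ {
		cpus = append(cpus, cpu)
	}
	return cpus, nil
}

func main() {
	fmt.Println(parseRange("1-3"))
	fmt.Println(parseRange("3-1"))
}
```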
huyinhou
32495ae3f1 add lock in generate topology hints function 2023-02-20 10:56:53 +08:00
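
Commit 32495ae3f1 above belongs to the fix for concurrent map iteration and map write. The general remedy, holding the same mutex for both the write and the iteration, can be sketched in Go as follows. The `deviceRegistry` type and `runConcurrently` function are hypothetical and are not taken from the device plugin code.

```go
package main

import (
	"fmt"
	"sync"
)

// deviceRegistry is a hypothetical structure whose map is written and
// iterated from several goroutines.
type deviceRegistry struct {
	mu      sync.Mutex
	devices map[string][]string
}

// add writes to the map while holding the lock.
func (r *deviceRegistry) add(resource, id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.devices[resource] = append(r.devices[resource], id)
}

// count iterates over the map while holding the same lock, so an iteration
// never overlaps with a write.
func (r *deviceRegistry) count() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	total := 0
	for _, ids := range r.devices {
		total += len(ids)
	}
	return total
}

// runConcurrently starts several goroutines that write and iterate at the
// same time and returns the final number of devices.
func runConcurrently(writers, perWriter int) int {
	r := &deviceRegistry{devices: map[string][]string{}}
	var wg sync.WaitGroup
	for w := 0; w < writers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < perWriter; i++ {
				r.add(fmt.Sprintf("resource-%d", w), fmt.Sprintf("dev-%d", i))
				_ = r.count()
			}
		}(w)
	}
	wg.Wait()
	return r.count()
}

func main() {
	fmt.Println("devices registered:", runConcurrently(4, 50))
}
```

Without the lock in `count`, the Go runtime can abort the program with a fatal "concurrent map iteration and map write" error.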
Kubernetes Prow Robot
ffe410bbb4
Merge pull request #115604 from pacoxu/fix-design-proposals-links
old design proposals are now moved to Design Proposals Archive repo
2023-02-16 09:55:38 -08:00
Paco Xu
3d536bd14b API docs: point to current docs instead of archived designs 2023-02-16 15:32:08 +08:00
Kubernetes Prow Robot
e18fa74551
Merge pull request #115590 from swatisehgal/topology-mgr-duration-metrics
node: topology-mgr: Add metric to measure topology manager admission latency
2023-02-15 07:12:25 -08:00
Swati Sehgal
8442b450e5 node: topology-mgr: code optimization
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-02-15 14:04:10 +00:00
Swati Sehgal
bc941633c1 node: topology-mgr: add metric to measure topology mgr admission latency
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-02-15 13:59:47 +00:00
Kubernetes Prow Robot
8f55d34507
Merge pull request #115384 from sourcelliu/allowlist
Add test for pkg/kubelet/sysctl/allowlist_test.go
2023-02-14 12:45:51 -08:00
Kubernetes Prow Robot
5071c4f57e
Merge pull request #111982 from cvvz/kubelet-del-unnecessary-code
cleanup: delete useless code from kubelet volumemanager
2023-02-14 10:31:31 -08:00
cyclinder
1bdcd18bf6 close grpc server in test file to avoid goroutine leak
Signed-off-by: cyclinder <kuocyclinder@gmail.com>
2023-02-10 09:51:26 +08:00
Paco Xu
019d2615af archived design proposals are now moved to Design Proposals Archive Repo. 2023-02-08 11:12:22 +08:00
Kubernetes Prow Robot
5437d493da
Merge pull request #114364 from bart0sh/PR102-prepare-DRA-resources-before-CNI-setup
kubelet: prepare DRA resources before CNI setup
2023-02-07 08:09:04 -08:00
Kubernetes Prow Robot
22b88dea36
Merge pull request #115315 from enj/enj/i/kas_kubelet_conn_close
kubelet/client: collapse transport wiring onto standard approach
2023-02-07 07:01:14 -08:00
Madhav Jivrajani
5e1f440d0a *: Fix linter warnings
Adapt to newly improved linters in golangci-lint v1.51.1

Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>
2023-02-07 13:01:41 +05:30