Commit Graph

10470 Commits

Author SHA1 Message Date
Swati Sehgal
937d330393 node: topologymgr: Remove ResourceAllocator as TM is always enabled
With Topology Manager enabled by default, we no longer need
`resourceAllocator` as Topology Manager serves as the main
PodAdmitHandler completely responsible for admission check
based on hints received from the hintProviders and the
subsequent allocation of the corresponding resources to a
pod as can be seen here:
https://github.com/kubernetes/kubernetes/blob/v1.26.0/pkg/kubelet/cm/topologymanager/scope.go#L150

With regard to DRA, the passing of `cm.draManager` into
resourceAllocator seems redundant as no admission checks
(and allocation of resources handled by DRA) is taking place
in `Admit` method of resourceAllocator. DRA has a completely
different model to the rest of the resource managers where
pod is only scheduled on a node once resources are reserved
for it. Because of this, admission checks or waiting for
resources to be provisioned after the pod has been scheduled
on the node is not required.

Before making the above change, it was verified that DRA Manager
is instantiated in `NewContainerManager`:
https://github.com/kubernetes/kubernetes/blob/v1.26.0/pkg/kubelet/cm/container_manager_linux.go#L318

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 12:51:11 +00:00
Swati Sehgal
6a62f0236a node: topologymgr: trivial internal variable renaming
Since Topology manager is graduating to GA, we remove
internal configuration variable names with `Experimental`
prefix.

There is no expected change in behavior, only trival
variable renaming.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 12:51:11 +00:00
Swati Sehgal
d536a342b4 node: topologymgr: GA graduation implies Feature Gate is ON by default
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 12:51:05 +00:00
Kubernetes Prow Robot
b8aaaf380a
Merge pull request #116083 from SataQiu/clean-20230227
kubelet: remove unused DockerID type
2023-03-06 02:22:58 -08:00
Sergey Kanzhelev
04189b1fc4 rename ExperimentalPodPidsLimit to PodPidsLimit 2023-03-04 01:48:16 +00:00
Sergey Kanzhelev
e360de48b2 GRPCContainerProbe is GA 2023-03-02 22:07:59 +00:00
Kubernetes Prow Robot
57fd02ca29
Merge pull request #116218 from pohly/test-lease-controller-leak
update lease controller
2023-03-02 10:30:56 -08:00
Kubernetes Prow Robot
efe20f6c9b
Merge pull request #114114 from ffromani/full-pcpus-stricter-precheck-issue113537
node: cpumgr: stricter pre-check for  the policy option full-pcpus-only
2023-03-02 09:04:56 -08:00
Francesco Romani
0e9b92090c node: cpumgr: stricter precheck for full-pcpus-only
In order to implement the `full-pcpus-only` cpumanager policy option,
we leverage the implementation of the algorithm which picks CPUs.
By design, CPUs are taken from the biggest chunk available (socket
or NUMA zone) to physical cores, down to single cores.

Leveraging this, if the requested CPU count is a multiple of the SMT
level (commonly 2), we're guaranteed that only full physical cores
will be taken.

The hidden assumption here is this holds true by construction iff
the user reserved CPUs (if any) considering full physical CPUs.
IOW, if the user did intentionally or mistakely reserve single threads
which are no core siblings[1], then the simple check we implemented
is not sufficient.

A easy example can probably outline this better. With this setup:

cores: [(0, 4), (1, 5), (2, 6), (3, 8)] (in parens: thread siblings).
SMT level: 2 (each tuple is 2 elements)
Reserved CPUs: 0,1 (explicit pick using `--reserved-cpus`)

A container then requests 6 cpus. full-pcpus-only check: 6 % 2 == 0. Passed.
The CPU allocator will take first full cores, (2,6) and (3,8), and will
then pick the remaining single CPUs. The allocation will succeed, but
it's incorrect.

We can fix this case with a stricter precheck.
We need to additionally consider all the core siblings of the reserved
CPUs as unavailable when computing the free cpus, before to start the
actual allocation. Doing so, we fall back in the intended behavior, and
by construction all possible CPUs allocation whose number is multiple
of the SMT level are now correct again.

+++

[1] or thread siblings in the linux parlance, in any case:
hyperthread siblings of the same physical core

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-03-02 16:00:58 +01:00
Patrick Ohly
dad95e1be6 update lease controller
Passing in a context instead of a stop channel has several advantages:
- ensures that client-go calls return as soon as the controller is asked to stop
- contextual logging can be used

By passing that context down to its own functions and checking it while
waiting, the lease controller also doesn't get stuck in backoffEnsureLease
anymore (https://github.com/kubernetes/kubernetes/issues/116196).
2023-03-02 15:06:00 +01:00
ruiwen-zhao
572e6e0ffb Add MaxParallelImagePulls support
Signed-off-by: ruiwen-zhao <ruiwen@google.com>
2023-03-02 03:57:59 +00:00
Kubernetes Prow Robot
53f3583c7f
Merge pull request #114785 from TommyStarK/kubelet/replace-deprecated-pointer-function
kubelet: Replace deprecated pointer function
2023-03-01 18:04:55 -08:00
Patrick Ohly
961819a4d0 dependencies: update klog v2.90.1
This improves performance of the text formatting and ktesting.

Because ktesting no longer buffers messages by default, one unit
test needs to ask for that explicitly.
2023-03-01 19:03:50 +01:00
Kubernetes Prow Robot
6a25c528bb
Merge pull request #115891 from bart0sh/PR103-CRI-add-CDI-devices
DRA: Pass CDI devices with a new CRI field
2023-02-28 14:53:28 -08:00
Kubernetes Prow Robot
18eea58ac2
Merge pull request #115359 from iancoolidge/devel-cpuset
More code-review changes from k/utlils cpuset review
2023-02-28 10:55:16 -08:00
Ed Bartosh
5a86895070 DRA: pass CDI devices through CRI CDIDevice field 2023-02-28 19:21:20 +02:00
SataQiu
ed2caf17e0 kubelet: remove unused DockerID type 2023-02-27 16:02:59 +08:00
Chen Wang
7db339dba2 This commit contains the following:
1. Scheduler bug-fix + scheduler-focussed E2E tests
2. Add cgroup v2 support for in-place pod resize
3. Enable full E2E pod resize test for containerd>=1.6.9 and EventedPLEG related changes.

Co-Authored-By: Vinay Kulkarni <vskibum@gmail.com>
2023-02-24 18:21:21 +00:00
Vinay Kulkarni
f2bd94a0de In-place Pod Vertical Scaling - core implementation
1. Core Kubelet changes to implement In-place Pod Vertical Scaling.
2. E2E tests for In-place Pod Vertical Scaling.
3. Refactor kubelet code and add missing tests (Derek's kubelet review)
4. Add a new hash over container fields without Resources field to allow feature gate toggling without restarting containers not using the feature.
5. Fix corner-case where resize A->B->A gets ignored
6. Add cgroup v2 support to pod resize E2E test.
KEP: /enhancements/keps/sig-node/1287-in-place-update-pod-resources

Co-authored-by: Chen Wang <Chen.Wang1@ibm.com>
2023-02-24 18:21:21 +00:00
Ian K. Coolidge
d4a1bf83c1 cpuset: Convert Fatalf to Errrof in tests
Use of Fatalf is not apppropriate in any of these cases:
None of these failures are prerequisites.
2023-02-21 05:41:16 +00:00
Ian K. Coolidge
b536851fc7 cpuset: Add a few more test cases
Feedback from https://github.com/kubernetes/utils/pull/267 and related
reviews.

* Equality when insertion order is different
* UnsortedList contents
* Not-Subset cases
* Clone coverage
2023-02-21 05:40:54 +00:00
Ian K. Coolidge
22d3f67850 cpuset: Fix Parse() error message for n-k s.t. k<n
This case is tested extensively in cpuset_test.go, but the error message
needs a small adjustmnet.
2023-02-21 04:51:14 +00:00
Kubernetes Prow Robot
ffe410bbb4
Merge pull request #115604 from pacoxu/fix-design-proposals-links
old design proposals are now moved to Design Proposals Archive repo
2023-02-16 09:55:38 -08:00
Paco Xu
3d536bd14b API docs: point to current docs instead of archived designs 2023-02-16 15:32:08 +08:00
Kubernetes Prow Robot
e18fa74551
Merge pull request #115590 from swatisehgal/topology-mgr-duration-metrics
node: topology-mgr: Add metric to measure topology manager admission latency
2023-02-15 07:12:25 -08:00
Swati Sehgal
8442b450e5 node: topology-mgr: code optimization
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-02-15 14:04:10 +00:00
Swati Sehgal
bc941633c1 node: topology-mgr: add metric to measure topology mgr admission latency
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-02-15 13:59:47 +00:00
Kubernetes Prow Robot
8f55d34507
Merge pull request #115384 from sourcelliu/allowlist
Add test for pkg/kubelet/sysctl/allowlist_test.go
2023-02-14 12:45:51 -08:00
Kubernetes Prow Robot
5071c4f57e
Merge pull request #111982 from cvvz/kubelet-del-unnecessary-code
cleanup: delete useless code from kubelet volumemanager
2023-02-14 10:31:31 -08:00
cyclinder
1bdcd18bf6 close grpc server in test file to avoid goroutine leak
Signed-off-by: cyclinder <kuocyclinder@gmail.com>
2023-02-10 09:51:26 +08:00
Paco Xu
019d2615af archived design proposals are now moved to Design Proposals Archive Repo. 2023-02-08 11:12:22 +08:00
Kubernetes Prow Robot
5437d493da
Merge pull request #114364 from bart0sh/PR102-prepare-DRA-resources-before-CNI-setup
kubelet: prepare DRA resources before CNI setup
2023-02-07 08:09:04 -08:00
Kubernetes Prow Robot
22b88dea36
Merge pull request #115315 from enj/enj/i/kas_kubelet_conn_close
kubelet/client: collapse transport wiring onto standard approach
2023-02-07 07:01:14 -08:00
Madhav Jivrajani
5e1f440d0a *: Fix linter warnings
Adapt to newly improved linters in golangci-lint v1.51.1

Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>
2023-02-07 13:01:41 +05:30
Monis Khan
754cb3d601
kubelet/client: collapse transport wiring onto standard approach
Signed-off-by: Monis Khan <mok@microsoft.com>
2023-02-06 20:34:49 -05:00
Ed Bartosh
4f88332ab4 kubelet: prepare DRA resources before CNI setup 2023-02-06 20:40:11 +02:00
Kubernetes Prow Robot
d3a62dcb76
Merge pull request #114351 from ruiwen-zhao/event_ignore_nil
[Evented PLEG] Ignore container events with nil PodSandboxStatus
2023-02-02 12:52:42 -08:00
Kubernetes Prow Robot
b1667918bc
Merge pull request #115424 from songxiao-wang87/runwxs-test11
Make docs more accurate for the contention-profiling flag
2023-02-01 07:25:20 -08:00
Kubernetes Prow Robot
51c54a1e2f
Merge pull request #114179 from lixiaobing1/break
improve performance
2023-01-31 21:01:06 -08:00
mantuliu
3f8ada67c5 impove the coverage
Signed-off-by: mantuliu <240951888@qq.com>
2023-02-01 10:47:38 +08:00
ruiwen-zhao
fabcc91956 Ignore container events with nil PodSandboxStatus
Signed-off-by: ruiwen-zhao <ruiwen@google.com>
2023-01-31 22:50:51 +00:00
mantuliu
52e7bf58cf cut avoid unnecessary code duplications
Signed-off-by: mantuliu <240951888@qq.com>
2023-01-31 23:55:09 +08:00
Claudiu Belu
ec753fcb55 unittests: Fixes unit tests for Windows (part 6)
Currently, there are some unit tests that are failing on Windows due to
various reasons:

- On Windows, consecutive time.Now() calls may return the same timestamp, which would cause
  the TestFreeSpaceRemoveByLeastRecentlyUsed test to flake.
- tests in kuberuntime_container_windows_test.go fail on Nodes that have fewer than 3 CPUs,
  expecting the CPU max set to be more than 100% of available CPUs, which is not possible.
- calls in summary_windows_test.go are missing context.
- filterTerminatedContainerInfoAndAssembleByPodCgroupKey will filter and group container
  information by the Pod cgroup key, if it exists. However, we don't have cgroups on Windows,
  thus we can't make the same assertions.
2023-01-31 11:49:26 +00:00
songxiao-wang87
3e6b954290 Making a run test.
Signed-off-by: songxiao-wang87 <wang.xiaosong23@zte.com.cn>
2023-01-31 09:38:48 +00:00
Kubernetes Prow Robot
4df945853e
Merge pull request #115137 from swatisehgal/topologymgr-metrics
node: topologymgr: add metrics about admission requests and errors
2023-01-30 18:43:00 -08:00
Kubernetes Prow Robot
559014f13e
Merge pull request #115273 from SergeyKanzhelev/restartCountRegexFix
use a proper regex looking for the restartCount
2023-01-30 17:36:49 -08:00
Kubernetes Prow Robot
d0584179f4
Merge pull request #114367 from liggitt/kubelet-csr-init
Check for initial kubelet certificates more frequently
2023-01-30 09:07:05 -08:00
Kubernetes Prow Robot
232c0de57a
Merge pull request #115101 from HirazawaUi/delte-pkg-kubelet-unused-functions
delete unused functions in pkg/kubelet directory
2023-01-29 17:21:08 -08:00
mantuliu
8ca97dcde1 Add test for pkg/kubelet/sysctl/allowlist_test.go 2023-01-29 22:48:27 +08:00
Kubernetes Prow Robot
538c6c044f
Merge pull request #115329 from aojea/disable_probe
skip scale test for probes
2023-01-25 22:02:33 -08:00