Commit Graph

1255 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
006ad0576e
Merge pull request #116560 from bart0sh/PR107-DRA-get-rid-of-extra-loops
DRA: get rid of unneeded loops over pod containers
2023-04-11 21:16:50 -07:00
Kubernetes Prow Robot
ce56fd7c8b
Merge pull request #117152 from samuelkarp/godoc-typo
cpumanager: fix typo in godoc
2023-04-11 20:22:14 -07:00
Kubernetes Prow Robot
d0fc9d16ce
Merge pull request #114800 from haoruan/feature-8976-spew-sprintf-refactor
Capture spew.Sprintf() with all our favorite config into a util func
2023-04-11 15:34:57 -07:00
Samuel Karp
ea74a2d877
cpumanager: fix typo in godoc
Signed-off-by: Samuel Karp <samuelkarp@google.com>
2023-04-06 16:48:24 -07:00
Hao Ruan
f638e2849f replaced spew.Sprintf with a util pretty print function 2023-03-27 09:24:22 +08:00
Ed Bartosh
1aeec10efb DRA: get rid of unneeded loops over pod containers 2023-03-15 09:41:30 +02:00
Kubernetes Prow Robot
74123a7341
Merge pull request #116621 from moshe010/dra-lock
kubelet dra: add lock to addCDIDevices
2023-03-14 19:27:28 -07:00
Kubernetes Prow Robot
815b1bf0d8
Merge pull request #116558 from klueska/update-dra-kubeletplugin-v1alpha2
Update kubeletplugin API for DRA to v1alpha2
2023-03-14 19:27:06 -07:00
Kubernetes Prow Robot
9ddf1a02bd
Merge pull request #116504 from vinaykul/restart-free-pod-vertical-scaling-kubeletonly-fix
Fix null pointer access in doPodResizeAction for kubeletonly mode
2023-03-14 19:26:59 -07:00
Kevin Klues
579295e727 Update kubeletplugin API for DynamicResourceAllocation to v1alpha2
This PR makes the NodePrepareResources() and NodeUnprepareResource()
calls of the kubeletplugin API for DynamicResourceAllocation
symmetrical. It wasn't clear how one would use the set of CDIDevices
passed back by NodeUnprepareResource() in the v1alpha1 API, and the
new API now passes back the full ResourceHandle that was originally
passed to the Prepare() call. Passing the ResourceHandle is strictly
more informative, and a plugin can always (re)derive the set of
CDIDevices from it.

This is a breaking change, but this release is scheduled to break
multiple APIs for DynamicResourceAllocation, so it makes sense to do
this now instead of later.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-14 23:09:44 +00:00
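For illustration, a minimal Go sketch of the symmetry described above, under assumed type and method names (the real kubeletplugin API is a gRPC service, not this interface): unprepare receives the same ResourceHandle that prepare consumed, so a plugin can re-derive its CDI devices from it.

```go
package main

import "fmt"

// ResourceHandle is the opaque, driver-specific payload produced by the
// control-plane driver; the names here are illustrative, not the real API.
type ResourceHandle string

// Plugin sketches the symmetric v1alpha2 shape: Unprepare gets back the
// same handle that Prepare consumed, so the CDI devices can be re-derived.
type Plugin interface {
	NodePrepareResource(handle ResourceHandle) (cdiDevices []string, err error)
	NodeUnprepareResource(handle ResourceHandle) error
}

type fakePlugin struct{}

func (fakePlugin) NodePrepareResource(h ResourceHandle) ([]string, error) {
	// Derive CDI device names from the handle (stub logic).
	return []string{"vendor.example.com/gpu=" + string(h)}, nil
}

func (fakePlugin) NodeUnprepareResource(h ResourceHandle) error {
	// With the handle available, the plugin can recompute what it prepared.
	fmt.Println("unpreparing resources derived from handle", h)
	return nil
}

func main() {
	var p Plugin = fakePlugin{}
	devs, _ := p.NodePrepareResource("claim-0")
	fmt.Println("prepared CDI devices:", devs)
	_ = p.NodeUnprepareResource("claim-0")
}
```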
Moshe Levi
ffb07d1e78 kubelet dra: add lock to addCDIDevices
Signed-off-by: Moshe Levi <moshele@nvidia.com>
2023-03-15 00:50:45 +02:00
Kevin Klues
74d634a028 Update kubelet support for recent changes to resource.k8s.io/v1alpha2
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-14 22:34:18 +00:00
Moshe Levi
2a568bcfc8 kubelet podresources: extend List to support Dynamic Resources and implement Get API
Signed-off-by: Moshe Levi <moshele@nvidia.com>
2023-03-14 19:33:04 +02:00
Moshe Levi
9c57613912 Add ClassName to checkpoint state and in-memory cache
Signed-off-by: Moshe Levi <moshele@nvidia.com>
2023-03-14 19:33:04 +02:00
kunkunhaohao
a772691165
Update pod_container_manager_linux.go (#114598)
* Update pod_container_manager_linux.go

This is a simple optimization that reduces repeated invocation of the GetPodContainerName function.

* Update pod_container_manager_linux.go

Move `podContainerName, _ := m.GetPodContainerName(pod)` closer to where the podContainerName variable is used.
2023-03-14 09:38:36 -07:00
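A tiny hedged sketch of the hoist this commit describes, with podContainerName standing in for m.GetPodContainerName: call the lookup once, immediately before its first use, and reuse the result instead of re-invoking it at each use site.

```go
package main

import "fmt"

// podContainerName is a hypothetical stand-in for m.GetPodContainerName(pod);
// assume it is costly enough that repeated calls are worth avoiding.
func podContainerName(pod string) (string, string) {
	return "pod-" + pod, "/kubepods/pod-" + pod
}

func main() {
	pod := "nginx"
	// Invoke once, right before the first use, then reuse both results.
	name, path := podContainerName(pod)
	fmt.Println("cgroup name:", name)
	fmt.Println("cgroup path:", path)
}
```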
Patrick Ohly
29941b8d3e api: resource.k8s.io v1alpha1 -> v1alpha2
For Kubernetes 1.27, we intend to make some breaking API changes:
- rename PodScheduling -> PodSchedulingHints (https://github.com/kubernetes/kubernetes/issues/114283)
- extend ResourceClaimStatus (https://github.com/kubernetes/enhancements/pull/3802)

We need to switch from v1alpha1 to v1alpha2 for that.
2023-03-14 07:52:03 +01:00
Kubernetes Prow Robot
e998b09bc4
Merge pull request #116555 from bart0sh/PR106-dra-plugin-constant
DRA: add constant PluginClientTimeout
2023-03-13 17:51:31 -07:00
Ed Bartosh
50cb3268b6 DRA: add constant PluginClientTimeout 2023-03-14 00:37:43 +02:00
Kevin Klues
685688c703 Update DRAManager to allow multiple plugins to process a single claim
Right now, the v1alpha1 API only passes enough information for one plugin to
process a claim, but the v1alpha2 API will allow for multiple plugins to
process a claim. This commit prepares the code for this upcoming change.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-13 12:52:41 +00:00
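A hedged sketch, with illustrative names, of the data structure this change prepares for: CDI devices are grouped per kubelet plugin inside a single claim record, so several plugins can contribute to one claim (this also matches the checkpoint map keyed by KubeletPluginName in the commit below).

```go
package main

import "fmt"

// claimInfo is an illustrative stand-in for the DRA manager's per-claim
// record: CDI devices are grouped by the kubelet plugin that produced them,
// so multiple plugins can process a single claim.
type claimInfo struct {
	ClaimUID   string
	CDIDevices map[string][]string // plugin name -> CDI device IDs
}

func (c *claimInfo) addCDIDevices(plugin string, devices []string) {
	if c.CDIDevices == nil {
		c.CDIDevices = map[string][]string{}
	}
	c.CDIDevices[plugin] = append(c.CDIDevices[plugin], devices...)
}

func main() {
	info := claimInfo{ClaimUID: "uid-123"}
	info.addCDIDevices("gpu.example.com", []string{"example.com/gpu=0"})
	info.addCDIDevices("nic.example.com", []string{"example.com/nic=eth1"})
	fmt.Printf("%+v\n", info)
}
```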
Kevin Klues
569ed33d78 Add additional tests to DRAManager checkpointing
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-13 12:52:41 +00:00
Kevin Klues
fd7370b84d Update DRAManager checkpoint to store a map for CDIDevices
The key of the map is the KubeletPluginName from which the CDIDevices originate.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-13 12:52:41 +00:00
Kevin Klues
273a8ffad1 Rename CdiDevices to CDIDevices in dramanager checkpoint
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-13 12:52:41 +00:00
Saza
d34b0275a3
dynamic resource allocation: add timeouts for communication with plugin (#114844)
* add timeouts for communication with dra plugin

* move timeout constant to k8s.io/kubernetes/pkg/kubelet/cm/util

* move settings of timeout to pkg/kubelet/plugin/dra/plugin/client.go

* remove timeout constant
2023-03-13 04:34:56 -07:00
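The timeout mechanics reduce to wrapping each plugin call in a context deadline. A self-contained sketch, with an illustrative constant name and a simulated call in place of the actual DRA gRPC client code:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// pluginClientTimeout mirrors the idea of a shared timeout constant for
// kubelet-to-plugin calls; the value here is illustrative.
const pluginClientTimeout = 45 * time.Second

// callPlugin stands in for a gRPC call such as NodePrepareResources.
func callPlugin(ctx context.Context) error {
	select {
	case <-time.After(100 * time.Millisecond): // simulated plugin work
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), pluginClientTimeout)
	defer cancel()
	if err := callPlugin(ctx); errors.Is(err, context.DeadlineExceeded) {
		fmt.Println("plugin call timed out")
	} else if err != nil {
		fmt.Println("plugin call failed:", err)
	} else {
		fmt.Println("plugin call succeeded")
	}
}
```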
John Kwiatkoski
69465d2949
Adding test coverage for NewPodContainerManager() (#110220) 2023-03-13 02:08:44 -07:00
Kubernetes Prow Robot
3c6e419cc3
Merge pull request #116450 from vinaykul/restart-free-pod-vertical-scaling-api
Rename ContainerStatus.ResourcesAllocated to ContainerStatus.AllocatedResources
2023-03-12 16:06:40 -07:00
Kubernetes Prow Robot
a4a0fd44d8
Merge pull request #115912 from moshe010/dra-checkpoint
kubelet DRA: Add checkpointing mechanism in the DRA Manager
2023-03-12 12:20:40 -07:00
Moshe Levi
2c79af0d63 kubelet dra: add unit tests for checkpoint
Signed-off-by: Moshe Levi <moshele@nvidia.com>
2023-03-12 09:13:19 +02:00
vinay kulkarni
1c7850c355 Fix null pointer access in doPodResizeAction for kubeletonly mode 2023-03-12 05:59:14 +00:00
vinay kulkarni
01b96e7704 Rename ContainerStatus.ResourcesAllocated to ContainerStatus.AllocatedResources 2023-03-10 14:49:26 +00:00
Moshe Levi
e7256e08d3 kubelet dra: add checkpointing mechanism in the DRA Manager
The checkpointing mechanism will repopulate the DRA Manager's in-memory cache on kubelet restart.
This ensures that the information needed by the PodResources API is available across
a kubelet restart.

The ClaimInfoState struct represents the DRA Manager's in-memory cache state in the checkpoint.
It is embedded in ClaimInfo, which also includes the annotation field. The in-memory cache and
the checkpointed cache state are kept separate so that the checkpoint is not tied to the in-memory
cache struct, which may change in the future. In ClaimInfoState we save only the minimal fields
required to restore the in-memory cache.

Signed-off-by: Moshe Levi <moshele@nvidia.com>
2023-03-10 12:22:15 +02:00
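A hedged Go sketch of the separation described above, with illustrative field names: a small, serializable state struct holds only what is needed to rebuild the cache, and is embedded in an in-memory entry that adds runtime-only fields which are not checkpointed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// claimInfoState holds only the minimal fields persisted in the checkpoint;
// names are illustrative, not the exact kubelet types.
type claimInfoState struct {
	ClaimUID   string              `json:"claimUID"`
	ClaimName  string              `json:"claimName"`
	Namespace  string              `json:"namespace"`
	ClassName  string              `json:"className"`
	CDIDevices map[string][]string `json:"cdiDevices"`
}

// claimInfo is the in-memory cache entry: the checkpointed state plus
// runtime-only data that is not persisted.
type claimInfo struct {
	claimInfoState
	annotations []string // runtime-only, recomputed after restart
}

func main() {
	// Checkpoint: serialize only the state portion.
	in := claimInfo{claimInfoState: claimInfoState{
		ClaimUID:  "uid-123",
		ClaimName: "gpu-claim",
		Namespace: "default",
		ClassName: "gpu.example.com",
		CDIDevices: map[string][]string{
			"gpu.example.com": {"example.com/gpu=0"},
		},
	}}
	blob, _ := json.Marshal(in.claimInfoState)

	// Restart: repopulate the cache from the checkpointed state alone.
	var restored claimInfoState
	_ = json.Unmarshal(blob, &restored)
	fmt.Printf("restored cache entry: %+v\n", claimInfo{claimInfoState: restored})
}
```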
Kubernetes Prow Robot
33d8614c9c
Merge pull request #115929 from HirazawaUi/delete-kubelet-unused-function
cleanup(kubelet): remove unused function
2023-03-09 22:43:12 -08:00
Kubernetes Prow Robot
06f0cba9b1
Merge pull request #115367 from tzneal/dedupe-resource-calculation
dedupe pod resource request calculation
2023-03-09 22:42:50 -08:00
Kubernetes Prow Robot
f6564d33ba
Merge pull request #114357 from dengyufeng2206/1208pull
Log spelling formatting
2023-03-09 21:33:22 -08:00
Todd Neal
4096c9209c dedupe pod resource request calculation 2023-03-09 17:15:53 -06:00
Kubernetes Prow Robot
8d5c96fed2
Merge pull request #116093 from swatisehgal/topologymanager-ga-graduation
node: topologymgr: Graduate Kubelet Topology Manager to GA
2023-03-08 16:56:06 -08:00
David Porter
9c20cee504
Revert "node: device-mgr: Handle recovery flow by checking if healthy devices exist" 2023-03-07 11:50:52 -08:00
Claudiu Belu
5ba74c81ca unit tests: Skip flaky tests on Windows
Some of the unit tests are currently flaky on Windows. This commit
skips them until they are resolved.
2023-03-06 20:46:05 +00:00
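Per-OS skipping in Go tests is a one-line check on runtime.GOOS; a minimal sketch (package and test name hypothetical):

```go
package example

import (
	"runtime"
	"testing"
)

func TestSomethingFlakyOnWindows(t *testing.T) {
	if runtime.GOOS == "windows" {
		// Tracked separately; skip until the flake is resolved.
		t.Skip("flaky on Windows, skipping")
	}
	// ... the actual assertions run everywhere else ...
}
```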
Kubernetes Prow Robot
890d39f976
Merge pull request #114640 from swatisehgal/handle-device-mgr-recovery
node: device-mgr: Handle recovery flow by checking if healthy devices exist
2023-03-06 07:10:28 -08:00
Kubernetes Prow Robot
68eea2468c
Merge pull request #114572 from huyinhou/fix-concurrent-map-access
kubelet/deviceplugin: fix concurrent map iteration and map write
2023-03-06 06:06:29 -08:00
Swati Sehgal
937d330393 node: topologymgr: Remove ResourceAllocator as TM is always enabled
With Topology Manager enabled by default, we no longer need
`resourceAllocator`, as Topology Manager serves as the main
PodAdmitHandler, completely responsible for the admission check
based on hints received from the hintProviders and for the
subsequent allocation of the corresponding resources to a
pod, as can be seen here:
https://github.com/kubernetes/kubernetes/blob/v1.26.0/pkg/kubelet/cm/topologymanager/scope.go#L150

With regard to DRA, passing `cm.draManager` into
resourceAllocator seems redundant, as no admission checks
(or allocation of DRA-handled resources) take place in the
`Admit` method of resourceAllocator. DRA follows a completely
different model from the rest of the resource managers: a
pod is only scheduled on a node once resources have been
reserved for it. Because of this, admission checks, or waiting
for resources to be provisioned after the pod has been
scheduled on the node, are not required.

Before making the above change, it was verified that DRA Manager
is instantiated in `NewContainerManager`:
https://github.com/kubernetes/kubernetes/blob/v1.26.0/pkg/kubelet/cm/container_manager_linux.go#L318

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 12:51:11 +00:00
Swati Sehgal
6a62f0236a node: topologymgr: trivial internal variable renaming
Since Topology Manager is graduating to GA, we remove the
`Experimental` prefix from internal configuration variable
names.

There is no expected change in behavior, only trivial
variable renaming.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 12:51:11 +00:00
Swati Sehgal
d536a342b4 node: topologymgr: GA graduation implies Feature Gate is ON by default
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 12:51:05 +00:00
Swati Sehgal
5b2a3dbbdc node: device-mgr: explicitly check if pre-allocated devices are healthy
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 11:52:23 +00:00
Swati Sehgal
a799ffb571 node: device-mgr: unit-tests: admission failure due to unhealthy devices
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 11:52:23 +00:00
Swati Sehgal
7ac399c205 node: device-mgr: Handle recovery by checking if healthy devices exist
In case of node reboot/kubelet restart, the flow of events involves
obtaining the state from the checkpoint file, followed by setting
`healthyDevices`/`unhealthyDevices` to their zero values. This is
done to allow the device plugin to re-register itself so that
capacity can be updated appropriately.

During the allocation phase, we need to check that the resources requested
by the pod have been registered AND that healthy devices are present on
the node to be allocated.

We also need to move this check above `needed==0`, where `needed` is the
number of required devices minus the devices already allocated to the
container (obtained from the checkpoint file): even in cases where no
additional devices have to be allocated (as they were pre-allocated), we
still need to make sure the previously allocated devices are healthy.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 11:52:23 +00:00
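A hedged sketch of the ordering this commit describes, with illustrative names and simplified bookkeeping: the registered-and-healthy check runs before the needed==0 early return, so even a fully pre-allocated container fails admission if its devices went unhealthy across a restart.

```go
package main

import "fmt"

// allocate sketches the device manager's admission-time check. Names are
// illustrative: devicesToReuse holds devices already allocated to the
// container (recovered from the checkpoint file), and healthy is the live
// set currently reported by the device plugin.
func allocate(resource string, required int, devicesToReuse, healthy map[string]bool, registered bool) error {
	needed := required - len(devicesToReuse)

	// Run the registration and health checks BEFORE the needed==0 early
	// return: pre-allocated devices may have gone unhealthy across a reboot.
	if !registered {
		return fmt.Errorf("resource %s is not registered", resource)
	}
	for d := range devicesToReuse {
		if !healthy[d] {
			return fmt.Errorf("previously allocated device %s is unhealthy", d)
		}
	}
	if needed == 0 {
		return nil // everything was pre-allocated and is still healthy
	}
	if needed > len(healthy)-len(devicesToReuse) {
		return fmt.Errorf("not enough healthy %s devices", resource)
	}
	return nil
}

func main() {
	reuse := map[string]bool{"dev-0": true} // from the checkpoint
	healthy := map[string]bool{}            // device lost after reboot
	fmt.Println(allocate("example.com/gpu", 1, reuse, healthy, true))
}
```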
huyinhou
88274d96fc update code style
Signed-off-by: huyinhou <huyinhou@bytedance.com>
2023-03-06 14:23:14 +08:00
Sergey Kanzhelev
04189b1fc4 rename ExperimentalPodPidsLimit to PodPidsLimit 2023-03-04 01:48:16 +00:00
Kubernetes Prow Robot
efe20f6c9b
Merge pull request #114114 from ffromani/full-pcpus-stricter-precheck-issue113537
node: cpumgr: stricter pre-check for the policy option full-pcpus-only
2023-03-02 09:04:56 -08:00
Francesco Romani
0e9b92090c node: cpumgr: stricter precheck for full-pcpus-only
In order to implement the `full-pcpus-only` cpumanager policy option,
we leverage the implementation of the algorithm which picks CPUs.
By design, CPUs are taken from the biggest chunk available (socket
or NUMA zone) to physical cores, down to single cores.

Leveraging this, if the requested CPU count is a multiple of the SMT
level (commonly 2), we're guaranteed that only full physical cores
will be taken.

The hidden assumption here is that this holds true by construction iff
the user reserved CPUs (if any) as full physical cores.
IOW, if the user intentionally or mistakenly reserved single threads
that are not core siblings [1], then the simple check we implemented
is not sufficient.

An easy example can outline this better. With this setup:

cores: [(0, 4), (1, 5), (2, 6), (3, 8)] (in parens: thread siblings).
SMT level: 2 (each tuple is 2 elements)
Reserved CPUs: 0,1 (explicit pick using `--reserved-cpus`)

A container then requests 6 CPUs. full-pcpus-only check: 6 % 2 == 0. Passed.
The CPU allocator will first take the full cores, (2,6) and (3,8), and will
then pick the remaining single CPUs. The allocation will succeed, but
it's incorrect.

We can fix this case with a stricter precheck.
We additionally need to consider all the core siblings of the reserved
CPUs as unavailable when computing the free CPUs, before starting the
actual allocation. Doing so, we fall back to the intended behavior, and
by construction all possible CPU allocations whose size is a multiple
of the SMT level are correct again.

+++

[1] or thread siblings in Linux parlance; in any case:
hyperthread siblings of the same physical core

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-03-02 16:00:58 +01:00
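A hedged sketch of the stricter precheck, hard-coding the example topology above (names and types illustrative, not the real cpumanager code): every core sibling of a reserved CPU is marked unavailable before the divisibility and capacity checks, so a partially reserved core can never be handed out as a full core.

```go
package main

import "fmt"

// siblings maps each CPU to its hyperthread siblings (including itself),
// matching the example topology: cores [(0,4), (1,5), (2,6), (3,8)].
var siblings = map[int][]int{
	0: {0, 4}, 4: {0, 4},
	1: {1, 5}, 5: {1, 5},
	2: {2, 6}, 6: {2, 6},
	3: {3, 8}, 8: {3, 8},
}

const smtLevel = 2

// availableFullCoreCPUs removes reserved CPUs AND all of their core
// siblings from the allocatable set: a core with one reserved thread can
// never be handed out as a full physical core.
func availableFullCoreCPUs(all, reserved []int) []int {
	excluded := map[int]bool{}
	for _, r := range reserved {
		for _, s := range siblings[r] {
			excluded[s] = true
		}
	}
	var free []int
	for _, c := range all {
		if !excluded[c] {
			free = append(free, c)
		}
	}
	return free
}

func admit(request int, free []int) bool {
	// full-pcpus-only: the request must be a multiple of the SMT level AND
	// fit within CPUs whose whole core is still available.
	return request%smtLevel == 0 && request <= len(free)
}

func main() {
	all := []int{0, 1, 2, 3, 4, 5, 6, 8}
	reserved := []int{0, 1} // explicit pick via --reserved-cpus
	free := availableFullCoreCPUs(all, reserved)
	fmt.Println("free full-core CPUs:", free)    // [2 3 6 8]
	fmt.Println("admit 6 cpus:", admit(6, free)) // false
}
```

With CPUs 0 and 1 reserved, their siblings 4 and 5 are excluded as well, leaving only four full-core CPUs; the 6-CPU request from the example is now rejected at admission instead of being allocated incorrectly.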
Kubernetes Prow Robot
6a25c528bb
Merge pull request #115891 from bart0sh/PR103-CRI-add-CDI-devices
DRA: Pass CDI devices with a new CRI field
2023-02-28 14:53:28 -08:00