kubernetes

Author	SHA1	Message	Date
adrianc	08b942028f	DRA: call plugins for claims even if exist in cache Today, DRA manager does not call plugin NodePrepareResource for claims that it previously successfully handled, that is, if claims are present in cache (checkpoint), even if the node rebooted. After a node reboots, it is required to call the DRA plugin for resource claims so that plugins may prepare them again in case the resources don't persist across a reboot. To achieve that, once kubelet is started, we call DRA plugins for claims once if a pod sandbox is required to be created during PodSync. Signed-off-by: adrianc <adrianc@nvidia.com>	2023-10-25 13:20:16 +03:00
Ed Bartosh	f6431c6138	DRA: don't query claims from API server When a pod is force-deleted UnprepareResources fails to get a claim from an API server. PrepareResources should cache claim info required by the UnprepareResources so that UnprepareResources would get it from the cache instead of querying API server.	2023-07-18 18:23:10 +03:00
Evan Lezar	f0e3c32fe5	Move CDI annotation code to utils package Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-07-11 11:47:53 +02:00
Moshe Levi	ffb07d1e78	kubelet dra: add lock to addCDIDevices Signed-off-by: Moshe Levi <moshele@nvidia.com>	2023-03-15 00:50:45 +02:00
Moshe Levi	2a568bcfc8	kubelet podresources: extend List to support Dynamic Resources and implement Get API Signed-off-by: Moshe Levi <moshele@nvidia.com>	2023-03-14 19:33:04 +02:00
Moshe Levi	9c57613912	Add ClassName to checkpoint state and in-memory cache Signed-off-by: Moshe Levi <moshele@nvidia.com>	2023-03-14 19:33:04 +02:00
Kevin Klues	685688c703	Update DRAManager to allow multiple plugins to process a single claim Right now, the v1alpha1 API only passes enough information for one plugin to process a claim, but the v1alpha2 API will allow for multiple plugins to process a claim. This commit prepares the code for this upcoming change. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2023-03-13 12:52:41 +00:00
Moshe Levi	e7256e08d3	kubelet dra: add checkpointing mechanism in the DRA Manager The checkpointing mechanism will repopulate the DRA Manager in-memory cache on kubelet restart. This will ensure that the information needed by the PodResources API is available across a kubelet restart. The ClaimInfoState struct represents the DRA Manager in-memory cache state in the checkpoint. It is embedded in the ClaimInfo, which also includes the annotation field. The separation between the in-memory cache and the cache state in the checkpoint is so we won't be tied to the in-memory cache struct, which may change in the future. In the ClaimInfoState we save the minimal required fields to restore the in-memory cache. Signed-off-by: Moshe Levi <moshele@nvidia.com>	2023-03-10 12:22:15 +02:00
Ed Bartosh	abcb56defb	kubelet: do not enter termination status if pod might need to unprepare resources	2022-11-11 21:58:03 +01:00
Ed Bartosh	ae0f38437c	kubelet: add support for dynamic resource allocation Dependencies need to be updated to use github.com/container-orchestrated-devices/container-device-interface. It's not decided yet whether we will implement Topology support for DRA or not. Not having any topology-related code will help to avoid the wrong impression that DRA is used as a hint provider for the Topology Manager.	2022-11-11 21:58:03 +01:00

10 Commits