Commit Graph

21 Commits

Author SHA1 Message Date
adrianc
08b942028f
DRA: call plugins for claims even if exist in cache
Today, DRA manager does not call plugin NodePrepareResource
for claims that it previously successfully handled, that is,
if claims are present in cache (checkpoint) even if node
rebooted.

After node reboots, it is required to call DRA plugin
for resource claims so that plugins may prepare them
again in case the resources dont persist reboot.

To achieve that, once kubelet is started, we call DRA
plugins for claims once if a pod sandbox is required
to be created during PodSync.

Signed-off-by: adrianc <adrianc@nvidia.com>
2023-10-25 13:20:16 +03:00
Ed Bartosh
f6431c6138 DRA: don't query claims from API server
When a pod is force-deleted UnprepareResources fails to get a claim
from an API server.
PrepareResources should cache claim info required by the
UnprepareResources so that UnprepareResources would get it from
the cache instead of querying API server.
2023-07-18 18:23:10 +03:00
Kubernetes Prow Robot
047d040ce7
Merge pull request #119012 from pohly/dra-batch-node-prepare
kubelet: support batched prepare/unprepare in v1alpha3 DRA plugin API
2023-07-12 10:57:37 -07:00
Patrick Ohly
d743c50bb9 kubelet: support batched prepare/unprepare in v1alpha3 DRA plugin API
Combining all prepare/unprepare operations for a pod enables plugins to
optimize the execution. Plugins can continue to use the v1beta2 API for now,
but should switch. The new API is designed so that plugins which want to work
on each claim one-by-one can do so and then report errors for each claim
separately, i.e. partial success is supported.
2023-07-12 14:50:30 +02:00
Patrick Ohly
444d23bd2f dra: generated name for ResourceClaim from template
Generating the name avoids all potential name collisions. It's not clear how
much of a problem that was because users can avoid them and the deterministic
names for generic ephemeral volumes have not led to reports from users. But
using generated names is not too hard either.

What makes it relatively easy is that the new pod.status.resourceClaimStatus
map stores the generated name for kubelet and node authorizer, i.e. the
information in the pod is sufficient to determine the name of the
ResourceClaim.

The resource claim controller becomes a bit more complex and now needs
permission to modify the pod status. The new failure scenario of "ResourceClaim
created, updating pod status fails" is handled with the help of a new special
"resource.kubernetes.io/pod-claim-name" annotation that together with the owner
reference identifies exactly for what a ResourceClaim was generated, so
updating the pod status can be retried for existing ResourceClaims.

The transition from deterministic names is handled with a special case for that
recovery code path: a ResourceClaim with no annotation and a name that follows
the Kubernetes <= 1.27 naming pattern is assumed to be generated for that pod
claim and gets added to the pod status.

There's no immediate need for it, but just in case that it may become relevant,
the name of the generated ResourceClaim may also be left unset to record that
no claim was needed. Components processing such a pod can skip whatever they
normally would do for the claim. To ensure that they do and also cover other
cases properly ("no known field is set", "must check ownership"),
resourceclaim.Name gets extended.
2023-07-11 14:23:48 +02:00
Kubernetes Prow Robot
7581ae8123
Merge pull request #116739 from moshe010/clone-cdi-devices
kubelet dra: lock before getting claimInfo CDIDevices and annotations fields
2023-07-07 06:31:04 -07:00
Patrick Ohly
bde66bfb55 kubelet dra: restore skipping of unused resource claims
1aeec10efb removed iterating over containers in favor of iterating over pod
claims. This had the unintended consequence that NodePrepareResource gets
called unnecessarily when no container needs the claim. The more natural
behavior is to skip unused resources. This enables (theoretic, at this time)
use cases where some DRA driver relies on the controller part to influence
scheduling, but then doesn't use CDI with containers.
2023-06-27 16:02:31 +02:00
Patrick Ohly
874daa8b52 kubelet dra: fix checking of second pod which uses a claim
When a second pod wanted to use a claim, the obligatory sanity check whether
the pod is really allowed to use the claim ("reserved for") was skipped.
2023-06-27 16:01:11 +02:00
Moshe Levi
04ad946e8f kubelet dra: lock before getting claimInfo CDIDevices and annotations fields
Currently claimInfo CDIDevices and annotations access directly without RLock.
This can lead to concurrent read write error.

To avoid it we added RLock all before getting the CDIDevices and annotations

Signed-off-by: Moshe Levi <moshele@nvidia.com>
2023-05-01 15:09:43 +03:00
Ed Bartosh
1aeec10efb DRA: get rid of unneeded loops over pod containers 2023-03-15 09:41:30 +02:00
Kevin Klues
579295e727 Update kubeletplugin API for DynamicResourceAllocation to v1alpha2
This PR makes the NodePrepareResources() and NodeUnprepareResource()
calls of the kubeletplugin API for DynamicResourceAllocation
symmetrical. It wasn't clear how one would use the set of CDIDevices
passed back in the NodeUnprepareResource() of the v1alpha1 API, and the
new API now passes back the full ResourceHandle that was originally
passed to the Prepare() call. Passing the ResourceHandle is strictly
more informative and a plugin could always (re)derive the set of
CDIDevice from it.

This is a breaking change, but this release is scheduled to break
multiple APIs for DynamicResourceAllocation, so it makes sense to do
this now instead of later.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-14 23:09:44 +00:00
Kevin Klues
74d634a028 Update kubelet support for recent changes to resource.k8s.io/v1alpha2
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-14 22:34:18 +00:00
Moshe Levi
2a568bcfc8 kubelet podresources: extend List to support Dynamic Resources and implement Get API
Signed-off-by: Moshe Levi <moshele@nvidia.com>
2023-03-14 19:33:04 +02:00
Moshe Levi
9c57613912 Add ClassName to chekpoint state and in-memory cache
Signed-off-by: Moshe Levi <moshele@nvidia.com>
2023-03-14 19:33:04 +02:00
Patrick Ohly
29941b8d3e api: resource.k8s.io v1alpha1 -> v1alpha2
For Kubernetes 1.27, we intend to make some breaking API changes:
- rename PodScheduling -> PodSchedulingHints (https://github.com/kubernetes/kubernetes/issues/114283)
- extend ResourceClaimStatus (https://github.com/kubernetes/enhancements/pull/3802)

We need to switch from v1alpha1 to v1alpha2 for that.
2023-03-14 07:52:03 +01:00
Kevin Klues
685688c703 Update DRAManager to allow multiple plugins to process a single claim
Right now, the v1alpha1 API only passes enough information for one plugin to
process a claim, but the v1alpha2 API will allow for multiple plugins to
process a claim. This commit prepares the code for this upcoming change.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-13 12:52:41 +00:00
Moshe Levi
e7256e08d3 kubelet dra: add checkpointing mechanism in the DRA Manager
The checkpointing mechanism will repopulate DRA Manager in-memory cache on kubelet restart.
This will ensure that the information needed by the PodResources API is available across
a kubelet restart.

The ClaimInfoState struct represent the DRA Manager in-memory cache state in checkpoint.
It is embedd in the ClaimInfo which also include the annotation field. The separation between
the in-memory cache and the cache state in the checkpoint is so we won't be tied to the in-memory
cache struct which may change in the future. In the ClaimInfoState we save the minimal required fields
to restore the in-memory cache.

Signed-off-by: Moshe Levi <moshele@nvidia.com>
2023-03-10 12:22:15 +02:00
Ed Bartosh
5a86895070 DRA: pass CDI devices through CRI CDIDevice field 2023-02-28 19:21:20 +02:00
Ed Bartosh
4f88332ab4 kubelet: prepare DRA resources before CNI setup 2023-02-06 20:40:11 +02:00
Ed Bartosh
abcb56defb kubelet: do not enter termination status if pod might need to unprepare resources 2022-11-11 21:58:03 +01:00
Ed Bartosh
ae0f38437c kubelet: add support for dynamic resource allocation
Dependencies need to be updated to use
github.com/container-orchestrated-devices/container-device-interface.

It's not decided yet whether we will implement Topology support
for DRA or not. Not having any toppology-related code
will help to avoid wrong impression that DRA is used as a hint
provider for the Topology Manager.
2022-11-11 21:58:03 +01:00