Commit Graph

14 Commits

Author SHA1 Message Date
Patrick Ohly
77341f7595 DRA: remove support for v1alpha2 kubelet API
The v1alpha2 API is several releases old. No current drivers should still
depend on it.
2024-04-19 18:27:05 +02:00
HirazawaUi
10b6319e64 fix slow dra unit test 2024-03-16 22:21:15 +08:00
Patrick Ohly
d59676a545 dra kubelet: publish NodeResourceSlices
The information is received from the DRA driver plugin through a new gRPC
streaming interface. This is backwards compatible with old DRA driver kubelet
plugins, their gRPC server will return "not implemented" and that can be
handled by kubelet. Therefore no API break is needed.

However, DRA drivers need to be updated because the Go API changed. They can
return
    status.New(codes.Unimplemented, "no node resource support").Err()
if they don't support the new ListAndWatchResources method and
structured parameters.

The controller in kubelet then synchronizes this information from the driver
with NodeResourceSlice objects, creating, updating and deleting them as needed.
2024-03-07 22:22:13 +01:00
Patrick Ohly
6f1ddfcd2e kubelet: support structured parameters for preparing resources
If the resource handle has data from a structured parameter model, then we need
to pass that to the DRA driver kubelet plugin. Because Kubernetes uses
gogo/protobuf, we cannot use "optional" for that new optional field and have to
resort to "repeated" with a single repetition if present.

This is a new, backwards-compatible field.

That extending the resource.k8s.io changes the checksum of a kubelet checkpoint
is unfortunate. Updating the test cases is a stop-gap measure, the actual
solution will have to be something else before beta.
2024-03-07 22:22:13 +01:00
TommyStarK
55e3662b72 dra: refactoring overall flow of prepare/unprepare resources
Signed-off-by: TommyStarK <thomasmilox@gmail.com>
2023-10-23 15:11:27 +02:00
TommyStarK
60a8bca507 dynamic resource allocation: add unit test to check the reuse of the gRPC connection
Signed-off-by: TommyStarK <thomasmilox@gmail.com>
2023-07-20 19:22:25 +02:00
TommyStarK
7ffd3063ce dynamic resource allocation: reuse gRPC connection
Signed-off-by: TommyStarK <thomasmilox@gmail.com>
2023-07-19 10:12:52 +02:00
Kevin Klues
0449cef8fd Increase timeout for DRA kubelet plugin client
The 10 second timeout was too low. Given that the retry loop for the
kubelet itself is 90s, increasing the timeout to half of this seems
reasonable. Ideally we would pull in the variable that sets the retry
timeout to 90s and then just set our local timeout to half of that.
Unfortunately, this is not exported, so we settle (for now with just
explicitly setting it to 45s.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-07-18 22:45:01 +01:00
Patrick Ohly
d743c50bb9 kubelet: support batched prepare/unprepare in v1alpha3 DRA plugin API
Combining all prepare/unprepare operations for a pod enables plugins to
optimize the execution. Plugins can continue to use the v1beta2 API for now,
but should switch. The new API is designed so that plugins which want to work
on each claim one-by-one can do so and then report errors for each claim
separately, i.e. partial success is supported.
2023-07-12 14:50:30 +02:00
Kevin Klues
579295e727 Update kubeletplugin API for DynamicResourceAllocation to v1alpha2
This PR makes the NodePrepareResources() and NodeUnprepareResource()
calls of the kubeletplugin API for DynamicResourceAllocation
symmetrical. It wasn't clear how one would use the set of CDIDevices
passed back in the NodeUnprepareResource() of the v1alpha1 API, and the
new API now passes back the full ResourceHandle that was originally
passed to the Prepare() call. Passing the ResourceHandle is strictly
more informative and a plugin could always (re)derive the set of
CDIDevice from it.

This is a breaking change, but this release is scheduled to break
multiple APIs for DynamicResourceAllocation, so it makes sense to do
this now instead of later.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-14 23:09:44 +00:00
Ed Bartosh
50cb3268b6 DRA: add constant PluginClientTimeout 2023-03-14 00:37:43 +02:00
Saza
d34b0275a3 dynamic resource allocation: add timeouts for communiction with plugin (#114844)
* add timeouts for communication with dra plugin

* move timeout constant to k8s.io/kubernetes/pkg/kubelet/cm/util

* move settings of timeout to pkg/kubelet/plugin/dra/plugin/client.go

* remove timeout constant
2023-03-13 04:34:56 -07:00
Patrick Ohly
bc6c7fa912 logging: fix names of keys
The stricter checking with the upcoming logcheck v0.4.1 pointed out these names
which don't comply with our recommendations in
https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md#name-arguments.
2023-01-23 14:24:29 +01:00
Ed Bartosh
ae0f38437c kubelet: add support for dynamic resource allocation
Dependencies need to be updated to use
github.com/container-orchestrated-devices/container-device-interface.

It's not decided yet whether we will implement Topology support
for DRA or not. Not having any toppology-related code
will help to avoid wrong impression that DRA is used as a hint
provider for the Topology Manager.
2022-11-11 21:58:03 +01:00