Commit Graph

63 Commits

Author SHA1 Message Date
Alvaro Aleman
6d0ac8c561 Use the generic/typed workqueue throughout
This change makes us use the generic workqueue throughout the project in
order to improve type safety and readability of the code.
2024-05-04 14:33:12 -04:00
Patrick Ohly
77341f7595 DRA: remove support for v1alpha2 kubelet API
The v1alpha2 API is several releases old. No current drivers should still
depend on it.
2024-04-19 18:27:05 +02:00
Ayato Tokubi
d04f87abde add nil check for Node(Un)PrepareResources.
Signed-off-by: Ayato Tokubi <atokubi@redhat.com>
2024-04-04 23:24:25 +00:00
HirazawaUi
10b6319e64 fix slow dra unit test
2024-03-16 22:21:15 +08:00
Ed Bartosh
26881132bd kubelet: assign Node as an owner for the ResourceSlice
Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
2024-03-15 09:46:13 +02:00
Patrick Ohly
a0add8d2c7 dra api: NodeResourceModel -> ResourceModel
When renaming NodeResourceSlice to ResourceSlice, the embedded
[Node]ResourceModel also should have been renamed.
2024-03-14 18:07:36 +01:00
Kevin Klues
fc2134c84c dra kubelet: fix error log
Previously we were returning the error string from 'err' (which is nil), when
we should have been returning it from result.Error. Without this it is hard to
debug issues with NodeUnprepareResources.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-03-11 13:51:29 +00:00
Kevin Klues
13a6dcc21c dra kubelet: add StructuredResourceModel to UnprepareResources call
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-03-09 18:08:14 +00:00
Patrick Ohly
0b6a0d686a dra api: rename NodeResourceSlice -> ResourceSlice
While currently those objects only get published by the kubelet for node-local
resources, this could change once we also support network-attached
resources. Dropping the "Node" prefix enables such a future extension.

The NodeName in ResourceSlice and StructuredResourceHandle then becomes
optional. The kubelet still needs to provide one and it must match its own node
name, otherwise it doesn't have permission to access ResourceSlice objects.
2024-03-07 22:22:55 +01:00
Patrick Ohly
d59676a545 dra kubelet: publish NodeResourceSlices
The information is received from the DRA driver plugin through a new gRPC
streaming interface. This is backwards compatible with old DRA driver kubelet
plugins: their gRPC server will return "not implemented", which the kubelet
can handle. Therefore no API break is needed.

However, DRA drivers need to be updated because the Go API changed. They can
return
    status.New(codes.Unimplemented, "no node resource support").Err()
if they don't support the new ListAndWatchResources method and
structured parameters.

The controller in kubelet then synchronizes this information from the driver
with NodeResourceSlice objects, creating, updating and deleting them as needed.
2024-03-07 22:22:13 +01:00
Patrick Ohly
6f1ddfcd2e kubelet: support structured parameters for preparing resources
If the resource handle has data from a structured parameter model, then we need
to pass that to the DRA driver kubelet plugin. Because Kubernetes uses
gogo/protobuf, we cannot use "optional" for that new optional field and have to
resort to "repeated" with a single repetition if present.

This is a new, backwards-compatible field.

It is unfortunate that extending the resource.k8s.io API changes the checksum
of a kubelet checkpoint. Updating the test cases is a stop-gap measure; the
actual solution will have to be something else before beta.
2024-03-07 22:22:13 +01:00
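The "repeated with a single repetition" workaround from 6f1ddfcd2e can be sketched in plain Go, without any protobuf dependency. The types `claim` and `structuredHandle` and the method `optionalHandle` are hypothetical and only model the idea: a slice with zero or one element stands in for an optional field.

```go
package main

import "fmt"

// structuredHandle is a hypothetical payload from a structured parameter model.
type structuredHandle struct {
	data string
}

// claim models a message whose optional field is encoded as a repeated one.
type claim struct {
	name string
	// structured holds zero elements (absent) or exactly one element (present).
	structured []structuredHandle
}

// optionalHandle interprets the repeated field as an optional value.
func (c claim) optionalHandle() (structuredHandle, bool, error) {
	switch len(c.structured) {
	case 0:
		return structuredHandle{}, false, nil
	case 1:
		return c.structured[0], true, nil
	default:
		return structuredHandle{}, false, fmt.Errorf("claim %s: expected at most one structured handle, got %d", c.name, len(c.structured))
	}
}

func main() {
	withHandle := claim{name: "a", structured: []structuredHandle{{data: "model-data"}}}
	withoutHandle := claim{name: "b"}

	h, ok, err := withHandle.optionalHandle()
	fmt.Println(h.data, ok, err)
	_, ok, err = withoutHandle.optionalHandle()
	fmt.Println(ok, err)
}
```

Rejecting more than one element is a design choice of this sketch; the commit message does not say how the kubelet treats that case.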
TommyStarK
6f021e99cf dra: increase timeout in setupFakeDRADriverGRPCServer to prevent tests to flake.
Signed-off-by: TommyStarK <thomasmilox@gmail.com>
2024-01-11 09:20:04 +01:00
charles-chenzz
abaf7a800d increase timeout in fakeDraDriverGrpcServer to fix flake in dra/manger_test
2023-11-07 19:38:27 +08:00
Kubernetes Prow Robot
191abe34b8 Merge pull request #120550 from adrianchiris/fix-dra-node-reboot
DRA: call plugins for claims even if exist in cache
2023-10-26 10:26:59 +02:00
adrianc
3738111337 Add unit tests
adjust existing tests and add new test flows
to cover new DRA manager behaviour

Signed-off-by: adrianc <adrianc@nvidia.com>
2023-10-25 13:20:22 +03:00
adrianc
08b942028f DRA: call plugins for claims even if exist in cache
Today, the DRA manager does not call the plugin's NodePrepareResource
for claims that it previously handled successfully, that is,
claims that are present in the cache (checkpoint), even if the node
rebooted.

After a node reboot, the DRA plugin must be called
for resource claims so that plugins can prepare them
again in case the resources don't persist across a reboot.

To achieve that, once the kubelet is started, we call DRA
plugins for claims once if a pod sandbox needs
to be created during PodSync.

Signed-off-by: adrianc <adrianc@nvidia.com>
2023-10-25 13:20:16 +03:00
TommyStarK
55e3662b72 dra: refactoring overall flow of prepare/unprepare resources
Signed-off-by: TommyStarK <thomasmilox@gmail.com>
2023-10-23 15:11:27 +02:00
Kubernetes Prow Robot
f9f00da6bc Merge pull request #118761 from TommyStarK/gh_113831
move common logic of highestSupportedVersion to util package
2023-09-18 13:59:25 -07:00
TommyStarK
42356bfbb3 move common logic of highestSupportedVersion to util package
Signed-off-by: TommyStarK <thomasmilox@gmail.com>
2023-09-18 21:25:29 +02:00
Kubernetes Prow Robot
82bca6304b Merge pull request #119464 from TommyStarK/dra/cleanup-manager-unit-tests
dra: cleanup manager unit tests
2023-09-18 07:08:43 -07:00
Kubernetes Prow Robot
19deb04a90 Merge pull request #118619 from TommyStarK/gh_113832
dynamic resource allocation: reuse gRPC connection
2023-08-16 09:32:27 -07:00
charles-chenzz
ba9ce3ab08 fix flaky test on dra TestPrepareResources/should_timeout
Co-authored-by: TommyStarK <thomasmilox@gmail.com>
2023-08-03 22:37:54 +08:00
TommyStarK
391c1a3ecc dra: cleanup manager unit tests
Signed-off-by: TommyStarK <thomasmilox@gmail.com>
2023-08-02 23:35:45 +02:00
TommyStarK
60a8bca507 dynamic resource allocation: add unit test to check the reuse of the gRPC connection
Signed-off-by: TommyStarK <thomasmilox@gmail.com>
2023-07-20 19:22:25 +02:00
TommyStarK
7ffd3063ce dynamic resource allocation: reuse gRPC connection
Signed-off-by: TommyStarK <thomasmilox@gmail.com>
2023-07-19 10:12:52 +02:00
Kevin Klues
0449cef8fd Increase timeout for DRA kubelet plugin client
The 10 second timeout was too low. Given that the retry loop for the
kubelet itself is 90s, increasing the timeout to half of this seems
reasonable. Ideally we would pull in the variable that sets the retry
timeout to 90s and then just set our local timeout to half of that.
Unfortunately, this is not exported, so we settle (for now) with just
explicitly setting it to 45s.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-07-18 22:45:01 +01:00
Ed Bartosh
0ec99fb0b2 Kubelet DRA: fix failing test cases
2023-07-18 19:06:33 +03:00
Ed Bartosh
f6431c6138 DRA: don't query claims from API server
When a pod is force-deleted, UnprepareResources fails to get the claim
from the API server.
PrepareResources should cache the claim info required by
UnprepareResources so that UnprepareResources gets it from
the cache instead of querying the API server.
2023-07-18 18:23:10 +03:00
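The caching approach from f6431c6138 can be sketched roughly as below. All names (`cachedClaim`, `claimCache`, `prepare`, `unprepare`) are hypothetical and do not reflect the kubelet's actual types. The point is only that the unprepare path reads what the prepare path stored, and never needs the API server.

```go
package main

import (
	"fmt"
	"sync"
)

// cachedClaim holds the hypothetical subset of claim data that unprepare needs.
type cachedClaim struct {
	driverName string
	claimUID   string
}

// claimCache is an in-memory cache keyed by "namespace/name".
type claimCache struct {
	mu     sync.Mutex
	claims map[string]cachedClaim
}

func newClaimCache() *claimCache {
	return &claimCache{claims: make(map[string]cachedClaim)}
}

// prepare records the claim info while the claim is still retrievable.
func (c *claimCache) prepare(key string, info cachedClaim) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.claims[key] = info
}

// unprepare uses only cached data, so it also works after a force-delete.
func (c *claimCache) unprepare(key string) (cachedClaim, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	info, ok := c.claims[key]
	if !ok {
		return cachedClaim{}, fmt.Errorf("claim %q was never prepared", key)
	}
	delete(c.claims, key)
	return info, nil
}

func main() {
	cache := newClaimCache()
	cache.prepare("default/gpu-claim", cachedClaim{driverName: "gpu.example.com", claimUID: "uid-1"})

	info, err := cache.unprepare("default/gpu-claim")
	fmt.Println(info.driverName, info.claimUID, err)

	_, err = cache.unprepare("default/gpu-claim")
	fmt.Println(err)
}
```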
Kubernetes Prow Robot
6d83e22ba4 Merge pull request #118711 from TommyStarK/tom/gh_118436
add unit test for dra/manager.go
2023-07-18 04:17:09 -07:00
charles-chenzz
0372e4b662 add unit test for dra/manager.go.
Co-Authored-By: charles-chenzz <Rekles666@gmail.com>
Signed-off-by: TommyStarK <thomasmilox@gmail.com>
2023-07-18 12:14:27 +02:00
Kubernetes Prow Robot
bdcf812c95 Merge pull request #118254 from elezar/4009/add-cdi-devices-to-device-plugin
Add CDI devices to device plugin API
2023-07-17 05:21:08 -07:00
Kubernetes Prow Robot
047d040ce7 Merge pull request #119012 from pohly/dra-batch-node-prepare
kubelet: support batched prepare/unprepare in v1alpha3 DRA plugin API
2023-07-12 10:57:37 -07:00
Kubernetes Prow Robot
be222f38f0 Merge pull request #119058 from TommyStarK/dra-state-checkpoint-unit-test
dynamic resource allocation: Improve code coverage of state checkpoint
2023-07-12 07:49:14 -07:00
Patrick Ohly
d743c50bb9 kubelet: support batched prepare/unprepare in v1alpha3 DRA plugin API
Combining all prepare/unprepare operations for a pod enables plugins to
optimize the execution. Plugins can continue to use the v1alpha2 API for now,
but should switch. The new API is designed so that plugins which want to work
on each claim one-by-one can do so and then report errors for each claim
separately, i.e. partial success is supported.
2023-07-12 14:50:30 +02:00
TommyStarK
f924bf95df dynamic resource allocation: Improve code coverage of state checkpoint
Signed-off-by: TommyStarK <thomasmilox@gmail.com>
2023-07-12 13:27:18 +02:00
Patrick Ohly
444d23bd2f dra: generated name for ResourceClaim from template
Generating the name avoids all potential name collisions. It's not clear how
much of a problem that was because users can avoid them and the deterministic
names for generic ephemeral volumes have not led to reports from users. But
using generated names is not too hard either.

What makes it relatively easy is that the new pod.status.resourceClaimStatus
map stores the generated name for kubelet and node authorizer, i.e. the
information in the pod is sufficient to determine the name of the
ResourceClaim.

The resource claim controller becomes a bit more complex and now needs
permission to modify the pod status. The new failure scenario of "ResourceClaim
created, updating pod status fails" is handled with the help of a new special
"resource.kubernetes.io/pod-claim-name" annotation that together with the owner
reference identifies exactly for what a ResourceClaim was generated, so
updating the pod status can be retried for existing ResourceClaims.

The transition from deterministic names is handled with a special case for that
recovery code path: a ResourceClaim with no annotation and a name that follows
the Kubernetes <= 1.27 naming pattern is assumed to be generated for that pod
claim and gets added to the pod status.

There's no immediate need for it, but just in case that it may become relevant,
the name of the generated ResourceClaim may also be left unset to record that
no claim was needed. Components processing such a pod can skip whatever they
normally would do for the claim. To ensure that they do and also cover other
cases properly ("no known field is set", "must check ownership"),
resourceclaim.Name gets extended.
2023-07-11 14:23:48 +02:00
Evan Lezar
f0e3c32fe5 Move CDI annotation code to utils package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 11:47:53 +02:00
Kubernetes Prow Robot
7581ae8123 Merge pull request #116739 from moshe010/clone-cdi-devices
kubelet dra: lock before getting claimInfo CDIDevices and annotations fields
2023-07-07 06:31:04 -07:00
Patrick Ohly
bde66bfb55 kubelet dra: restore skipping of unused resource claims
1aeec10efb removed iterating over containers in favor of iterating over pod
claims. This had the unintended consequence that NodePrepareResource gets
called unnecessarily when no container needs the claim. The more natural
behavior is to skip unused resources. This enables (theoretical, at this time)
use cases where some DRA driver relies on the controller part to influence
scheduling, but then doesn't use CDI with containers.
2023-06-27 16:02:31 +02:00
Patrick Ohly
874daa8b52 kubelet dra: fix checking of second pod which uses a claim
When a second pod wanted to use a claim, the obligatory sanity check whether
the pod is really allowed to use the claim ("reserved for") was skipped.
2023-06-27 16:01:11 +02:00
Moshe Levi
04ad946e8f kubelet dra: lock before getting claimInfo CDIDevices and annotations fields
Currently the claimInfo CDIDevices and annotations fields are accessed directly,
without taking the RLock. This can lead to concurrent read/write errors.

To avoid this, we take the RLock before reading CDIDevices and annotations.

Signed-off-by: Moshe Levi <moshele@nvidia.com>
2023-05-01 15:09:43 +03:00
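The locking pattern from 04ad946e8f can be sketched with a `sync.RWMutex`. The struct below is a hypothetical stand-in for the kubelet's claim info and covers only the device list; annotations would follow the same pattern. Returning a copy under the read lock is a choice made for this sketch so that callers never share the backing array with writers.

```go
package main

import (
	"fmt"
	"sync"
)

// claimInfo is a hypothetical stand-in for the kubelet's cached claim state.
type claimInfo struct {
	sync.RWMutex
	cdiDevices []string
}

// addCDIDevices appends devices under the write lock.
func (c *claimInfo) addCDIDevices(devices ...string) {
	c.Lock()
	defer c.Unlock()
	c.cdiDevices = append(c.cdiDevices, devices...)
}

// cdiDevicesCopy reads under the read lock and returns an independent copy.
func (c *claimInfo) cdiDevicesCopy() []string {
	c.RLock()
	defer c.RUnlock()
	return append([]string(nil), c.cdiDevices...)
}

func main() {
	info := &claimInfo{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			info.addCDIDevices(fmt.Sprintf("vendor.example/class=dev%d", i))
			_ = info.cdiDevicesCopy() // concurrent read, safe because of RLock
		}(i)
	}
	wg.Wait()
	fmt.Println(len(info.cdiDevicesCopy()))
}
```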
Ed Bartosh
1aeec10efb DRA: get rid of unneeded loops over pod containers
2023-03-15 09:41:30 +02:00
Kubernetes Prow Robot
74123a7341 Merge pull request #116621 from moshe010/dra-lock
kubelet dra: add lock to addCDIDevices
2023-03-14 19:27:28 -07:00
Kevin Klues
579295e727 Update kubeletplugin API for DynamicResourceAllocation to v1alpha2
This PR makes the NodePrepareResource() and NodeUnprepareResource()
calls of the kubeletplugin API for DynamicResourceAllocation
symmetrical. It wasn't clear how one would use the set of CDIDevices
passed back in the NodeUnprepareResource() of the v1alpha1 API, and the
new API now passes back the full ResourceHandle that was originally
passed to the Prepare() call. Passing the ResourceHandle is strictly
more informative and a plugin could always (re)derive the set of
CDIDevice from it.

This is a breaking change, but this release is scheduled to break
multiple APIs for DynamicResourceAllocation, so it makes sense to do
this now instead of later.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-14 23:09:44 +00:00
Moshe Levi
ffb07d1e78 kubelet dra: add lock to addCDIDevices
Signed-off-by: Moshe Levi <moshele@nvidia.com>
2023-03-15 00:50:45 +02:00
Kevin Klues
74d634a028 Update kubelet support for recent changes to resource.k8s.io/v1alpha2
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-14 22:34:18 +00:00
Moshe Levi
2a568bcfc8 kubelet podresources: extend List to support Dynamic Resources and implement Get API
Signed-off-by: Moshe Levi <moshele@nvidia.com>
2023-03-14 19:33:04 +02:00
Moshe Levi
9c57613912 Add ClassName to checkpoint state and in-memory cache
Signed-off-by: Moshe Levi <moshele@nvidia.com>
2023-03-14 19:33:04 +02:00
Patrick Ohly
29941b8d3e api: resource.k8s.io v1alpha1 -> v1alpha2
For Kubernetes 1.27, we intend to make some breaking API changes:
- rename PodScheduling -> PodSchedulingHints (https://github.com/kubernetes/kubernetes/issues/114283)
- extend ResourceClaimStatus (https://github.com/kubernetes/enhancements/pull/3802)

We need to switch from v1alpha1 to v1alpha2 for that.
2023-03-14 07:52:03 +01:00
Kubernetes Prow Robot
e998b09bc4 Merge pull request #116555 from bart0sh/PR106-dra-plugin-constant
DRA: add constant PluginClientTimeout
2023-03-13 17:51:31 -07:00