Commit Graph

42 Commits

Author SHA1 Message Date
Patrick Ohly
7701a48bd6 dra kubelet: bump gRPC API to v1alpha4
The previous changes are an API break, therefore we need a new version.
2024-07-18 23:30:09 +02:00
Patrick Ohly
ee3205804b dra e2e: demonstrate how to use RBAC + VAP for a kubelet plugin
In reality, the kubelet plugin of a DRA driver is meant to be deployed as a
daemonset with a service account that limits its
permissions. https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#additional-metadata-in-pod-bound-tokens
ensures that the node name is bound to the pod, which then can be used
in a validating admission policy (VAP) to ensure that the operations are
limited to the node.

In E2E testing, we emulate that via impersonation. This ensures that the plugin
does not accidentally depend on additional permissions.
2024-07-18 23:30:09 +02:00
Patrick Ohly
348f94ab55 DRA: read ResourceClaim in DRA drivers
This is the second and final step towards making kubelet independent of the
resource.k8s.io API versioning because it now doesn't need to copy structs
defined by that API from the driver to the API server.
2024-07-18 09:09:20 +02:00
Patrick Ohly
616a014347 DRA: move ResourceSlice publishing into DRA drivers
This is a first step towards making kubelet independent of the resource.k8s.io
API versioning because it now doesn't need to copy structs defined by that API
from the driver to the API server. The next step is removing the other
direction (reading ResourceClaim status and passing the resource handle to
drivers).

The drivers must get deployed so that they have their own connection to the API
server. Securing at least the writes via a validating admission policy should
be possible.

As before, the kubelet removes all ResourceSlices for its node at startup, then
DRA drivers recreate them if (and only if) they start up again. This ensures
that there are no orphaned ResourceSlices when a driver gets removed while the
kubelet was down.

While at it, logging gets cleaned up and updated to use structured, contextual
logging as much as possible. gRPC requests and streams now use a shared,
per-process request ID and streams also get logged.
2024-07-18 09:09:19 +02:00
Kubernetes Prow Robot
ac9aec9f9b Merge pull request #125116 from pohly/dra-one-of-source
DRA: remove "source" indirection from v1 Pod API
2024-06-28 12:46:45 -07:00
Patrick Ohly
bde9b64cdf DRA: remove "source" indirection from v1 Pod API
This makes the API nicer:

    resourceClaims:
    - name: with-template
      resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      resourceClaimName: test-shared-claim

Previously, this was:

    resourceClaims:
    - name: with-template
      source:
        resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      source:
        resourceClaimName: test-shared-claim

A more long-term benefit is that other, future alternatives
might not make sense under the "source" umbrella.

This is a breaking change. It's justified because DRA is still
alpha and will have several other API breaks in 1.31.
2024-06-27 17:53:24 +02:00
Patrick Ohly
7f87629a3f DRA e2e: fix error reporting in test driver
Dropping the error that is returned by allocateOne hides the reason *why*
allocation failed. Including the UID is "too much information" for an error
message (usually the user doesn't care about the exact identity, just the name)
and the claim name can and will be added by the caller.

Before:

    controller.go:373: E0625 16:04:12.140953] test-driver.cdi.k8s.io/resource controller: processing failed err="claim test-dramq9jv-resource-h72pg: failed allocating claim 8551afba-3c9a-4a8a-8633-6fad6c4b9e42" key="schedulingCtx:test/test-dramq9jv"
    event.go:377: I0625 16:04:12.141031] test-driver.cdi.k8s.io/resource controller: Event(v1.ObjectReference{Kind:"PodSchedulingContext", Namespace:"test", Name:"test-dra65gfw", UID:"6be9ba57-31da-4fef-b61d-b0468d71afcf", APIVersion:"resource.k8s.io/v1alpha3", ResourceVersion:"197", FieldPath:""}): type: 'Warning' reason: 'Failed' claim test-dra65gfw-resource-zpzrj: failed allocating claim f98a32e1-ab7d-4b34-a258-6d8224aa9006

After:

    controller.go:373: E0625 16:02:54.248059] test-driver.cdi.k8s.io/resource controller: processing failed err="claim test-dram98ll-resource-nvsbj: device selectors are not supported" key="schedulingCtx:test/test-dram98ll"
    event.go:377: I0625 16:02:54.248163] test-driver.cdi.k8s.io/resource controller: Event(v1.ObjectReference{Kind:"PodSchedulingContext", Namespace:"test", Name:"test-dratpt77", UID:"24010402-b026-4fe4-a535-e1dab69db8c0", APIVersion:"resource.k8s.io/v1alpha3", ResourceVersion:"298", FieldPath:""}): type: 'Warning' reason: 'Failed' claim test-dratpt77-resource-vlgrv: device selectors are not supported
2024-06-25 16:04:56 +02:00
Ed Bartosh
c8c7ae85e5 e2e_node: DRA: add CountCalls API 2024-06-07 22:47:23 +03:00
Ed Bartosh
ffc407b4dd e2e_node: DRA: reimplement call blocking 2024-06-07 22:47:20 +03:00
Ed Bartosh
2ea2fb3166 e2e: test-driver: implement failure mode 2024-06-07 22:45:35 +03:00
Ed Bartosh
f609aa8310 e2e: test-driver: add new matchers 2024-05-25 01:02:25 +03:00
Patrick Ohly
77341f7595 DRA: remove support for v1alpha2 kubelet API
The v1alpha2 API is several releases old. No current drivers should still
depend on it.
2024-04-19 18:27:05 +02:00
Patrick Ohly
cf8fffae72 dra e2e: sanity check resource handle
When using structured parameters, the instance name must match and not be in
use already.

NodeUnprepareResources must be called with the same handle are
NodePrepareResources.
2024-03-14 20:42:31 +01:00
Patrick Ohly
a0add8d2c7 dra api: NodeResourceModel -> ResourceModel
When renaming NodeResourceSlice to ResourceSlice, the embedded
[Node]ResourceModel also should have been renamed.
2024-03-14 18:07:36 +01:00
Patrick Ohly
d59676a545 dra kubelet: publish NodeResourceSlices
The information is received from the DRA driver plugin through a new gRPC
streaming interface. This is backwards compatible with old DRA driver kubelet
plugins, their gRPC server will return "not implemented" and that can be
handled by kubelet. Therefore no API break is needed.

However, DRA drivers need to be updated because the Go API changed. They can
return
    status.New(codes.Unimplemented, "no node resource support").Err()
if they don't support the new ListAndWatchResources method and
structured parameters.

The controller in kubelet then synchronizes this information from the driver
with NodeResourceSlice objects, creating, updating and deleting them as needed.
2024-03-07 22:22:13 +01:00
Patrick Ohly
5e40afca06 dra testing: add tests for structured parameters
The test driver now supports a ConfigMap (as before) and the named resources
structured parameter model. It doesn't have any instance attributes.
2024-03-07 22:22:13 +01:00
Patrick Ohly
6f1ddfcd2e kubelet: support structured parameters for preparing resources
If the resource handle has data from a structured parameter model, then we need
to pass that to the DRA driver kubelet plugin. Because Kubernetes uses
gogo/protobuf, we cannot use "optional" for that new optional field and have to
resort to "repeated" with a single repetition if present.

This is a new, backwards-compatible field.

That extending the resource.k8s.io changes the checksum of a kubelet checkpoint
is unfortunate. Updating the test cases is a stop-gap measure, the actual
solution will have to be something else before beta.
2024-03-07 22:22:13 +01:00
Patrick Ohly
36146ad686 e2e dra: enhance test driver
Several enhancements:
- `--resource-config` is now listed under `controller` options instead of
  `leader election`: merely a cosmetic change
- The driver name can be configured as part of the resource config. The
  command line flag overrides the config, but only when set explicitly.
  This makes it possible to pre-define complete driver setups where the
  name is associated with certain resource availability. This will be
  used for testing cluster autoscaling.
- The set of nodes where resources are available can optionally be specified
  via node labels. This will be used for testing cluster autoscaling.
2023-09-25 19:50:33 +02:00
Patrick Ohly
0e23840929 dra test: enhance performance of test driver controller
Analyzing the CPU profile of

    go test -timeout=0 -count=5 -cpuprofile profile.out -bench=BenchmarkPerfScheduling/.*Claim.* -benchtime=1ns -run=xxx ./test/integration/scheduler_perf

showed that a significant amount of time was spent iterating over allocated
claims to determine how many were allocated per node. That "naive" approach was
taken to avoid maintaining a redundant data structure, but now that performance
measurements show that this comes at a cost, it's not "premature optimization"
anymore to introduce such a second field.

The average scheduling throughput in
SchedulingWithResourceClaimTemplate/2000pods_100nodes increases from 16.4
pods/s to 19.2 pods/s.
2023-08-08 13:36:35 +02:00
Kubernetes Prow Robot
bea27f82d3 Merge pull request #118209 from pohly/dra-pre-scheduled-pods
dra: pre-scheduled pods
2023-07-13 14:43:37 -07:00
Patrick Ohly
08d40f53a7 dra: test with and without immediate ReservedFor
The recommendation and default in the controller helper code is to set
ReservedFor to the pod which triggered delayed allocation. However, this
is neither required nor enforced. Therefore we should also test the fallback
path were kube-scheduler itself adds the pod to ReservedFor.
2023-07-12 16:57:17 +02:00
Patrick Ohly
d743c50bb9 kubelet: support batched prepare/unprepare in v1alpha3 DRA plugin API
Combining all prepare/unprepare operations for a pod enables plugins to
optimize the execution. Plugins can continue to use the v1beta2 API for now,
but should switch. The new API is designed so that plugins which want to work
on each claim one-by-one can do so and then report errors for each claim
separately, i.e. partial success is supported.
2023-07-12 14:50:30 +02:00
Alexey Fomenko
b10cc642b5 DRA controller: batch resource claims for Allocate
Signed-off-by: Alexey Fomenko <alexey.fomenko@intel.com>
Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
2023-07-06 19:31:45 +03:00
Ed Bartosh
5c5f6e8fe2 DRA Node E2E: add NodePrepareResourceCalled API 2023-06-13 12:42:05 +03:00
Ed Bartosh
673d0aaa60 DRA Node E2E: add call blocking to the Kubelet plugin APIs 2023-06-13 12:41:59 +03:00
Patrick Ohly
d0a64739e2 e2e dra: collect and check GRPC calls
If kubelet plugin registration fails, it would be good to know more about the
communication with kubelet. Capturing the GRPC calls and then checking that
makes the failure messages more informative. Here's an example where a failure
was triggered by temporarily modifying the check so that it didn't find the
call:

  [FAILED] Timed out after 30.000s.
  Expected:
      <[]app.GRPCCall | len:2, cap:2>: [
          {
              FullMethod: "/pluginregistration.Registration/GetInfo",
              Request:
                  {},
              Response:
                  endpoint: /var/lib/kubelet/plugins/test-driver/dra.sock
                  name: test-driver.cdi.k8s.io
                  supported_versions:
                  - 1.0.0
                  type: DRAPlugin,
              Err: nil,
          },
          {
              FullMethod: "/pluginregistration.Registration/NotifyRegistrationStatus",
              Request:
                  plugin_registered: true,
              Response:
                  {},
              Err: nil,
          },
      ]
  to contain successful NotifyRegistrationStatus call
2023-06-01 09:58:05 +02:00
Kevin Klues
579295e727 Update kubeletplugin API for DynamicResourceAllocation to v1alpha2
This PR makes the NodePrepareResources() and NodeUnprepareResource()
calls of the kubeletplugin API for DynamicResourceAllocation
symmetrical. It wasn't clear how one would use the set of CDIDevices
passed back in the NodeUnprepareResource() of the v1alpha1 API, and the
new API now passes back the full ResourceHandle that was originally
passed to the Prepare() call. Passing the ResourceHandle is strictly
more informative and a plugin could always (re)derive the set of
CDIDevice from it.

This is a breaking change, but this release is scheduled to break
multiple APIs for DynamicResourceAllocation, so it makes sense to do
this now instead of later.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-14 23:09:44 +00:00
Kevin Klues
6ba9b91604 Update e2e tests for recent changes to resource.k8s.io/v1alpha2
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-14 22:34:19 +00:00
Patrick Ohly
29941b8d3e api: resource.k8s.io v1alpha1 -> v1alpha2
For Kubernetes 1.27, we intend to make some breaking API changes:
- rename PodScheduling -> PodSchedulingHints (https://github.com/kubernetes/kubernetes/issues/114283)
- extend ResourceClaimStatus (https://github.com/kubernetes/enhancements/pull/3802)

We need to switch from v1alpha1 to v1alpha2 for that.
2023-03-14 07:52:03 +01:00
Paco Xu
f368413d65 sync default qps of kubelet change 2023-03-08 14:04:51 +08:00
Ed Bartosh
35fd124f4d DRA: fix CDI spec version
The latest CDI release includes spec version check that fails
if version is less than 0.3.0:
  https://github.com/container-orchestrated-devices/container-device-interface/blob/v0.5.4/pkg/cdi/version.go#L42

Updating CDI spec version to 0.3.0 in the test kubelet plugin code
should fix e2e test failures on the CRI runtimes that use CDI >= 0.5.4
(Containerd master atm, CRI-O soon).
2023-03-05 16:49:56 +02:00
Patrick Ohly
74785074c6 e2e dra: update logging
When running as part of the scheduler_perf benchmark testing, we want to print
less information by default, so we should use V to limit verbosity

Pretty-printing doesn't belong into "application" code. I am moving that into
the ktesting formatting (https://github.com/kubernetes/kubernetes/pull/116180).
2023-03-01 15:02:03 +01:00
Patrick Ohly
106fce6fae e2e dra: improve goroutine handling
There is an API now to wait for informer factory goroutine termination.
While at it, an incorrect comment for mutex locking gets removed.
2023-03-01 15:00:30 +01:00
Patrick Ohly
20d7fa2771 e2e dra: fix resource limits in a mixed cluster
The check for "resources available on a node" must treat nodes that are not
listed as "no resources available". The previous logic only worked because all
nodes were listed during E2E testing. The upcoming integration testing is
covering additional scenarios and triggered this broken case.
2023-02-15 15:12:19 +01:00
vaibhav2107
6ab8a8fbec Updated the change in registry 2023-02-09 09:37:44 +05:30
Patrick Ohly
136f89dfc5 e2e: use error wrapping with %w
The recently introduced failure handling in ExpectNoError depends on error
wrapping: if an error prefix gets added with `fmt.Errorf("foo: %v", err)`, then
ExpectNoError cannot detect that the root cause is an assertion failure and
then will add another useless "unexpected error" prefix and will not dump the
additional failure information (currently the backtrace inside the E2E
framework).

Instead of manually deciding on a case-by-case basis where %w is needed, all
error wrapping was updated automatically with

    sed -i "s/fmt.Errorf\(.*\): '*\(%s\|%v\)'*\",\(.* err)\)/fmt.Errorf\1: %w\",\3/" $(git grep -l 'fmt.Errorf' test/e2e*)

This may be unnecessary in some cases, but it's not wrong.
2023-02-06 15:39:13 +01:00
Antonio Ojea
7f5ae1c0c1 Revert "e2e: wait for pods with gomega" 2023-02-06 12:08:22 +01:00
Patrick Ohly
222f655062 e2e: use error wrapping with %w
The recently introduced failure handling in ExpectNoError depends on error
wrapping: if an error prefix gets added with `fmt.Errorf("foo: %v", err)`, then
ExpectNoError cannot detect that the root cause is an assertion failure and
then will add another useless "unexpected error" prefix and will not dump the
additional failure information (currently the backtrace inside the E2E
framework).

Instead of manually deciding on a case-by-case basis where %w is needed, all
error wrapping was updated automatically with

    sed -i "s/fmt.Errorf\(.*\): '*\(%s\|%v\)'*\",\(.* err)\)/fmt.Errorf\1: %w\",\3/" $(git grep -l 'fmt.Errorf' test/e2e*)

This may be unnecessary in some cases, but it's not wrong.
2023-01-31 13:01:39 +01:00
Kubernetes Prow Robot
57eb5d631c Merge pull request #113976 from swatisehgal/dra-fix-claim-parameter-name
dra: test examples: ensure that the claim parameter name is consistent
2022-11-18 05:16:30 -08:00
Swati Sehgal
4d15502e43 dra: test examples: ensure that the claim parameter name is consistent
In the Dynamic Resource allocation example specs, the claim
parameter name specified was inconsistent.

This commit fixes that with a better/more consistent name,
which is used to define the configmap and referenced in
the `ResourceClaimTemplate` spec.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2022-11-17 14:56:42 +00:00
Moshe Levi
f46b66088a [DRA] Add RUNTIME_CONFIG="resource.k8s.io/v1alpha1"
This flag is required to enable the DRA resource api

Signed-off-by: Moshe Levi <moshele@nvidia.com>
2022-11-16 23:09:24 +02:00
Patrick Ohly
14db9d1f92 e2e dra: add test driver and tests for dynamic resource allocation
The driver can be used manually against a cluster started with
local-up-cluster.sh and is also used for E2E testing. Because the tests proxy
connections from the nodes into the e2e.test binary and create/delete files via
the equivalent of "kubectl exec dd/rm", they can be run against arbitrary
clusters. Each test gets its own driver instance and resource class, therefore
they can run in parallel.
2022-11-12 00:17:15 +01:00