In reality, the kubelet plugin of a DRA driver is meant to be deployed as a
daemonset with a service account that limits its permissions. The additional
metadata in pod-bound service account tokens
(https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#additional-metadata-in-pod-bound-tokens)
ensures that the node name is bound to the pod, which can then be used in a
validating admission policy (VAP) to restrict the operations to that node.
In E2E testing, we emulate that via impersonation. This ensures that the plugin
does not accidentally depend on additional permissions.
This is a first step towards making kubelet independent of the resource.k8s.io
API versioning because it now doesn't need to copy structs defined by that API
from the driver to the API server. The next step is removing the other
direction (reading ResourceClaim status and passing the resource handle to
drivers).
The drivers must get deployed so that they have their own connection to the API
server. Securing at least the writes via a validating admission policy should
be possible.
As before, the kubelet removes all ResourceSlices for its node at startup, then
DRA drivers recreate them if (and only if) they start up again. This ensures
that there are no orphaned ResourceSlices when a driver gets removed while the
kubelet was down.
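A minimal sketch of that wipe, assuming the generated client-go client for
resource.k8s.io/v1alpha2 and a nodeName field selector on ResourceSlice (both
assumptions; the actual kubelet code may be organized differently and uses
k8s.io/client-go and k8s.io/apimachinery for the calls shown here):

    // Sketch: delete every ResourceSlice that belongs to this node, e.g. at
    // kubelet startup, in a single DeleteCollection call.
    func wipeResourceSlices(ctx context.Context, client kubernetes.Interface, nodeName string) error {
        selector := fields.OneTermEqualSelector("nodeName", nodeName).String()
        return client.ResourceV1alpha2().ResourceSlices().DeleteCollection(ctx,
            metav1.DeleteOptions{},
            metav1.ListOptions{FieldSelector: selector},
        )
    }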
While at it, logging gets cleaned up and updated to use structured, contextual
logging as much as possible. gRPC requests and streams now use a shared,
per-process request ID and streams also get logged.
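As an illustration of the request ID handling, here is a hedged sketch of a
unary interceptor with contextual logging via k8s.io/klog/v2; streams would get
an analogous stream interceptor. The names are made up for this example and not
taken from the driver code:

    // Sketch: tag every gRPC call with a per-process request ID and log it.
    var requestID atomic.Int64

    func unaryLoggingInterceptor(ctx context.Context, req any,
        info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (any, error) {
        logger := klog.LoggerWithValues(klog.FromContext(ctx),
            "requestID", requestID.Add(1), "method", info.FullMethod)
        ctx = klog.NewContext(ctx, logger)
        logger.V(5).Info("handling gRPC request", "request", req)
        resp, err := handler(ctx, req)
        logger.V(5).Info("done with gRPC request", "response", resp, "err", err)
        return resp, err
    }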
While currently those objects only get published by the kubelet for node-local
resources, this could change once we also support network-attached
resources. Dropping the "Node" prefix enables such a future extension.
The NodeName in ResourceSlice and StructuredResourceHandle then becomes
optional. The kubelet still needs to provide one and it must match its own node
name, otherwise it doesn't have permission to access ResourceSlice objects.
The information is received from the DRA driver plugin through a new gRPC
streaming interface. This is backwards compatible with old DRA driver kubelet
plugins: their gRPC server will return "not implemented", which the kubelet can
handle. Therefore no API break is needed.
However, DRA drivers need to be updated because the Go API changed. They can
return
status.New(codes.Unimplemented, "no node resource support").Err()
if they don't support the new ListAndWatchResources method and
structured parameters.
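For such drivers the handler can be as small as in the following sketch. The
request and stream type names follow the usual gRPC code generation scheme and
are assumptions for illustration, not copied from the real API package; codes
and status come from google.golang.org/grpc.

    // Sketch: a driver that opts out of the new streaming call. Type names
    // are assumed; only the returned status comes from the text above.
    func (d *driver) ListAndWatchResources(
        req *drapb.ListAndWatchResourcesRequest,
        stream drapb.Node_ListAndWatchResourcesServer,
    ) error {
        return status.New(codes.Unimplemented, "no node resource support").Err()
    }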
The controller in kubelet then synchronizes this information from the driver
with NodeResourceSlice objects, creating, updating and deleting them as needed.
Several enhancements:
- `--resource-config` is now listed under `controller` options instead of
`leader election`: merely a cosmetic change
- The driver name can be configured as part of the resource config. The
  command line flag overrides the config, but only when set explicitly
  (see the sketch after this list).
This makes it possible to pre-define complete driver setups where the
name is associated with certain resource availability. This will be
used for testing cluster autoscaling.
- The set of nodes where resources are available can optionally be specified
via node labels. This will be used for testing cluster autoscaling.
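The "overrides only when set explicitly" behavior can be implemented with
pflag's Changed check. A minimal sketch, with illustrative flag and struct
names (uses github.com/spf13/pflag):

    // Sketch: let an explicitly set command line flag win over the value from
    // the resource config file. Flag and field names are made up here.
    func applyFlagOverrides(fs *pflag.FlagSet, driverNameFlag string, cfg *Resources) {
        // Only override when the user actually passed --driver-name on the
        // command line; otherwise keep what the resource config defined.
        if fs.Changed("driver-name") {
            cfg.DriverName = driverNameFlag
        }
    }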
Combining all prepare/unprepare operations for a pod enables plugins to
optimize the execution. Plugins can continue to use the v1alpha2 API for now,
but should switch. The new API is designed so that plugins which want to work
on each claim one-by-one can do so and then report errors for each claim
separately, i.e. partial success is supported.
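A hedged sketch of such a handler, assuming a batched request with a list of
claims and a response keyed by claim UID with a per-claim error field; the
exact message and field names of the generated API may differ, and
prepareClaim is a hypothetical helper:

    // Sketch: prepare all claims of a pod in one call and report success or
    // failure per claim instead of failing the whole request.
    func (d *driver) NodePrepareResources(ctx context.Context,
        req *drapb.NodePrepareResourcesRequest) (*drapb.NodePrepareResourcesResponse, error) {
        resp := &drapb.NodePrepareResourcesResponse{
            Claims: map[string]*drapb.NodePrepareResourceResponse{},
        }
        for _, claim := range req.Claims {
            devices, err := d.prepareClaim(ctx, claim)
            if err != nil {
                // Partial success: record the error for this claim, keep going.
                resp.Claims[claim.Uid] = &drapb.NodePrepareResourceResponse{Error: err.Error()}
                continue
            }
            resp.Claims[claim.Uid] = &drapb.NodePrepareResourceResponse{CDIDevices: devices}
        }
        return resp, nil
    }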
ginkgo.By should be used for steps in the main test flow. Creating and deleting
CDI files happens in parallel to that; if those operations get reported via
ginkgo.By, progress reports look odd because they contain e.g. the step
"waiting for ..." (from the main test, which is still ongoing) and end with
"creating CDI file" (which has already completed).
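In practice: ginkgo.By for the sequential steps, plain logging (for example
framework.Logf) for the concurrent CDI file work. A rough sketch with
illustrative names:

    // Sketch: report sequential steps with ginkgo.By, but only log background
    // work that runs concurrently with the main test flow.
    go func() {
        // Not a step of the spec: ginkgo.By here would make progress reports
        // end with an already-finished "step".
        framework.Logf("creating CDI file %s", cdiFilePath)
        // ... write the file ...
    }()

    ginkgo.By("waiting for container to start") // a real step in the test flow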
If kubelet plugin registration fails, it would be good to know more about the
communication with the kubelet. Capturing the gRPC calls and then checking
them makes the failure messages more informative. Here's an example where a
failure was triggered by temporarily modifying the check so that it didn't
find the call:
    [FAILED] Timed out after 30.000s.
    Expected:
        <[]app.GRPCCall | len:2, cap:2>: [
            {
                FullMethod: "/pluginregistration.Registration/GetInfo",
                Request:
                    {},
                Response:
                    endpoint: /var/lib/kubelet/plugins/test-driver/dra.sock
                    name: test-driver.cdi.k8s.io
                    supported_versions:
                    - 1.0.0
                    type: DRAPlugin,
                Err: nil,
            },
            {
                FullMethod: "/pluginregistration.Registration/NotifyRegistrationStatus",
                Request:
                    plugin_registered: true,
                Response:
                    {},
                Err: nil,
            },
        ]
    to contain successful NotifyRegistrationStatus call
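The check itself can then be an ordinary asynchronous assertion over the
captured calls, roughly along these lines; the GetGRPCCalls accessor and the
custom matcher are assumptions for illustration, not the real test helpers:

    // Sketch: poll the captured gRPC calls until a successful registration
    // notification shows up.
    gomega.Eventually(ctx, plugin.GetGRPCCalls).WithTimeout(30 * time.Second).Should(
        gcustom.MakeMatcher(func(calls []app.GRPCCall) (bool, error) {
            for _, call := range calls {
                if strings.HasSuffix(call.FullMethod, "/NotifyRegistrationStatus") &&
                    call.Err == nil {
                    return true, nil
                }
            }
            return false, nil
        }).WithMessage("contain successful NotifyRegistrationStatus call"))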
This PR makes the NodePrepareResource() and NodeUnprepareResource()
calls of the kubeletplugin API for DynamicResourceAllocation
symmetrical. It wasn't clear how one would use the set of CDIDevices
passed back in the NodeUnprepareResource() of the v1alpha1 API, so the
new API now passes back the full ResourceHandle that was originally
passed to the Prepare() call. Passing the ResourceHandle is strictly
more informative and a plugin can always (re)derive the set of
CDIDevices from it.
This is a breaking change, but this release is scheduled to break
multiple APIs for DynamicResourceAllocation, so it makes sense to do
this now instead of later.
Signed-off-by: Kevin Klues <kklues@nvidia.com>
All code must use the context from Ginkgo when doing API calls or polling for a
change, otherwise the code would not return immediately when the test gets
aborted.
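With Ginkgo v2 that means accepting the context in the spec and passing it on
to client calls and Gomega polling. A sketch, with illustrative client and
object names:

    // Sketch: the ctx comes from Ginkgo and gets canceled when the spec is
    // aborted or times out, so API calls and polling return promptly.
    ginkgo.It("creates a ResourceClaim", func(ctx context.Context) {
        claim, err := clientSet.ResourceV1alpha2().ResourceClaims(namespace).Create(
            ctx, claimTemplate, metav1.CreateOptions{})
        framework.ExpectNoError(err)

        // Polling uses the same ctx.
        gomega.Eventually(ctx, func(ctx context.Context) error {
            _, err := clientSet.ResourceV1alpha2().ResourceClaims(namespace).Get(
                ctx, claim.Name, metav1.GetOptions{})
            return err
        }).WithTimeout(time.Minute).Should(gomega.Succeed())
    })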
ginkgo.DeferCleanup has multiple advantages:
- The cleanup operation can get registered if and only if needed.
- No need to return a cleanup function that the caller must invoke.
- Automatically determines whether a context is needed, which will
simplify the introduction of context parameters.
- Ginkgo's timeline shows when it executes the cleanup operation.
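For example (a sketch with illustrative names; Ginkgo supplies the context
because the registered function accepts one):

    // Sketch: register the cleanup right next to the create call.
    claim, err := clientSet.ResourceV1alpha2().ResourceClaims(namespace).Create(
        ctx, claimTemplate, metav1.CreateOptions{})
    framework.ExpectNoError(err)
    ginkgo.DeferCleanup(func(ctx context.Context) {
        // Runs after the spec, also on failure, and shows up in the timeline.
        framework.ExpectNoError(clientSet.ResourceV1alpha2().ResourceClaims(namespace).Delete(
            ctx, claim.Name, metav1.DeleteOptions{}))
    })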
The driver can be used manually against a cluster started with
local-up-cluster.sh and is also used for E2E testing. Because the tests proxy
connections from the nodes into the e2e.test binary and create/delete files via
the equivalent of "kubectl exec dd/rm", they can be run against arbitrary
clusters. Each test gets its own driver instance and resource class, therefore
they can run in parallel.