kubernetes

Author	SHA1	Message	Date
Patrick Ohly	7701a48bd6	dra kubelet: bump gRPC API to v1alpha4 The previous changes are an API break, therefore we need a new version.	2024-07-18 23:30:09 +02:00
Patrick Ohly	ee3205804b	dra e2e: demonstrate how to use RBAC + VAP for a kubelet plugin In reality, the kubelet plugin of a DRA driver is meant to be deployed as a daemonset with a service account that limits its permissions. https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#additional-metadata-in-pod-bound-tokens ensures that the node name is bound to the pod, which then can be used in a validating admission policy (VAP) to ensure that the operations are limited to the node. In E2E testing, we emulate that via impersonation. This ensures that the plugin does not accidentally depend on additional permissions.	2024-07-18 23:30:09 +02:00
Patrick Ohly	348f94ab55	DRA: read ResourceClaim in DRA drivers This is the second and final step towards making kubelet independent of the resource.k8s.io API versioning because it now doesn't need to copy structs defined by that API from the driver to the API server.	2024-07-18 09:09:20 +02:00
Patrick Ohly	616a014347	DRA: move ResourceSlice publishing into DRA drivers This is a first step towards making kubelet independent of the resource.k8s.io API versioning because it now doesn't need to copy structs defined by that API from the driver to the API server. The next step is removing the other direction (reading ResourceClaim status and passing the resource handle to drivers). The drivers must get deployed so that they have their own connection to the API server. Securing at least the writes via a validating admission policy should be possible. As before, the kubelet removes all ResourceSlices for its node at startup, then DRA drivers recreate them if (and only if) they start up again. This ensures that there are no orphaned ResourceSlices when a driver gets removed while the kubelet was down. While at it, logging gets cleaned up and updated to use structured, contextual logging as much as possible. gRPC requests and streams now use a shared, per-process request ID and streams also get logged.	2024-07-18 09:09:19 +02:00
Kubernetes Prow Robot	ac9aec9f9b	Merge pull request #125116 from pohly/dra-one-of-source DRA: remove "source" indirection from v1 Pod API	2024-06-28 12:46:45 -07:00
Patrick Ohly	bde9b64cdf	DRA: remove "source" indirection from v1 Pod API This makes the API nicer: resourceClaims: - name: with-template resourceClaimTemplateName: test-inline-claim-template - name: with-claim resourceClaimName: test-shared-claim Previously, this was: resourceClaims: - name: with-template source: resourceClaimTemplateName: test-inline-claim-template - name: with-claim source: resourceClaimName: test-shared-claim A more long-term benefit is that other, future alternatives might not make sense under the "source" umbrella. This is a breaking change. It's justified because DRA is still alpha and will have several other API breaks in 1.31.	2024-06-27 17:53:24 +02:00
Patrick Ohly	7f87629a3f	DRA e2e: fix error reporting in test driver Dropping the error that is returned by allocateOne hides the reason why allocation failed. Including the UID is "too much information" for an error message (usually the user doesn't care about the exact identity, just the name) and the claim name can and will be added by the caller. Before: controller.go:373: E0625 16:04:12.140953] test-driver.cdi.k8s.io/resource controller: processing failed err="claim test-dramq9jv-resource-h72pg: failed allocating claim 8551afba-3c9a-4a8a-8633-6fad6c4b9e42" key="schedulingCtx:test/test-dramq9jv" event.go:377: I0625 16:04:12.141031] test-driver.cdi.k8s.io/resource controller: Event(v1.ObjectReference{Kind:"PodSchedulingContext", Namespace:"test", Name:"test-dra65gfw", UID:"6be9ba57-31da-4fef-b61d-b0468d71afcf", APIVersion:"resource.k8s.io/v1alpha3", ResourceVersion:"197", FieldPath:""}): type: 'Warning' reason: 'Failed' claim test-dra65gfw-resource-zpzrj: failed allocating claim f98a32e1-ab7d-4b34-a258-6d8224aa9006 After: controller.go:373: E0625 16:02:54.248059] test-driver.cdi.k8s.io/resource controller: processing failed err="claim test-dram98ll-resource-nvsbj: device selectors are not supported" key="schedulingCtx:test/test-dram98ll" event.go:377: I0625 16:02:54.248163] test-driver.cdi.k8s.io/resource controller: Event(v1.ObjectReference{Kind:"PodSchedulingContext", Namespace:"test", Name:"test-dratpt77", UID:"24010402-b026-4fe4-a535-e1dab69db8c0", APIVersion:"resource.k8s.io/v1alpha3", ResourceVersion:"298", FieldPath:""}): type: 'Warning' reason: 'Failed' claim test-dratpt77-resource-vlgrv: device selectors are not supported	2024-06-25 16:04:56 +02:00
Ed Bartosh	c8c7ae85e5	e2e_node: DRA: add CountCalls API	2024-06-07 22:47:23 +03:00
Ed Bartosh	ffc407b4dd	e2e_node: DRA: reimplement call blocking	2024-06-07 22:47:20 +03:00
Ed Bartosh	2ea2fb3166	e2e: test-driver: implement failure mode	2024-06-07 22:45:35 +03:00
Ed Bartosh	f609aa8310	e2e: test-driver: add new matchers	2024-05-25 01:02:25 +03:00
Patrick Ohly	77341f7595	DRA: remove support for v1alpha2 kubelet API The v1alpha2 API is several releases old. No current drivers should still depend on it.	2024-04-19 18:27:05 +02:00
Kubernetes Prow Robot	7ebb64d176	Merge pull request #124235 from bitoku/dra-e2e Use WaitForPodCondition instead of sleep	2024-04-18 03:24:42 -07:00
Kubernetes Prow Robot	d2ce87eb94	Merge pull request #123938 from pohly/dra-structured-parameters-tests DRA: test for structured parameters	2024-04-18 02:10:08 -07:00
Ayato Tokubi	c52160eb3c	Use WaitForPodCondition instead of sleep Signed-off-by: Ayato Tokubi <atokubi@redhat.com>	2024-04-13 00:01:11 +00:00
Ed Bartosh	26881132bd	kubelet: assign Node as an owner for the ResourceSlice Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>	2024-03-15 09:46:13 +02:00
Patrick Ohly	cf8fffae72	dra e2e: sanity check resource handle When using structured parameters, the instance name must match and not be in use already. NodeUnprepareResources must be called with the same handle as NodePrepareResources.	2024-03-14 20:42:31 +01:00
Patrick Ohly	f149d6d8f9	dra e2e: watch claims and validate them Logging claims helps with debugging test failures. Checking the finalizer catches unexpected behavior.	2024-03-14 20:42:31 +01:00
Patrick Ohly	a0add8d2c7	dra api: NodeResourceModel -> ResourceModel When renaming NodeResourceSlice to ResourceSlice, the embedded [Node]ResourceModel also should have been renamed.	2024-03-14 18:07:36 +01:00
Patrick Ohly	7f5566ac6f	dra e2e: enable more tests for usage with structured parameters This finishes the shuffling around of test scenarios so that all of them which make sense with structured parameters are also executed with those.	2024-03-07 22:26:20 +01:00
Patrick Ohly	2c6246c906	dra e2e: move ResourceSlice test This should better run with multiple nodes, it's more realistic that way.	2024-03-07 22:23:03 +01:00
Patrick Ohly	0b6a0d686a	dra api: rename NodeResourceSlice -> ResourceSlice While currently those objects only get published by the kubelet for node-local resources, this could change once we also support network-attached resources. Dropping the "Node" prefix enables such a future extension. The NodeName in ResourceSlice and StructuredResourceHandle then becomes optional. The kubelet still needs to provide one and it must match its own node name, otherwise it doesn't have permission to access ResourceSlice objects.	2024-03-07 22:22:55 +01:00
Patrick Ohly	234dc1f63d	dra e2e: run more test scenarios with structured parameters	2024-03-07 22:22:13 +01:00
Patrick Ohly	d59676a545	dra kubelet: publish NodeResourceSlices The information is received from the DRA driver plugin through a new gRPC streaming interface. This is backwards compatible with old DRA driver kubelet plugins, their gRPC server will return "not implemented" and that can be handled by kubelet. Therefore no API break is needed. However, DRA drivers need to be updated because the Go API changed. They can return status.New(codes.Unimplemented, "no node resource support").Err() if they don't support the new ListAndWatchResources method and structured parameters. The controller in kubelet then synchronizes this information from the driver with NodeResourceSlice objects, creating, updating and deleting them as needed.	2024-03-07 22:22:13 +01:00
Patrick Ohly	5e40afca06	dra testing: add tests for structured parameters The test driver now supports a ConfigMap (as before) and the named resources structured parameter model. It doesn't have any instance attributes.	2024-03-07 22:22:13 +01:00
Patrick Ohly	6f1ddfcd2e	kubelet: support structured parameters for preparing resources If the resource handle has data from a structured parameter model, then we need to pass that to the DRA driver kubelet plugin. Because Kubernetes uses gogo/protobuf, we cannot use "optional" for that new optional field and have to resort to "repeated" with a single repetition if present. This is a new, backwards-compatible field. It is unfortunate that extending the resource.k8s.io API changes the checksum of a kubelet checkpoint. Updating the test cases is a stop-gap measure; the actual solution will have to be something else before beta.	2024-03-07 22:22:13 +01:00
Tim Hockin	81ba0f3b44	Make golang::setup-env turn on workspaces Both GO111MODULE and GOWORK default to on, so this just unsets them. We could set them to explicit values but this seems equivalent and cleaner.	2024-02-29 22:07:42 -08:00
Patrick Ohly	cb3180950e	dra e2e: fix stack unwinding in helper function When failing inside the `ginkgo.By` callback function, skipping intermediate stack frames didn't work properly because `ginkgo.By` itself and other internal code is also on the stack. To fix this, the code which can fail now runs outside of such a callback. That's not a big loss, the only advantage of the callback was getting timing statistics from Ginkgo which weren't used in practice.	2024-02-19 17:11:04 +01:00
Patrick Ohly	5d1509126f	dra: patch ReservedFor during PreBind This moves adding a pod to ReservedFor out of the main scheduling cycle into PreBind. There it is done concurrently in different goroutines. For claims which were specifically allocated for a pod (the most common case), that usually makes no difference because the claim is already reserved. It starts to matter when that pod then cannot be scheduled for other reasons, because then the claim gets unreserved to allow deallocating it. It also matters for claims that are created separately and then get used multiple times by different pods. Because multiple pods might get added to the same claim rapidly independently from each other, it makes sense to do all claim status updates via patching: then it is no longer necessary to have an up-to-date copy of the claim because the patch operation will succeed if (and only if) the patched claim is valid. Server-side-apply cannot be used for this because a client always has to send the full list of all entries that it wants to be set, i.e. it cannot add one entry unless it knows the full list.	2024-01-26 10:58:03 +01:00
Patrick Ohly	4ede571f8b	dra e2e: unify per-node resource specification When using a builder pattern for the actual callback, some common code can be moved into a single function.	2023-12-21 12:43:28 +01:00
Patrick Ohly	f2cfbf44b1	e2e: use framework labels This changes the test registration so that the framework's dedicated APIs are used for those tags which have one (features, feature gates, slow, serial, etc.). Arbitrary, custom tags are still left in place for now.	2023-11-01 15:17:34 +01:00
Kubernetes Prow Robot	4294c35fc9	Merge pull request #121297 from calvinballing/spellcheck-markdown Fix typos in markdown	2023-10-25 13:18:26 +02:00
Kubernetes Prow Robot	7b9d244efd	Merge pull request #120965 from bart0sh/PR122-DRA-unexpected-node-shutdown DRA: e2e: test non-graceful node shutdown	2023-10-20 11:58:47 +02:00
Ed Bartosh	fb9f2f5bc5	DRA: e2e: test non-graceful node shutdown	2023-10-19 22:09:11 +03:00
Jim Hays	911700e64e	Fix typos in markdown	2023-10-17 10:55:40 -04:00
Patrick Ohly	36146ad686	e2e dra: enhance test driver Several enhancements: - `--resource-config` is now listed under `controller` options instead of `leader election`: merely a cosmetic change - The driver name can be configured as part of the resource config. The command line flag overrides the config, but only when set explicitly. This makes it possible to pre-define complete driver setups where the name is associated with certain resource availability. This will be used for testing cluster autoscaling. - The set of nodes where resources are available can optionally be specified via node labels. This will be used for testing cluster autoscaling.	2023-09-25 19:50:33 +02:00
Patrick Ohly	c682d2b8c5	scheduler: add ResourceClass events When filtering fails because a ResourceClass is missing, we can treat the pod as "unschedulable" as long as we then also register a cluster event that wakes up the pod. This is more efficient than periodically retrying.	2023-09-06 11:14:08 +02:00
Kubernetes Prow Robot	e298e92115	Merge pull request #119819 from pohly/dra-performance-test-driver dra test: enhance performance of test driver controller	2023-08-16 04:32:26 -07:00
Patrick Ohly	0e23840929	dra test: enhance performance of test driver controller Analyzing the CPU profile of go test -timeout=0 -count=5 -cpuprofile profile.out -bench=BenchmarkPerfScheduling/.Claim. -benchtime=1ns -run=xxx ./test/integration/scheduler_perf showed that a significant amount of time was spent iterating over allocated claims to determine how many were allocated per node. That "naive" approach was taken to avoid maintaining a redundant data structure, but now that performance measurements show that this comes at a cost, it's not "premature optimization" anymore to introduce such a second field. The average scheduling throughput in SchedulingWithResourceClaimTemplate/2000pods_100nodes increases from 16.4 pods/s to 19.2 pods/s.	2023-08-08 13:36:35 +02:00
carlory	57226fbd27	e2e_dra: stop using deprecated framework.ExpectEqual Co-authored-by: Thomas Milox <thomasmilox@gmail.com>	2023-07-25 10:03:56 +08:00
Kubernetes Prow Robot	bea27f82d3	Merge pull request #118209 from pohly/dra-pre-scheduled-pods dra: pre-scheduled pods	2023-07-13 14:43:37 -07:00
Patrick Ohly	80ab8f0542	dra: handle scheduled pods in kube-controller-manager When someone decides that a Pod should definitely run on a specific node, they can create the Pod with spec.nodeName already set. Some custom scheduler might do that. Then kubelet starts to check the pod and (if DRA is enabled) will refuse to run it, either because the claims are still waiting for the first consumer or the pod wasn't added to reservedFor. Both are things the scheduler normally does. Also, if a pod got scheduled while the DRA feature was off in the kube-scheduler, a pod can reach the same state. The resource claim controller can handle these two cases by taking over for the kube-scheduler when nodeName is set. Triggering an allocation is simpler than in the scheduler because all it takes is creating the right PodSchedulingContext with spec.selectedNode set. There's no need to list nodes because that choice was already made, permanently. Adding the pod to reservedFor also isn't hard. What's currently missing is triggering de-allocation of claims to re-allocate them for the desired node. This is not important for claims that get created for the pod from a template and then only get used once, but it might be worthwhile to add de-allocation in the future.	2023-07-13 21:27:11 +02:00
Kubernetes Prow Robot	047d040ce7	Merge pull request #119012 from pohly/dra-batch-node-prepare kubelet: support batched prepare/unprepare in v1alpha3 DRA plugin API	2023-07-12 10:57:37 -07:00
Patrick Ohly	08d40f53a7	dra: test with and without immediate ReservedFor The recommendation and default in the controller helper code is to set ReservedFor to the pod which triggered delayed allocation. However, this is neither required nor enforced. Therefore we should also test the fallback path where kube-scheduler itself adds the pod to ReservedFor.	2023-07-12 16:57:17 +02:00
Kubernetes Prow Robot	3cc729fc7f	Merge pull request #119195 from pohly/dra-reallocate-flake dra e2e: fix "reallocation works" flake	2023-07-12 05:55:25 -07:00
Patrick Ohly	d743c50bb9	kubelet: support batched prepare/unprepare in v1alpha3 DRA plugin API Combining all prepare/unprepare operations for a pod enables plugins to optimize the execution. Plugins can continue to use the v1alpha2 API for now, but should switch. The new API is designed so that plugins which want to work on each claim one-by-one can do so and then report errors for each claim separately, i.e. partial success is supported.	2023-07-12 14:50:30 +02:00
Patrick Ohly	c143a875ed	dra e2e: fix "reallocation works" flake The main problem probably was that https://github.com/kubernetes/kubernetes/pull/118862 moved creating the first pod before setting up the callback which blocks allocating one claim for that pod. This is racy because allocations happen in the background. The test also was unnecessarily complex and hard to read: - The intended effect can be achieved with three instead of four claims. - It wasn't clear which claim has "external-claim-other" as name. Using the claim variable avoids that.	2023-07-12 11:20:47 +02:00
Patrick Ohly	ba810871ad	dra e2e: check that not generating a ResourceClaim works This is not something that normally happens, but the API supports it because it might be needed at some point, so we have to test it.	2023-07-11 14:23:49 +02:00
Patrick Ohly	444d23bd2f	dra: generated name for ResourceClaim from template Generating the name avoids all potential name collisions. It's not clear how much of a problem that was because users can avoid them and the deterministic names for generic ephemeral volumes have not led to reports from users. But using generated names is not too hard either. What makes it relatively easy is that the new pod.status.resourceClaimStatus map stores the generated name for kubelet and node authorizer, i.e. the information in the pod is sufficient to determine the name of the ResourceClaim. The resource claim controller becomes a bit more complex and now needs permission to modify the pod status. The new failure scenario of "ResourceClaim created, updating pod status fails" is handled with the help of a new special "resource.kubernetes.io/pod-claim-name" annotation that together with the owner reference identifies exactly for what a ResourceClaim was generated, so updating the pod status can be retried for existing ResourceClaims. The transition from deterministic names is handled with a special case for that recovery code path: a ResourceClaim with no annotation and a name that follows the Kubernetes <= 1.27 naming pattern is assumed to be generated for that pod claim and gets added to the pod status. There's no immediate need for it, but just in case that it may become relevant, the name of the generated ResourceClaim may also be left unset to record that no claim was needed. Components processing such a pod can skip whatever they normally would do for the claim. To ensure that they do and also cover other cases properly ("no known field is set", "must check ownership"), resourceclaim.Name gets extended.	2023-07-11 14:23:48 +02:00
Kubernetes Prow Robot	d02d8ba635	Merge pull request #118862 from byako/batching-dra-calls DRA controller: batch resource claims for Allocate	2023-07-06 11:33:03 -07:00


94 Commits