Commit Graph

64 Commits

Patrick Ohly
f2cfbf44b1 e2e: use framework labels
This changes the test registration so that tags for which the framework has a
dedicated API (features, feature gates, slow, serial, etc.) are registered
through those APIs.

Arbitrary, custom tags are still left in place for now.
2023-11-01 15:17:34 +01:00
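
A hedged Go sketch of the new registration style this commit describes (wrapper and helper names from the framework's label work; treat exact signatures as assumptions):

    package e2e

    import (
        "context"

        "k8s.io/kubernetes/pkg/features"
        "k8s.io/kubernetes/test/e2e/framework"
    )

    // Before: tags were free text in the test name, e.g.
    //   ginkgo.It("works [Slow] [Serial] [Feature:DynamicResourceAllocation]", ...)
    // After: the dedicated framework APIs attach the same information as labels.
    var _ = framework.Describe("dra", framework.WithSerial(),
        framework.WithFeatureGate(features.DynamicResourceAllocation), func() {
            framework.It("runs a slow case", framework.WithSlow(), func(ctx context.Context) {
                // test body
            })
        })
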
Kubernetes Prow Robot
4294c35fc9 Merge pull request #121297 from calvinballing/spellcheck-markdown
Fix typos in markdown
2023-10-25 13:18:26 +02:00
Kubernetes Prow Robot
7b9d244efd Merge pull request #120965 from bart0sh/PR122-DRA-unexpected-node-shutdown
DRA: e2e: test non-graceful node shutdown
2023-10-20 11:58:47 +02:00
Ed Bartosh
fb9f2f5bc5 DRA: e2e: test non-graceful node shutdown 2023-10-19 22:09:11 +03:00
Jim Hays
911700e64e Fix typos in markdown 2023-10-17 10:55:40 -04:00
Patrick Ohly
36146ad686 e2e dra: enhance test driver
Several enhancements:
- `--resource-config` is now listed under `controller` options instead of
  `leader election`: merely a cosmetic change
- The driver name can be configured as part of the resource config. The
  command line flag overrides the config, but only when set explicitly.
  This makes it possible to pre-define complete driver setups where the
  name is associated with certain resource availability. This will be
  used for testing cluster autoscaling.
- The set of nodes where resources are available can optionally be specified
  via node labels. This will be used for testing cluster autoscaling.
2023-09-25 19:50:33 +02:00
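
The "explicit command line flag wins over the config" behavior described above is a standard pflag pattern; a minimal standalone sketch (flag and config values hypothetical):

    package main

    import (
        "fmt"
        "os"

        "github.com/spf13/pflag"
    )

    func main() {
        fs := pflag.NewFlagSet("test-driver", pflag.ExitOnError)
        driverName := fs.String("driver-name", "test-driver.cdi.k8s.io", "DRA driver name")
        _ = fs.Parse(os.Args[1:])

        name := "autoscaling-driver.example.com" // hypothetical value from --resource-config
        if fs.Changed("driver-name") {
            // Only a flag that was set explicitly overrides the config file.
            name = *driverName
        }
        fmt.Println("using driver name:", name)
    }
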
Patrick Ohly
c682d2b8c5 scheduler: add ResourceClass events
When filtering fails because a ResourceClass is missing, we can treat the pod
as "unschedulable" as long as we then also register a cluster event that wakes
up the pod. This is more efficient than periodically retrying.
2023-09-06 11:14:08 +02:00
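
A standalone sketch of the idea with simplified stand-in types (the real scheduler framework types and hook signatures vary between releases):

    package main

    type ActionType int

    const (
        Add ActionType = 1 << iota
        Update
    )

    // ClusterEvent describes which object changes can make a pod schedulable.
    type ClusterEvent struct {
        Resource   string
        ActionType ActionType
    }

    // eventsToRegister mirrors the plugin hook: a pod rejected because its
    // ResourceClass is missing stays "unschedulable" until a ResourceClass
    // gets added or updated, instead of being retried periodically.
    func eventsToRegister() []ClusterEvent {
        return []ClusterEvent{
            {Resource: "ResourceClass", ActionType: Add | Update},
        }
    }

    func main() { _ = eventsToRegister() }
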
Kubernetes Prow Robot
e298e92115 Merge pull request #119819 from pohly/dra-performance-test-driver
dra test: enhance performance of test driver controller
2023-08-16 04:32:26 -07:00
Patrick Ohly
0e23840929 dra test: enhance performance of test driver controller
Analyzing the CPU profile of

    go test -timeout=0 -count=5 -cpuprofile profile.out -bench=BenchmarkPerfScheduling/.*Claim.* -benchtime=1ns -run=xxx ./test/integration/scheduler_perf

showed that a significant amount of time was spent iterating over allocated
claims to determine how many were allocated per node. That "naive" approach was
taken to avoid maintaining a redundant data structure, but now that performance
measurements show that this comes at a cost, it's not "premature optimization"
anymore to introduce such a second field.

The average scheduling throughput in
SchedulingWithResourceClaimTemplate/2000pods_100nodes increases from 16.4
pods/s to 19.2 pods/s.
2023-08-08 13:36:35 +02:00
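
A standalone sketch of that second field (names hypothetical): the counter is maintained on every allocation change, so per-node queries stop iterating over all claims.

    package main

    import "fmt"

    type controller struct {
        allocated           map[string]string // claim name -> node name
        numAllocatedPerNode map[string]int    // redundant, but O(1) to query
    }

    func (c *controller) allocate(claim, node string) {
        c.allocated[claim] = node
        c.numAllocatedPerNode[node]++
    }

    func (c *controller) deallocate(claim string) {
        if node, ok := c.allocated[claim]; ok {
            delete(c.allocated, claim)
            c.numAllocatedPerNode[node]--
        }
    }

    func main() {
        c := &controller{allocated: map[string]string{}, numAllocatedPerNode: map[string]int{}}
        c.allocate("claim-0", "node-1")
        fmt.Println(c.numAllocatedPerNode["node-1"]) // 1, without scanning claims
    }
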
carlory
57226fbd27 e2e_dra: stop using deprecated framework.ExpectEqual
Co-authored-by: Thomas Milox <thomasmilox@gmail.com>
2023-07-25 10:03:56 +08:00
Kubernetes Prow Robot
bea27f82d3 Merge pull request #118209 from pohly/dra-pre-scheduled-pods
dra: pre-scheduled pods
2023-07-13 14:43:37 -07:00
Patrick Ohly
80ab8f0542 dra: handle scheduled pods in kube-controller-manager
When someone decides that a Pod should definitely run on a specific node, they
can create the Pod with spec.nodeName already set. Some custom scheduler might
do that. Then kubelet starts to check the pod and (if DRA is enabled) will
refuse to run it, either because the claims are still waiting for the first
consumer or the pod wasn't added to reservedFor. Both are things the scheduler
normally does.

Also, if a pod got scheduled while the DRA feature was off in the
kube-scheduler, it can reach the same state.

The resource claim controller can handle these two cases by taking over for the
kube-scheduler when nodeName is set. Triggering an allocation is simpler than
in the scheduler because all it takes is creating the right
PodSchedulingContext with spec.selectedNode set. There's no need to list nodes
because that choice was already made, permanently. Adding the pod to
reservedFor also isn't hard.

What's currently missing is triggering de-allocation of claims to re-allocate
them for the desired node. This is not important for claims that get created
for the pod from a template and then only get used once, but it might be
worthwhile to add de-allocation in the future.
2023-07-13 21:27:11 +02:00
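
A hedged sketch of the controller-side step (the real controller also sets owner references and handles conflicts and retries):

    package sketch

    import (
        "context"

        v1 "k8s.io/api/core/v1"
        resourcev1alpha2 "k8s.io/api/resource/v1alpha2"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // createSchedulingContext takes over for the kube-scheduler when a pod
    // already has spec.nodeName set: no node listing is needed because the
    // choice was already made, permanently.
    func createSchedulingContext(ctx context.Context, c kubernetes.Interface, pod *v1.Pod) error {
        sc := &resourcev1alpha2.PodSchedulingContext{
            ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: pod.Namespace},
            Spec: resourcev1alpha2.PodSchedulingContextSpec{
                SelectedNode: pod.Spec.NodeName,
            },
        }
        _, err := c.ResourceV1alpha2().PodSchedulingContexts(pod.Namespace).Create(ctx, sc, metav1.CreateOptions{})
        return err
    }
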
Kubernetes Prow Robot
047d040ce7 Merge pull request #119012 from pohly/dra-batch-node-prepare
kubelet: support batched prepare/unprepare in v1alpha3 DRA plugin API
2023-07-12 10:57:37 -07:00
Patrick Ohly
08d40f53a7 dra: test with and without immediate ReservedFor
The recommendation and default in the controller helper code is to set
ReservedFor to the pod which triggered delayed allocation. However, this
is neither required nor enforced. Therefore we should also test the fallback
path where kube-scheduler itself adds the pod to ReservedFor.
2023-07-12 16:57:17 +02:00
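
A hedged sketch of that fallback from the scheduler's side (conflict handling omitted):

    package sketch

    import (
        "context"

        v1 "k8s.io/api/core/v1"
        resourcev1alpha2 "k8s.io/api/resource/v1alpha2"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // reserveForPod adds the pod to status.reservedFor when the driver's
    // controller helper did not already do it during allocation.
    func reserveForPod(ctx context.Context, c kubernetes.Interface, claim *resourcev1alpha2.ResourceClaim, pod *v1.Pod) error {
        claim.Status.ReservedFor = append(claim.Status.ReservedFor,
            resourcev1alpha2.ResourceClaimConsumerReference{
                Resource: "pods",
                Name:     pod.Name,
                UID:      pod.UID,
            })
        _, err := c.ResourceV1alpha2().ResourceClaims(claim.Namespace).UpdateStatus(ctx, claim, metav1.UpdateOptions{})
        return err
    }
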
Kubernetes Prow Robot
3cc729fc7f Merge pull request #119195 from pohly/dra-reallocate-flake
dra e2e: fix "reallocation works" flake
2023-07-12 05:55:25 -07:00
Patrick Ohly
d743c50bb9 kubelet: support batched prepare/unprepare in v1alpha3 DRA plugin API
Combining all prepare/unprepare operations for a pod enables plugins to
optimize the execution. Plugins can continue to use the v1alpha2 API for now,
but should switch. The new API is designed so that plugins which want to work
on each claim one-by-one can do so and then report errors for each claim
separately, i.e. partial success is supported.
2023-07-12 14:50:30 +02:00
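
A standalone sketch of the batched shape with assumed field names (the generated v1alpha3 proto differs in detail); the point is the per-claim results that make partial success possible:

    package main

    type Claim struct {
        UID, Name, ResourceHandle string
    }

    type ClaimResult struct {
        CDIDevices []string
        Err        error
    }

    // nodePrepareResources handles all claims of a pod in one call. A plugin
    // may batch internally or, as here, work claim-by-claim and report a
    // separate result for each one.
    func nodePrepareResources(claims []Claim) map[string]ClaimResult {
        results := make(map[string]ClaimResult, len(claims))
        for _, claim := range claims {
            devices, err := prepareOne(claim)
            results[claim.UID] = ClaimResult{CDIDevices: devices, Err: err}
        }
        return results // some entries may carry an error: partial success
    }

    func prepareOne(claim Claim) ([]string, error) {
        return []string{"vendor.example.com/device=" + claim.Name}, nil
    }

    func main() { _ = nodePrepareResources([]Claim{{UID: "uid-1", Name: "claim-1"}}) }
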
Patrick Ohly
c143a875ed dra e2e: fix "reallocation works" flake
The main problem probably was that
https://github.com/kubernetes/kubernetes/pull/118862 moved creating the first
pod before setting up the callback which blocks allocating one claim for that
pod. This is racy because allocations happen in the background.

The test also was unnecessarily complex and hard to read:
- The intended effect can be achieved with three instead of four claims.
- It wasn't clear which claim had "external-claim-other" as its name.
  Using the claim variable avoids that.
2023-07-12 11:20:47 +02:00
Patrick Ohly
ba810871ad dra e2e: check that not generating a ResourceClaim works
This is not something that normally happens, but the API supports it because it
might be needed at some point, so we have to test it.
2023-07-11 14:23:49 +02:00
Patrick Ohly
444d23bd2f dra: generated name for ResourceClaim from template
Generating the name avoids all potential name collisions. It's not clear how
much of a problem that was because users can avoid them and the deterministic
names for generic ephemeral volumes have not led to reports from users. But
using generated names is not too hard either.

What makes it relatively easy is that the new pod.status.resourceClaimStatus
map stores the generated name for kubelet and node authorizer, i.e. the
information in the pod is sufficient to determine the name of the
ResourceClaim.

The resource claim controller becomes a bit more complex and now needs
permission to modify the pod status. The new failure scenario of "ResourceClaim
created, updating pod status fails" is handled with the help of a new special
"resource.kubernetes.io/pod-claim-name" annotation that together with the owner
reference identifies exactly for what a ResourceClaim was generated, so
updating the pod status can be retried for existing ResourceClaims.

The transition from deterministic names is handled with a special case for that
recovery code path: a ResourceClaim with no annotation and a name that follows
the Kubernetes <= 1.27 naming pattern is assumed to be generated for that pod
claim and gets added to the pod status.

There's no immediate need for it, but just in case that it may become relevant,
the name of the generated ResourceClaim may also be left unset to record that
no claim was needed. Components processing such a pod can skip whatever they
normally would do for the claim. To ensure that they do and also cover other
cases properly ("no known field is set", "must check ownership"),
resourceclaim.Name gets extended.
2023-07-11 14:23:48 +02:00
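
A hedged sketch of what the controller generates now (the annotation key is quoted from this commit message; the claim spec copied from the ResourceClaimTemplate is omitted):

    package sketch

    import (
        v1 "k8s.io/api/core/v1"
        resourcev1alpha2 "k8s.io/api/resource/v1alpha2"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    func newClaimFromTemplate(pod *v1.Pod, podClaimName string) *resourcev1alpha2.ResourceClaim {
        isController := true
        return &resourcev1alpha2.ResourceClaim{
            ObjectMeta: metav1.ObjectMeta{
                // Generated instead of the deterministic "<pod>-<claim>" name
                // used by Kubernetes <= 1.27.
                GenerateName: pod.Name + "-" + podClaimName + "-",
                Namespace:    pod.Namespace,
                // Owner reference plus annotation identify exactly which pod
                // claim this object was generated for, so a failed pod status
                // update can be retried against the existing ResourceClaim.
                Annotations: map[string]string{
                    "resource.kubernetes.io/pod-claim-name": podClaimName,
                },
                OwnerReferences: []metav1.OwnerReference{{
                    APIVersion: "v1",
                    Kind:       "Pod",
                    Name:       pod.Name,
                    UID:        pod.UID,
                    Controller: &isController,
                }},
            },
        }
    }
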
Kubernetes Prow Robot
d02d8ba635 Merge pull request #118862 from byako/batching-dra-calls
DRA controller: batch resource claims for Allocate
2023-07-06 11:33:03 -07:00
Kubernetes Prow Robot
6f9d1d38d8 Merge pull request #118817 from pohly/dra-delete-claims
DRA: improve handling of completed pods
2023-07-06 10:15:15 -07:00
Alexey Fomenko
b10cc642b5 DRA controller: batch resource claims for Allocate
Signed-off-by: Alexey Fomenko <alexey.fomenko@intel.com>
Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
2023-07-06 19:31:45 +03:00
Patrick Ohly
a514f40131 dra resourceclaim controller: delete generated claims when pod is done
When a pod is done but not getting removed for a while, a claim that was
generated for that pod can already be deleted. This then also triggers
deallocation.
2023-07-05 16:10:20 +02:00
Patrick Ohly
e8a0c42212 dra resourceclaim controller: remove reservation for completed pods
When a pod is known to never run (again), the reservation for it can also be
removed. This is relevant in particular for the job controller.
2023-07-05 16:10:20 +02:00
Patrick Ohly
c903c29c3b e2e: support admissionapi.LevelRestricted in test/e2e/framework/pod
CreatePod and MakePod only accepted an `isPrivileged` boolean, which made it
impossible to use those helpers in tests that run with a default
framework.Framework, because the default there is LevelRestricted.

The simple boolean gets replaced with admissionapi.Level. Passing
LevelRestricted does the same as calling e2epod.MixinRestrictedPodSecurity.

Instead of explicitly passing a constant to these modified helpers, most tests
get updated to pass f.NamespacePodSecurityLevel. This has the advantage
that if that level gets lowered in the future, tests only need to be updated in
one place.

In some cases, helpers taking client+namespace+timeouts parameters get replaced
with passing the Framework instance to get access to
f.NamespacePodSecurityEnforceLevel. These helpers don't need separate
parameters because in practice all they ever used were the values from the
Framework instance.
2023-07-03 16:26:28 +02:00
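
A standalone sketch of the pattern (the helper name is hypothetical; admissionapi.Level and its constants are from k8s.io/pod-security-admission/api):

    package main

    import (
        "fmt"

        admissionapi "k8s.io/pod-security-admission/api"
    )

    // makePodSecurity replaces the old `isPrivileged bool` parameter: a level
    // can also express "restricted", which a boolean could not.
    func makePodSecurity(level admissionapi.Level) string {
        switch level {
        case admissionapi.LevelRestricted:
            return "run as non-root, seccomp profile, no privilege escalation"
        case admissionapi.LevelPrivileged:
            return "privileged"
        default:
            return "baseline"
        }
    }

    func main() {
        // In a test this would be f.NamespacePodSecurityLevel, so lowering
        // the framework default later needs no per-test changes.
        fmt.Println(makePodSecurity(admissionapi.LevelRestricted))
    }
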
Kubernetes Prow Robot
ec87834bae Merge pull request #118936 from pohly/dra-deallocate-when-unused
DRA: for delayed allocation, deallocate when no longer used
2023-07-01 12:56:48 -07:00
Patrick Ohly
1b47e6433b dra delayed allocation: deallocate when a pod is done
This releases the underlying resource sooner and ensures that another consumer
can get scheduled without being influenced by a decision that was made for the
previous consumer.

An alternative would have been to have the apiserver trigger the deallocation
whenever it sees the `status.reservedFor` getting reduced to zero. But that
then also triggers deallocation when kube-scheduler removes the last
reservation after a failed scheduling cycle. In that case we want to keep the
claim allocated and let the kube-scheduler decide on a case-by-case basis which
claim should get deallocated.
2023-06-29 09:47:30 +02:00
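
A hedged sketch of the trigger (v1alpha2 field names; the real controller distinguishes more cases):

    package sketch

    import (
        resourcev1alpha2 "k8s.io/api/resource/v1alpha2"
    )

    // markForDeallocation requests deallocation once no pod reserves the
    // claim anymore, but only for delayed allocation: the next consumer
    // should get a fresh, unconstrained scheduling decision.
    func markForDeallocation(claim *resourcev1alpha2.ResourceClaim) {
        if len(claim.Status.ReservedFor) > 0 || claim.Status.Allocation == nil {
            return
        }
        if claim.Spec.AllocationMode == resourcev1alpha2.AllocationModeWaitForFirstConsumer {
            // The DRA driver's controller observes this flag and deallocates.
            claim.Status.DeallocationRequested = true
        }
    }
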
Patrick Ohly
4a5a242a68 dra e2e: use logging for background activity
ginkgo.By should be used for steps in the test flow. Creating and deleting CDI
files happens in parallel to that. If reported via ginkgo.By, progress reports
look weird because they contain e.g. step "waiting for...." (from the main
test, which is still ongoing) and end with "creating CDI file" (which is
already completed).
2023-06-28 21:48:57 +02:00
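
A minimal illustration of the resulting convention:

    package e2e

    import (
        "github.com/onsi/ginkgo/v2"

        "k8s.io/kubernetes/test/e2e/framework"
    )

    func reportProgress(filePath string) {
        // A real step of the test flow: shows up in progress reports.
        ginkgo.By("waiting for claim to be allocated")

        // Background activity running in parallel with the test steps is only
        // logged, so progress reports keep pointing at the current step.
        framework.Logf("creating CDI file %s", filePath)
    }
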
Kubernetes Prow Robot
2190775b69 Merge pull request #118280 from stlaz/e2e_psa_labels
Set all PSa labels in tests
2023-06-28 11:14:43 -07:00
Stanislav Laznicka
7f532891c9 e2e tests: set all PSa labels instead of just enforcing 2023-06-21 15:05:13 +02:00
Patrick Ohly
ec70b2ec80 e2e dra: add "kubelet must skip NodePrepareResource if not used by any container"
If (for whatever reason) no container uses a claim, then there's no need to
prepare it.
2023-06-21 10:42:22 +02:00
Kubernetes Prow Robot
8fd27c6137 Merge pull request #118574 from bart0sh/PR118-DRA-E2E-Node-test-grpc-timeout
DRA: E2E Node: test GRPC timeout
2023-06-13 03:39:58 -07:00
Ed Bartosh
5c5f6e8fe2 DRA Node E2E: add NodePrepareResourceCalled API 2023-06-13 12:42:05 +03:00
Ed Bartosh
673d0aaa60 DRA Node E2E: add call blocking to the Kubelet plugin APIs 2023-06-13 12:41:59 +03:00
Alexey Fomenko
0222e6d4ae Update kind details for DRA e2e 2023-06-12 23:34:02 +03:00
Patrick Ohly
d0a64739e2 e2e dra: collect and check GRPC calls
If kubelet plugin registration fails, it would be good to know more about the
communication with kubelet. Capturing the GRPC calls and then checking that
makes the failure messages more informative. Here's an example where a failure
was triggered by temporarily modifying the check so that it didn't find the
call:

  [FAILED] Timed out after 30.000s.
  Expected:
      <[]app.GRPCCall | len:2, cap:2>: [
          {
              FullMethod: "/pluginregistration.Registration/GetInfo",
              Request:
                  {},
              Response:
                  endpoint: /var/lib/kubelet/plugins/test-driver/dra.sock
                  name: test-driver.cdi.k8s.io
                  supported_versions:
                  - 1.0.0
                  type: DRAPlugin,
              Err: nil,
          },
          {
              FullMethod: "/pluginregistration.Registration/NotifyRegistrationStatus",
              Request:
                  plugin_registered: true,
              Response:
                  {},
              Err: nil,
          },
      ]
  to contain successful NotifyRegistrationStatus call
2023-06-01 09:58:05 +02:00
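
A hedged sketch of the capture mechanism as a standard gRPC unary interceptor (the GRPCCall type mirrors the fields in the failure message above; the actual test driver code may differ):

    package main

    import (
        "context"
        "sync"

        "google.golang.org/grpc"
    )

    type GRPCCall struct {
        FullMethod        string
        Request, Response interface{}
        Err               error
    }

    type recorder struct {
        mutex sync.Mutex
        calls []GRPCCall
    }

    // intercept wraps every unary call served by the plugin and records the
    // outcome, so a test can later assert e.g. that a successful
    // NotifyRegistrationStatus call happened.
    func (r *recorder) intercept(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
        resp, err := handler(ctx, req)
        r.mutex.Lock()
        defer r.mutex.Unlock()
        r.calls = append(r.calls, GRPCCall{FullMethod: info.FullMethod, Request: req, Response: resp, Err: err})
        return resp, err
    }

    func main() {
        r := &recorder{}
        _ = grpc.NewServer(grpc.UnaryInterceptor(r.intercept))
    }
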
Benjamin Elder
e0b7f31ce6 use standard base image in dra dev
kind is on containerd 1.7.x now
2023-05-23 11:46:30 -07:00
Patrick Ohly
073b4cf66a test/e2e/dra: fix kind cluster creation
The nightly containerd binary no longer works in the current kind base images:

   May 15 16:32:31 kind-worker containerd[222]: /usr/local/bin/containerd:
   /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by
   /usr/local/bin/containerd)

kind now builds containerd directly with the base images. The official base
images still use containerd 1.6, so we have to use a special base image that
was prepared for this purpose.

Because the containerd config can be patched through kind, we don't need to
modify the generated node image anymore.
2023-05-16 08:11:43 +02:00
Kubernetes Prow Robot
a4f3ebf84b Merge pull request #117932 from bart0sh/PR114-e2e-DRA-use-containerd-1.7
DRA: use containerd 1.7 in kind image
2023-05-11 08:59:13 -07:00
Kubernetes Prow Robot
e1ad9bee5b Merge pull request #117902 from TommyStarK/doc/e2e-dra
test/e2e/dra: update README
2023-05-11 04:15:18 -07:00
Ed Bartosh
8f11f5bb2b DRA: use containerd 1.7 in kind image
As Containerd 1.6 doesn't support CDI, we want to stay
close to 1.7.

Containerd 1.7 is going to be the first official
release with full CDI support.
2023-05-11 12:47:23 +03:00
TommyStarK
3b634de6ff test/e2e/dra: update README
Signed-off-by: TommyStarK <thomasmilox@gmail.com>
2023-05-10 23:29:10 +02:00
Ed Bartosh
d5f4b9634c DRA: fix image build on Mac 2023-05-10 21:14:36 +03:00
Kevin Klues
579295e727 Update kubeletplugin API for DynamicResourceAllocation to v1alpha2
This PR makes the NodePrepareResources() and NodeUnprepareResource()
calls of the kubeletplugin API for DynamicResourceAllocation
symmetrical. It wasn't clear how one would use the set of CDIDevices
passed back in the NodeUnprepareResource() of the v1alpha1 API, and the
new API now passes back the full ResourceHandle that was originally
passed to the Prepare() call. Passing the ResourceHandle is strictly
more informative and a plugin could always (re)derive the set of
CDIDevice from it.

This is a breaking change, but this release is scheduled to break
multiple APIs for DynamicResourceAllocation, so it makes sense to do
this now instead of later.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-14 23:09:44 +00:00
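
A standalone sketch of the resulting symmetry (field names assumed, not copied from the generated v1alpha2 proto):

    package main

    type NodePrepareResourceRequest struct {
        Namespace, ClaimUID, ClaimName string
        ResourceHandle                 string // opaque data from the allocation
    }

    type NodePrepareResourceResponse struct {
        CDIDevices []string // derived by the plugin from the ResourceHandle
    }

    // NodeUnprepareResourceRequest now carries the same ResourceHandle that
    // Prepare received; a plugin can always (re)derive the CDI devices from it.
    type NodeUnprepareResourceRequest struct {
        Namespace, ClaimUID, ClaimName string
        ResourceHandle                 string
    }

    func main() {}
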
Kevin Klues
6ba9b91604 Update e2e tests for recent changes to resource.k8s.io/v1alpha2
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-14 22:34:19 +00:00
Patrick Ohly
29941b8d3e api: resource.k8s.io v1alpha1 -> v1alpha2
For Kubernetes 1.27, we intend to make some breaking API changes:
- rename PodScheduling -> PodSchedulingHints (https://github.com/kubernetes/kubernetes/issues/114283)
- extend ResourceClaimStatus (https://github.com/kubernetes/enhancements/pull/3802)

We need to switch from v1alpha1 to v1alpha2 for that.
2023-03-14 07:52:03 +01:00
Paco Xu
f368413d65 sync default qps of kubelet change 2023-03-08 14:04:51 +08:00
Ed Bartosh
35fd124f4d DRA: fix CDI spec version
The latest CDI release includes a spec version check that fails
if the version is less than 0.3.0:
  https://github.com/container-orchestrated-devices/container-device-interface/blob/v0.5.4/pkg/cdi/version.go#L42

Updating CDI spec version to 0.3.0 in the test kubelet plugin code
should fix e2e test failures on the CRI runtimes that use CDI >= 0.5.4
(Containerd master atm, CRI-O soon).
2023-03-05 16:49:56 +02:00
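
A hedged sketch of the corresponding change in a test plugin (kind and device names hypothetical; the Spec struct is from the CDI specs-go module):

    package sketch

    import (
        cdispec "github.com/container-orchestrated-devices/container-device-interface/specs-go"
    )

    func newCDISpec() *cdispec.Spec {
        return &cdispec.Spec{
            Version: "0.3.0", // older spec versions are rejected by CDI >= 0.5.4
            Kind:    "test-driver.cdi.k8s.io/test",
            Devices: []cdispec.Device{{
                Name: "claim-0",
                ContainerEdits: cdispec.ContainerEdits{
                    Env: []string{"CDI_CLAIM=claim-0"},
                },
            }},
        }
    }
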
Patrick Ohly
74785074c6 e2e dra: update logging
When running as part of the scheduler_perf benchmark testing, we want to print
less information by default, so we should use V to limit verbosity.

Pretty-printing doesn't belong in "application" code. I am moving that into
the ktesting formatting (https://github.com/kubernetes/kubernetes/pull/116180).
2023-03-01 15:02:03 +01:00
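
A minimal sketch of gating output behind a verbosity level with klog (ktesting only changes how the logger is obtained):

    package sketch

    import (
        "k8s.io/klog/v2"
    )

    // logAllocation is chatty per-claim output: useful when debugging the
    // driver, noise during scheduler_perf benchmark runs, hence behind V(5).
    func logAllocation(claim, node string) {
        klog.V(5).InfoS("allocated claim", "claim", claim, "node", node)
    }
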
Patrick Ohly
106fce6fae e2e dra: improve goroutine handling
There is an API now to wait for informer factory goroutine termination.
While at it, an incorrect comment for mutex locking gets removed.
2023-03-01 15:00:30 +01:00