Commit Graph

15087 Commits

Author SHA1 Message Date
Kevin Hannon
a1bbae8168 fix resource health status test failures in unlabeled jobs 2024-07-26 09:43:48 -04:00
Kubernetes Prow Robot
6ac20677c7 Merge pull request #126274 from ConnorJC3/flaky-vac-test
De-flake VAC tests by returning new PVC from WaitForVolumeModification
2024-07-24 15:39:52 -07:00
Kubernetes Prow Robot
ab470aad01 Merge pull request #126220 from saschagrunert/image-volumesource-e2e
[KEP-4639] Add `ImageVolumeSource` node e2e tests
2024-07-24 06:40:50 -07:00
Sascha Grunert
bc452887fa Add ImageVolumeSource e2e tests
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2024-07-24 13:57:39 +02:00
Kubernetes Prow Robot
c75e30d049 Merge pull request #126294 from aojea/nosnat
e2e test for No SNAT
2024-07-23 20:12:33 -07:00
Kubernetes Prow Robot
320f1ab30d Merge pull request #126182 from sohankunkerkar/fix-procmount
test/e2e/windows: drop securityContext test for ProcMount
2024-07-23 14:39:51 -07:00
Kubernetes Prow Robot
c2fdeca4ab Merge pull request #126145 from carlory/kep-3751-api
[KEP-3751] Promote VolumeAttributesClass to beta
2024-07-23 13:31:05 -07:00
Kubernetes Prow Robot
107f621462 Merge pull request #126108 from gnufied/changes-volume-recovery
Reduce state changes when expansion fails and mark certain failures as infeasible
2024-07-23 13:30:56 -07:00
Drew Sirenko
16c2ad5b84 Add labels to PVCollector bound/unbound PVC metrics for VolumeAttributesClass Feature (#126166)
* Add labels to PVCollector bound/unbound PVC metrics

* fixup! Add labels to PVCollector bound/unbound PVC metrics

* wip: Fix 'Unknown Decorator'

* fixup! Add labels to PVCollector bound/unbound PVC metrics
2024-07-23 12:21:29 -07:00
Kubernetes Prow Robot
05bb5f71f8 Merge pull request #120611 from pohly/dra-resource-quotas
DRA: resource quotas
2024-07-23 12:20:44 -07:00
Kubernetes Prow Robot
9c2302dd3e Merge pull request #126201 from aroradaman/revert-debug-steps
Revert debug steps and logs for #123760
2024-07-23 11:02:38 -07:00
Sohan Kunkerkar
c5b01a30d3 test/e2e/windows: drop securityContext test for ProcMount
Fixes https://github.com/kubernetes/kubernetes/issues/126180

As the ProcMountType feature is disabled by default in beta and relies
on the UserNamespacesSupport feature, which is also set to false in beta,
running this test is unnecessary.

Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
2024-07-23 13:45:29 -04:00
Patrick Ohly
299ecde5cc DRA quota: add ResourceClaim v1.ResourceQuota limits
Dynamic resource allocation is similar to storage in the sense that users
create ResourceClaim objects to request resources, same as with persistent
volume claims. The actual resource usage is only known when allocating claims,
but some limits can already be enforced at admission time:

- "count/resourceclaims.resource.k8s.io" limits the number of ResourceClaim objects in
  a namespace; this is a generic feature that is already supported also without
  this commit.

- "resourceclaims" is *not* an alias - use "count/resourceclaims.resource.k8s.io"
  instead.

- <device-class-name>.deviceclass.resource.k8s.io/devices limits the number of
  ResourceClaim objects in a namespace such that the number of devices
  requested through those objects with that class does not exceed the limit.

A single request may cause the allocation of multiple devices. For exact
counts, the quota limit is based on the sum of those exact counts. For requests
asking for "all" matching devices, the maximum number of allocated devices per
claim is used as a worst-case upper bound.

Requests asking for "admin access" contribute to the quota.

DRA quota: remove admin mode exception
2024-07-23 18:52:34 +02:00
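The counting rule described in the commit message above can be illustrated with a minimal Python sketch. It is not the Kubernetes implementation (which is written in Go). The `DeviceRequest` class, the `quota_usage` function, and the `max_devices_per_claim` parameter with its default of 32 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class DeviceRequest:
    """Hypothetical stand-in for one device request inside a ResourceClaim."""
    device_class: str
    count: Optional[int] = None  # exact number of devices, or None for "all"


def quota_usage(requests: Iterable[DeviceRequest], device_class: str,
                max_devices_per_claim: int = 32) -> int:
    """Return the number of devices charged against the quota for one class.

    Exact counts are summed. A request for "all" matching devices is charged
    the per-claim maximum as a worst-case upper bound (assumed value).
    """
    total = 0
    for req in requests:
        if req.device_class != device_class:
            continue  # other device classes have their own quota entry
        total += req.count if req.count is not None else max_devices_per_claim
    return total
```

Under these assumptions, a namespace with one request for exactly 2 devices and one request for "all" devices of the same class is charged 2 plus the per-claim maximum.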
Patrick Ohly
b5c94966bd DRA e2e: fix the quota name
The actual name has the k8s.io suffix.
2024-07-23 18:52:33 +02:00
Antonio Ojea
046e976bab cap the num of nodes on the noSNAT test and remove slow and NoSNAT tag
run NoSNAT network test between pods without any feature tag
2024-07-23 16:27:11 +00:00
Kubernetes Prow Robot
7590cb7adf Merge pull request #125257 from vinayakankugoyal/armor
KEP-24: Update AppArmor feature gates to GA stage.
2024-07-23 09:20:52 -07:00
Connor Catlett
796ae44c08 Return new PVC in WaitForVolumeModification to prevent stale comparison
Signed-off-by: Connor Catlett <conncatl@amazon.com>
2024-07-23 14:34:34 +00:00
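The idea behind the commit above, returning the freshly read object from a wait helper so callers do not compare against a stale copy, can be sketched in Python as follows. This is an illustration only and not the e2e framework code: the `get_pvc` callback, the dictionary representation of the PVC, and the timeout values are assumptions.

```python
import time
from typing import Callable, Dict


def wait_for_volume_modification(get_pvc: Callable[[], Dict[str, str]],
                                 expected_class: str,
                                 timeout: float = 5.0,
                                 interval: float = 0.01) -> Dict[str, str]:
    """Poll until the PVC reports the expected class, then return that PVC."""
    deadline = time.monotonic() + timeout
    while True:
        pvc = get_pvc()  # always re-read; never reuse an earlier copy
        if pvc.get("currentVolumeAttributesClassName") == expected_class:
            return pvc  # the caller continues with this fresh object
        if time.monotonic() >= deadline:
            raise TimeoutError(f"PVC did not reach class {expected_class!r}")
        time.sleep(interval)
```

A caller that keeps using the object it held before the wait would still see the old state; using the return value avoids that.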
Kubernetes Prow Robot
1854839ff0 Merge pull request #126067 from tenzen-y/implement-job-success-policy-e2e
Graduate the JobSuccessPolicy to Beta
2024-07-23 06:14:23 -07:00
Yuki Iwai
0d4f18bd5b Job: Implement E2E tests for the JobSuccessPolicy
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-07-23 21:05:50 +09:00
carlory
0260c7d023 Promote VolumeAttributesClass to beta 2024-07-23 13:58:14 +08:00
Kubernetes Prow Robot
3d78fe25a7 Merge pull request #121849 from carlory/add-e2e-vac
vac add e2e test
2024-07-22 16:53:03 -07:00
Kubernetes Prow Robot
d21b17264e Merge pull request #125488 from pohly/dra-1.31
DRA for 1.31
2024-07-22 11:45:55 -07:00
Patrick Ohly
357a2926a1 DRA e2e: update VAP for a kubelet plugin
This fixes the message (node name and "cluster-scoped" were switched) and
simplifies the VAP:
- a single matchCondition short circuits completely unless they're a user
  we care about
- variables to extract the userNodeName and objectNodeName once
  (using optionals to gracefully turn missing claims and fields into empty strings)
- leaves very tiny concise validations

Co-authored-by: Jordan Liggitt <liggitt@google.com>
2024-07-22 18:09:34 +02:00
Patrick Ohly
9f36c8d718 DRA: add DRAControlPlaneController feature gate for "classic DRA"
In the API, the effect of the feature gate is that alpha fields get dropped on
create. They get preserved during updates if already set. The
PodSchedulingContext registration is *not* restricted by the feature gate.
This enables deleting stale PodSchedulingContext objects after disabling
the feature gate.

The scheduler checks the new feature gate before setting up an informer for
PodSchedulingContext objects and when deciding whether it can schedule a
pod. If any claim depends on a control plane controller, the scheduler bails
out, leading to:

    Status:       Pending
    ...
      Warning  FailedScheduling             73s   default-scheduler  0/1 nodes are available: resourceclaim depends on disabled DRAControlPlaneController feature. no new claims to deallocate, preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.

The rest of the changes prepare for testing the new feature separately from
"structured parameters". The goal is to have base "dra" jobs which just enable
and test those, then "classic-dra" jobs which add DRAControlPlaneController.
2024-07-22 18:09:34 +02:00
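The API behaviour described in the commit message above (gated fields are dropped on create but preserved on update when already set) can be sketched in Python as follows. This is a simplified illustration, not the Kubernetes Go code; the function name, the dictionary representation of objects, and the field name `controller` in the usage note are assumptions.

```python
from typing import Dict, Iterable, Optional


def drop_disabled_fields(new: Dict[str, object],
                         old: Optional[Dict[str, object]],
                         gate_enabled: bool,
                         gated_fields: Iterable[str]) -> Dict[str, object]:
    """Return a copy of `new` with gated fields removed where appropriate.

    `old` is None on create and the stored object on update.
    """
    gated = set(gated_fields)
    if gate_enabled:
        return dict(new)  # feature enabled: keep everything
    # Feature disabled: keep gated fields only if the stored object uses them.
    already_in_use = old is not None and any(
        old.get(field) is not None for field in gated)
    if already_in_use:
        return dict(new)
    return {key: value for key, value in new.items() if key not in gated}
```

With the gate disabled, creating an object that sets a hypothetical `controller` field yields an object without it, while updating an object that already has the field leaves it in place.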
Patrick Ohly
599fe605f9 DRA scheduler: adapt to v1alpha3 API
The structured parameter allocation logic was written from scratch in
staging/src/k8s.io/dynamic-resource-allocation/structured where it might be
useful for out-of-tree components.

Besides the new features (amount, admin access) and API it now supports
backtracking when the initial device selection doesn't lead to a complete
allocation of all claims.

Co-authored-by: Ed Bartosh <eduard.bartosh@intel.com>
Co-authored-by: John Belamaric <jbelamaric@google.com>
2024-07-22 18:09:34 +02:00
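Backtracking, as mentioned in the commit message above, can be illustrated with a small Python sketch. It is a generic textbook example and not the allocator in staging/src/k8s.io/dynamic-resource-allocation/structured; the `allocate` function and its input format (one list of acceptable device names per claim) are assumptions.

```python
from typing import List, Optional, Sequence, Set


def allocate(claims: Sequence[Sequence[str]]) -> Optional[List[str]]:
    """Pick one distinct device per claim, or return None if impossible.

    claims[i] lists the device names acceptable for claim i, in order of
    preference.
    """
    chosen: List[str] = []
    used: Set[str] = set()

    def solve(index: int) -> bool:
        if index == len(claims):
            return True  # every claim has a device
        for device in claims[index]:
            if device in used:
                continue
            used.add(device)
            chosen.append(device)
            if solve(index + 1):
                return True
            # The initial selection blocked a later claim: undo and retry.
            used.remove(device)
            chosen.pop()
        return False

    return chosen if solve(0) else None
```

For example, if the first claim accepts devices `a` or `b` and the second accepts only `a`, the first attempt gives `a` to the first claim, fails on the second, and then backtracks to give `b` to the first claim.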
Patrick Ohly
c526d7796e DRA e2e: use VAP to control "admin access" permissions
The advantages of using a validation admission policy (VAP) are that no changes
are needed in Kubernetes and that admins have full flexibility if and how they
want to control which users are allowed to use "admin access" in their
requests.

The downside is that without admins taking actions, the feature is enabled
out-of-the-box in a cluster. Documentation for DRA will have to make it very
clear that something needs to be done in multi-tenant clusters.

The test/e2e/testing-manifests/dra/admin-access-policy.yaml shows how to do
this. The corresponding E2E test ensures that it actually works as intended.

For some reason, adding the namespace to the message expression leads to
type check errors, so it's currently commented out.
2024-07-22 18:09:34 +02:00
Patrick Ohly
0b62bfb690 DRA e2e: adapt to v1alpha3 API 2024-07-22 18:09:34 +02:00
Hemant Kumar
f7f1a6c81a Address review comments and return nicer errors 2024-07-22 10:43:38 -04:00
Yuki Iwai
594490fd77 Job: Add the CompletionsReached reason to the SuccessCriteriaMet condition
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-07-22 21:24:52 +09:00
Dr. Stefan Schimanski
834cd7ca4a aggregator: split availability controller into local and remote part
Signed-off-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com>
2024-07-21 17:31:24 +02:00
Patrick Ohly
8a629b9f15 DRA: remove "sharable" from claim allocation result
Now all claims are shareable up to the limit imposed by the size of the
"reservedFor" array.

This is one of the agreed simplifications for 1.31.
2024-07-21 17:28:14 +02:00
Patrick Ohly
de5742ae83 DRA: remove immediate allocation
As agreed in https://github.com/kubernetes/enhancements/pull/4709, immediate
allocation is one of those features which can be removed because it makes no
sense for structured parameters and the justification for classic DRA is weak.
2024-07-21 17:28:14 +02:00
Patrick Ohly
b51d68bb87 DRA: bump API v1alpha2 -> v1alpha3
This is in preparation for revamping the resource.k8s.io completely. Because
there will be no support for transitioning from v1alpha2 to v1alpha3, the
roundtrip test data for that API in 1.29 and 1.30 gets removed.

Repeating the version in the import name of the API packages is not really
required. It was done for a while to support simpler grepping for usage of
alpha APIs, but there are better ways for that now. So during this transition,
"resourceapi" gets used instead of "resourcev1alpha3" and the version gets
dropped from informer and lister imports. The advantage is that the next bump
to v1beta1 will affect fewer source code lines.

Only source code where the version really matters (like API registration)
retains the versioned import.
2024-07-21 17:28:13 +02:00
carlory
deb9fc97d3 vac add e2e test 2024-07-21 00:48:51 +08:00
Kubernetes Prow Robot
f2428d66cc Merge pull request #125163 from pohly/dra-kubelet-api-version-independent-no-rest-proxy
DRA: make kubelet independent of the resource.k8s.io API version
2024-07-18 17:47:48 -07:00
Patrick Ohly
7701a48bd6 dra kubelet: bump gRPC API to v1alpha4
The previous changes are an API break, therefore we need a new version.
2024-07-18 23:30:09 +02:00
Patrick Ohly
ee3205804b dra e2e: demonstrate how to use RBAC + VAP for a kubelet plugin
In reality, the kubelet plugin of a DRA driver is meant to be deployed as a
daemonset with a service account that limits its
permissions. https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#additional-metadata-in-pod-bound-tokens
ensures that the node name is bound to the pod, which then can be used
in a validating admission policy (VAP) to ensure that the operations are
limited to the node.

In E2E testing, we emulate that via impersonation. This ensures that the plugin
does not accidentally depend on additional permissions.
2024-07-18 23:30:09 +02:00
Kubernetes Prow Robot
f82030111f Merge pull request #126198 from aojea/flaku_lb
e2e: fix flake on loadbalancer tests
2024-07-18 13:41:45 -07:00
Kubernetes Prow Robot
c4bd05df1c Merge pull request #126181 from bitoku/refactor-kubeletseparatediskgc
[sig-testing] refactor KubeletSeparateDiskGC nodefeature
2024-07-18 10:39:25 -07:00
Kubernetes Prow Robot
601eb7e9cf Merge pull request #122922 from marosset/windows-memory-eviction
Add support for Windows memory-pressure eviction
2024-07-18 10:39:06 -07:00
Kubernetes Prow Robot
3adafc6a50 Merge pull request #126194 from mimowo/job-e2e-tests-cleanup
Format helper scripts in Job e2e tests as multiline for readability
2024-07-18 09:33:39 -07:00
Kubernetes Prow Robot
dda657b598 Merge pull request #126191 from p0lyn0mial/upstream-revert-promote-watch-list-to-beta
Revert "Promote WatchList feature to Beta"
2024-07-18 07:39:28 -07:00
Daman Arora
6adac3bce1 Revert "dump not network information on e2e failures"
This reverts commit 9239e44950.
2024-07-18 19:56:05 +05:30
Daman Arora
4ea7be8fa6 Revert "e2e/network: dump iptables and conntrack flows for debugging"
This reverts commit 3f2deb51ad.
2024-07-18 19:53:41 +05:30
Antonio Ojea
fdbe6912d2 e2e: fix flake on loadbalancer tests
Validating that an endpoint is reachable from one part of the cluster
is not sufficient to conclude that it is reachable from every node,
because the Service proxies on different nodes see EndpointSlice and
Service updates with different propagation delays.
2024-07-18 12:54:54 +00:00
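One way to account for the per-node propagation delay described in the commit message above is to wait until a probe has succeeded from every node rather than from a single one. The Python sketch below illustrates that idea; it is not the code from the e2e test, and the `probe` callback, node names, and timeout values are assumptions.

```python
import time
from typing import Callable, Iterable


def wait_reachable_from_all_nodes(nodes: Iterable[str],
                                  probe: Callable[[str], bool],
                                  timeout: float = 5.0,
                                  interval: float = 0.01) -> None:
    """Block until probe(node) has succeeded once for every node."""
    pending = set(nodes)
    deadline = time.monotonic() + timeout
    while pending:
        # Re-probe only the nodes that have not seen the endpoint yet.
        pending = {node for node in pending if not probe(node)}
        if not pending:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"unreachable from: {sorted(pending)}")
        time.sleep(interval)
```

Nodes whose proxy has already picked up the change are not probed again, so a slow node only delays the overall wait.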
Kubernetes Prow Robot
a491ea7af4 Merge pull request #126092 from pacoxu/fix-node-lease
fix node lease e2e flakes
2024-07-18 02:44:43 -07:00
Michal Wozniak
2d680054c1 Format helper scripts in Job e2e tests as multiline for readability 2024-07-18 11:05:36 +02:00
Ayato Tokubi
662ed5a42d refactor nodefeature
Signed-off-by: Ayato Tokubi <atokubi@redhat.com>
2024-07-18 08:45:52 +00:00
Lukasz Szaszkiewicz
367401cd85 Revert "e2e/apimachinery/watchlist: always run WatchList e2e tests"
This reverts commit be00cded2d.
2024-07-18 09:29:46 +02:00
Patrick Ohly
348f94ab55 DRA: read ResourceClaim in DRA drivers
This is the second and final step towards making kubelet independent of the
resource.k8s.io API versioning because it now doesn't need to copy structs
defined by that API from the driver to the API server.
2024-07-18 09:09:20 +02:00