21354 Commits

Author SHA1 Message Date
Patrick Ohly
d11b58efe6 DRA kubelet: refactor gRPC call timeouts
Some of the E2E node tests were flaky. Their timeout apparently was chosen
under the assumption that kubelet would retry immediately after a failed gRPC
call, with a factor of 2 as safety margin. But according to 0449cef8fd,
kubelet has a different, higher retry period of 90 seconds, which was exactly
the test timeout. The test timeout has to be higher than that.

As the tests don't use the gRPC call timeout anymore, it can be made
private. While at it, the name and documentation get updated.
2024-07-22 18:09:34 +02:00
Patrick Ohly
599fe605f9 DRA scheduler: adapt to v1alpha3 API
The structured parameter allocation logic was written from scratch in
staging/src/k8s.io/dynamic-resource-allocation/structured where it might be
useful for out-of-tree components.

Besides the new features (amount, admin access) and API, it now supports
backtracking when the initial device selection doesn't lead to a complete
allocation of all claims.

Co-authored-by: Ed Bartosh <eduard.bartosh@intel.com>
Co-authored-by: John Belamaric <jbelamaric@google.com>
2024-07-22 18:09:34 +02:00
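A minimal Go sketch of allocation with backtracking, as described in the commit above. The Device and Request types and the allocate function are hypothetical simplifications, not the API of the structured package; they only show how undoing an earlier device choice lets a later request be satisfied.

```go
package main

import "fmt"

// Device is a hypothetical, heavily simplified stand-in for a device
// published in a ResourceSlice.
type Device struct {
	Name  string
	Model string
}

// Request is a hypothetical request for Count devices that satisfy Match.
type Request struct {
	Count int
	Match func(Device) bool
}

// allocate assigns distinct devices to all requests. When an earlier choice
// leaves a later request unsatisfiable, it undoes that choice and tries the
// next candidate (backtracking). It returns the device names chosen for each
// request, or nil if no complete allocation exists.
func allocate(devices []Device, requests []Request) [][]string {
	used := make([]bool, len(devices))
	result := make([][]string, len(requests))

	var try func(req, start, picked int) bool
	try = func(req, start, picked int) bool {
		if req == len(requests) {
			return true // every request is satisfied
		}
		if picked == requests[req].Count {
			return try(req+1, 0, 0) // this request is done, continue with the next
		}
		for i := start; i < len(devices); i++ {
			if used[i] || !requests[req].Match(devices[i]) {
				continue
			}
			used[i] = true
			result[req] = append(result[req], devices[i].Name)
			if try(req, i+1, picked+1) {
				return true
			}
			// Backtrack: release the device and try the next candidate.
			used[i] = false
			result[req] = result[req][:len(result[req])-1]
		}
		return false
	}

	if !try(0, 0, 0) {
		return nil
	}
	return result
}

func main() {
	devices := []Device{{"gpu-0", "a"}, {"gpu-1", "b"}}
	requests := []Request{
		{Count: 1, Match: func(Device) bool { return true }},             // any device
		{Count: 1, Match: func(d Device) bool { return d.Model == "a" }}, // model "a" only
	}
	// A greedy pass would give gpu-0 to the first request and then fail on
	// the second; backtracking finds the complete allocation instead.
	fmt.Println(allocate(devices, requests)) // [[gpu-1] [gpu-0]]
}
```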
Patrick Ohly
877829aeaa DRA kubelet: adapt to v1alpha3 API
This adds the ability to select specific requests inside a claim for a
container.

NodePrepareResources is always called, even if the claim is not used by any
container. This could be useful for drivers where that call has some effect
other than injecting CDI device IDs into containers. It also ensures that
drivers can validate configs.

The pod resource API can no longer report a class for each claim because there
is no such 1:1 relationship anymore. Instead, that API reports the claim, the
API devices (with driver/pool/device as ID), and the CDI device IDs. The kubelet
itself doesn't extract that information from the claim. Instead, it relies on
drivers to report this information when the claim gets prepared. This isolates
the kubelet from API changes.

Because of a faulty E2E test, kubelet was told to contact the wrong driver for
a claim. This was not visible in the kubelet log output. Now changes to the
claim info cache get logged. While at it, the naming of variables and some
existing log output get harmonized.

Co-authored-by: Oksana Baranova <oksana.baranova@intel.com>
Co-authored-by: Ed Bartosh <eduard.bartosh@intel.com>
2024-07-22 18:09:34 +02:00
Patrick Ohly
20f98f3a2f DRA: update helper packages
Publishing ResourceSlices now supports network-attached devices and the new
v1alpha3 API. The logic for splitting devices across multiple slices is still
missing.
2024-07-22 18:09:34 +02:00
Patrick Ohly
91d7882e86 DRA: new API for 1.31
This is a complete revamp of the original API. Some of the key
differences:
- refocused on structured parameters and allocating devices
- support for constraints across devices
- support for allocating "all" or a fixed amount
  of similar devices in a single request
- no class for ResourceClaims, instead individual
  device requests are associated with a mandatory
  DeviceClass

For the sake of simplicity, optional basic types (ints, strings) where the null
value is the default are represented as values in the API types. This makes Go
code simpler because it doesn't have to check for nil (consumers) and values
can be set directly (producers). The effect is that in protobuf, these fields
always get encoded because `opt` only has an effect for pointers.

The roundtrip test data for v1.29.0 and v1.30.0 changes because of the new
"request" field. This is considered acceptable because the entire `claims`
field in the pod spec is still alpha.

The implementation is complete enough to bring up the apiserver.
Adapting other components follows.
2024-07-22 18:09:34 +02:00
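The trade-off between value and pointer fields described above can be sketched in Go. WithValue, WithPointer and the two accessor functions are hypothetical illustrations, not resource.k8s.io types; they show why value fields need no nil checks on the consumer side and can be set directly on the producer side.

```go
package main

import "fmt"

// Both types are hypothetical illustrations, not resource.k8s.io types.

// WithValue stores an optional count whose zero value is also the default.
type WithValue struct {
	Count int64 `json:"count,omitempty"`
}

// WithPointer stores the same optional count as a pointer; nil means "unset".
type WithPointer struct {
	Count *int64 `json:"count,omitempty"`
}

// countOfValue: consumers read the field directly, no nil check required.
func countOfValue(w WithValue) int64 { return w.Count }

// countOfPointer: consumers must handle nil before dereferencing.
func countOfPointer(w WithPointer) int64 {
	if w.Count == nil {
		return 0
	}
	return *w.Count
}

func main() {
	// Producers can assign a value directly...
	v := WithValue{Count: 2}
	// ...but need an addressable variable (or a helper) for the pointer.
	n := int64(2)
	p := WithPointer{Count: &n}

	fmt.Println(countOfValue(v), countOfPointer(p))                       // 2 2
	fmt.Println(countOfValue(WithValue{}), countOfPointer(WithPointer{})) // 0 0
}
```

The price of the value form is that "unset" and "explicitly set to the zero value" become indistinguishable, which is acceptable exactly when the zero value is the default.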
Yuki Iwai
594490fd77 Job: Add the CompletionsReached reason to the SuccessCriteriaMet condition
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-07-22 21:24:52 +09:00
Dr. Stefan Schimanski
834cd7ca4a aggregator: split availability controller into local and remote part
Signed-off-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com>
2024-07-21 17:31:24 +02:00
Patrick Ohly
bcececadfb CEL: add QuantityDeclType
Most functions in k8s.io/apiserver/pkg/cel work with DeclType for type
definitions, which made the existing QuantityType unusable with them. The new
QuantityDeclType fills that gap.
2024-07-21 17:28:14 +02:00
Patrick Ohly
8a629b9f15 DRA: remove "sharable" from claim allocation result
Now all claims are shareable up to the limit imposed by the size of the
"reservedFor" array.

This is one of the agreed simplifications for 1.31.
2024-07-21 17:28:14 +02:00
Patrick Ohly
de5742ae83 DRA: remove immediate allocation
As agreed in https://github.com/kubernetes/enhancements/pull/4709, immediate
allocation is one of those features which can be removed because it makes no
sense for structured parameters and the justification for classic DRA is weak.
2024-07-21 17:28:14 +02:00
Patrick Ohly
b51d68bb87 DRA: bump API v1alpha2 -> v1alpha3
This is in preparation for revamping the resource.k8s.io API completely. Because
there will be no support for transitioning from v1alpha2 to v1alpha3, the
roundtrip test data for that API in 1.29 and 1.30 gets removed.

Repeating the version in the import name of the API packages is not really
required. It was done for a while to support simpler grepping for usage of
alpha APIs, but there are better ways for that now. So during this transition,
"resourceapi" gets used instead of "resourcev1alpha3" and the version gets
dropped from informer and lister imports. The advantage is that the next bump
to v1beta1 will affect fewer source code lines.

Only source code where the version really matters (like API registration)
retains the versioned import.
2024-07-21 17:28:13 +02:00
Dr. Stefan Schimanski
bbdc247406 aggregator: make linter happy
Signed-off-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com>
2024-07-21 16:45:28 +02:00
Dr. Stefan Schimanski
b5759ad4f9 aggregator: (pre-)move availability controller
Signed-off-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com>
2024-07-21 13:48:50 +02:00
Dr. Stefan Schimanski
c5095069a8 aggregator: separate out status controller metrics
Signed-off-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com>
2024-07-21 13:48:49 +02:00
Kubernetes Prow Robot
90a84704d6 Merge pull request #126231 from seans3/websocket-https-proxy-fix
Falls back to SPDY for gorilla/websocket https proxy error
2024-07-20 13:23:16 -07:00
Sean Sullivan
bc52647251 moving for easier cherry-pick 2024-07-20 05:29:57 -07:00
Sean Sullivan
9d560540c5 Falls back to SPDY for gorilla/websocket https proxy error 2024-07-20 00:10:32 -07:00
Kubernetes Prow Robot
8f265b6305 Merge pull request #126136 from cici37/removeFG
Remove feature gate CustomResourceValidationExpressions
2024-07-20 00:08:52 -07:00
Kubernetes Prow Robot
64ba17c605 Merge pull request #125571 from liggitt/filter-auth-02-sar
add field and label selectors to authorization
2024-07-19 15:30:01 -07:00
cici37
95dbfa1c3d Promote metrics for VAP and CRD validation rules to beta. 2024-07-19 22:26:32 +00:00
Jefftree
0898842b3c use context for tests 2024-07-19 20:12:05 +00:00
Kubernetes Prow Robot
fa15f12fb5 Merge pull request #126174 from dobsonj/corruptedmnt-enodev
mount-utils: treat syscall.ENODEV as corrupted mount
2024-07-19 13:08:48 -07:00
Jefftree
a5791b344c Validate CABundle when writing CRD 2024-07-19 19:38:54 +00:00
Vadim Rutkovsky
77e84efe31 featuregate: clone queriedFeatures only when mutation is needed
Avoid allocating memory when a cloned set of queried features is not needed.
2024-07-19 21:07:12 +02:00
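One way to clone only when mutation is needed is a copy-on-write set: a lookup that finds the feature already recorded returns without allocating, and only a new entry triggers a copy. This Go sketch uses hypothetical names (queriedFeatures, markQueried), and the real featuregate code may be structured differently.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// queriedFeatures is a hypothetical copy-on-write set of feature names that
// have been queried. Readers load the current map without locking; writers
// publish a modified copy.
type queriedFeatures struct {
	set    atomic.Pointer[map[string]struct{}]
	clones atomic.Int64 // number of map copies made, for illustration
}

// markQueried records name, cloning the map only when name is missing.
func (q *queriedFeatures) markQueried(name string) {
	for {
		old := q.set.Load()
		if old != nil {
			if _, ok := (*old)[name]; ok {
				return // nothing to change, so no allocation
			}
		}
		next := make(map[string]struct{})
		if old != nil {
			for k := range *old {
				next[k] = struct{}{}
			}
		}
		next[name] = struct{}{}
		q.clones.Add(1)
		if q.set.CompareAndSwap(old, &next) {
			return
		}
		// Another goroutine published a new map first; retry against it.
	}
}

func main() {
	var q queriedFeatures
	q.markQueried("FeatureA")
	q.markQueried("FeatureA") // already present: no clone
	q.markQueried("FeatureB")
	fmt.Println(q.clones.Load()) // 2
}
```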
Jordan Liggitt
9f8f36708a Fixup lint warning 2024-07-19 15:06:52 -04:00
Jordan Liggitt
a1398a8cca Add structured labelSelector / fieldSelector to authorization webhook match conditions 2024-07-19 15:06:50 -04:00
Jordan Liggitt
83bd512861 Adjust CEL cost calculation and versioning for authorization library 2024-07-19 15:06:49 -04:00
David Eads
be2e32fa3e Add CEL fieldSelector / labelSelector support to authorizer library 2024-07-19 15:06:49 -04:00
Jordan Liggitt
03d48b7683 Move CEL env initialization out of package init()
This ensures compatibility version and feature gates can be initialized
before cached CEL environments are created.
2024-07-19 15:06:48 -04:00
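Moving initialization out of init() typically means building the cached value lazily on first use, which can happen after main() has applied flags and feature gates. In this sketch, env, getEnv and compatibilityVersion are hypothetical; sync.Once ensures the environment is built exactly once.

```go
package main

import (
	"fmt"
	"sync"
)

// env is a hypothetical stand-in for a cached CEL environment whose shape
// depends on settings that are only known after startup configuration.
type env struct{ version string }

var (
	envOnce   sync.Once
	cachedEnv *env
	// compatibilityVersion is a hypothetical setting. An init() function
	// would read it before main() had a chance to set it.
	compatibilityVersion = "unset"
)

// getEnv builds the environment on first use rather than in init(), so it
// observes configuration applied earlier in main().
func getEnv() *env {
	envOnce.Do(func() {
		cachedEnv = &env{version: compatibilityVersion}
	})
	return cachedEnv
}

func main() {
	compatibilityVersion = "1.31" // e.g. set from command-line flags
	fmt.Println(getEnv().version) // 1.31
}
```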
Jordan Liggitt
1d2ad282cf Improve CEL cost tests to catch unhandled estimates or types 2024-07-19 15:06:47 -04:00
David Eads
92e3445e9d add field and label selectors to authorization attributes
Co-authored-by: Jordan Liggitt <liggitt@google.com>
2024-07-19 15:06:47 -04:00
David Eads
f5e5bef2e0 generate 2024-07-19 14:35:37 -04:00
David Eads
90f0b88b6a add subjectaccessreview field and label selectors
Co-authored-by: Jordan Liggitt <liggitt@google.com>
2024-07-19 14:34:49 -04:00
Kubernetes Prow Robot
acaec0c23a Merge pull request #126124 from cici37/feature/validating-admission-policy/metrics-improvement
Feature/validating admission policy/metrics improvement
2024-07-19 10:34:58 -07:00
Jonathan Dobson
4cec4e7422 mount-utils: treat syscall.ENODEV as corrupted mount 2024-07-19 08:14:30 -06:00
mprahl
a54ba917be Allow calling Stop multiple times on RetryWatcher
This makes the Stop method idempotent so that if Stop is called multiple
times, it does not cause a panic due to closing a closed channel.

Signed-off-by: mprahl <mprahl@users.noreply.github.com>
2024-07-19 08:54:41 -04:00
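Making Stop idempotent is commonly done with sync.Once. The watcher type below is a hypothetical stand-in for RetryWatcher, and this commit may use a different mechanism; the sketch only shows why a second Stop no longer panics.

```go
package main

import (
	"fmt"
	"sync"
)

// watcher is a hypothetical type that signals shutdown by closing a channel.
type watcher struct {
	stopCh   chan struct{}
	stopOnce sync.Once
}

func newWatcher() *watcher { return &watcher{stopCh: make(chan struct{})} }

// Stop may be called any number of times: sync.Once guarantees the channel
// is closed exactly once, so a repeated call cannot panic with
// "close of closed channel".
func (w *watcher) Stop() {
	w.stopOnce.Do(func() { close(w.stopCh) })
}

// stopped reports whether Stop has been called.
func (w *watcher) stopped() bool {
	select {
	case <-w.stopCh:
		return true
	default:
		return false
	}
}

func main() {
	w := newWatcher()
	w.Stop()
	w.Stop()                 // no panic
	fmt.Println(w.stopped()) // true
}
```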
Kubernetes Prow Robot
77e12aeca9 Merge pull request #126207 from thockin/ingress-backend-port-atomic
Make ServiceBackendPort an atomic struct
2024-07-18 19:24:26 -07:00
Kubernetes Prow Robot
25935965c5 Merge pull request #125782 from aborrero/master
procMount: fix default value documentation
2024-07-18 19:24:11 -07:00
Kubernetes Prow Robot
f2428d66cc Merge pull request #125163 from pohly/dra-kubelet-api-version-independent-no-rest-proxy
DRA: make kubelet independent of the resource.k8s.io API version
2024-07-18 17:47:48 -07:00
Patrick Ohly
7701a48bd6 dra kubelet: bump gRPC API to v1alpha4
The previous changes are an API break, therefore we need a new version.
2024-07-18 23:30:09 +02:00
Kubernetes Prow Robot
d040043edb Merge pull request #124736 from MikeSpreitzer/exempt-borrows-more
More assertive borrowing by exempt
2024-07-18 13:41:38 -07:00
Tim Hockin
7313990f61 Make ServiceBackendPort an atomic struct
This allows different actors to force ownership of it without having to
explicitly unset the other field.
2024-07-18 13:20:33 -07:00
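A toy model of why an atomic struct helps here. ServiceBackendPort's two fields, Name and Number, are mutually exclusive. The functions mergeGranular and mergeAtomic are hypothetical and only approximate per-field versus whole-struct (+structType=atomic) merging in server-side apply, ignoring field ownership tracking.

```go
package main

import "fmt"

// ServiceBackendPort mirrors the two mutually exclusive fields of the
// networking/v1 type of the same name; everything else here is a toy model.
type ServiceBackendPort struct {
	Name   string
	Number int32
}

// mergeGranular models per-field merging: each field set in the applied
// configuration overwrites the live value, other fields are left untouched.
func mergeGranular(live, applied ServiceBackendPort) ServiceBackendPort {
	if applied.Name != "" {
		live.Name = applied.Name
	}
	if applied.Number != 0 {
		live.Number = applied.Number
	}
	return live
}

// mergeAtomic models an atomic struct: the applied value replaces the live
// one as a whole, so the other field is cleared implicitly.
func mergeAtomic(live, applied ServiceBackendPort) ServiceBackendPort {
	return applied
}

func main() {
	live := ServiceBackendPort{Name: "http"}
	applied := ServiceBackendPort{Number: 8080}
	fmt.Println(mergeGranular(live, applied)) // {http 8080}: both set, invalid
	fmt.Println(mergeAtomic(live, applied))   // { 8080}
}
```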
Harshal Patil
fff2b7f566 Kubelet option to disable cgroup v1 support
Signed-off-by: Harshal Patil <harpatil@redhat.com>
2024-07-18 14:00:21 -04:00
Kubernetes Prow Robot
595927da21 Merge pull request #125660 from saschagrunert/oci-volumesource-api
[KEP-4639] Add `ImageVolumeSource` API
2024-07-18 10:39:15 -07:00
Kubernetes Prow Robot
601eb7e9cf Merge pull request #122922 from marosset/windows-memory-eviction
Add support for Windows memory-pressure eviction
2024-07-18 10:39:06 -07:00
Kubernetes Prow Robot
73198f893c Merge pull request #124859 from morlay/master
Remove json:",omitempty" where json:",inline" specified.
2024-07-18 09:33:33 -07:00
Sascha Grunert
f7ca3131e0 Add ImageVolumeSource API
Adding the required Kubernetes API so that the kubelet can start using
it. This patch also adds the corresponding alpha feature gate as
outlined in KEP 4639.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2024-07-18 17:25:54 +02:00
Lukasz Szaszkiewicz
88f47b4b4d Revert "kube-apiserver: promote WatchList feature to beta"
This reverts commit 0b15903b35.
2024-07-18 09:29:24 +02:00
Patrick Ohly
348f94ab55 DRA: read ResourceClaim in DRA drivers
This is the second and final step towards making kubelet independent of the
resource.k8s.io API versioning because it now doesn't need to copy structs
defined by that API from the API server to the driver.
2024-07-18 09:09:20 +02:00
Patrick Ohly
616a014347 DRA: move ResourceSlice publishing into DRA drivers
This is a first step towards making kubelet independent of the resource.k8s.io
API versioning because it now doesn't need to copy structs defined by that API
from the driver to the API server. The next step is removing the other
direction (reading ResourceClaim status and passing the resource handle to
drivers).

The drivers must get deployed so that they have their own connection to the API
server. Securing at least the writes via a validating admission policy should
be possible.

As before, the kubelet removes all ResourceSlices for its node at startup, then
DRA drivers recreate them if (and only if) they start up again. This ensures
that there are no orphaned ResourceSlices when a driver gets removed while the
kubelet is down.

While at it, logging gets cleaned up and updated to use structured, contextual
logging as much as possible. gRPC requests and streams now use a shared,
per-process request ID and streams also get logged.
2024-07-18 09:09:19 +02:00
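The startup sequence described in this commit can be modeled with an in-memory map. The store and slice types and the two startup methods are hypothetical; the point is that a driver removed while the kubelet is down never republishes, so its stale slice disappears.

```go
package main

import (
	"fmt"
	"sort"
)

// slice is a hypothetical, minimal stand-in for a ResourceSlice.
type slice struct{ node, driver string }

// store is an in-memory stand-in for the ResourceSlices in the API server.
type store map[string]slice

// kubeletStartup removes every slice of node, whatever driver published it.
func (s store) kubeletStartup(node string) {
	for name, sl := range s {
		if sl.node == node {
			delete(s, name) // deleting during range is safe in Go
		}
	}
}

// driverStartup is what a running driver does: (re)publish its own slice.
func (s store) driverStartup(node, driver string) {
	s[node+"-"+driver] = slice{node: node, driver: driver}
}

// names returns the slice names in sorted order.
func (s store) names() []string {
	out := make([]string, 0, len(s))
	for name := range s {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

func main() {
	s := store{}
	s.driverStartup("node-a", "gpu")
	s.driverStartup("node-a", "fpga")
	s.driverStartup("node-b", "gpu")

	// While the kubelet on node-a is down, the fpga driver is uninstalled.
	// On restart, the kubelet clears node-a's slices and only drivers that
	// are still present publish theirs again.
	s.kubeletStartup("node-a")
	s.driverStartup("node-a", "gpu")

	fmt.Println(s.names()) // [node-a-gpu node-b-gpu]
}
```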