Commit Graph

49999 Commits

Author SHA1 Message Date
Patrick Ohly
4bddebc48e DRA: fix scheduler/resource claim controller race with retry
The JSON patch approach works, but it is complex. A retry loop is easier to
understand (detect conflict, get new claim, try again). There is one additional
API call (the get), but in practice this scenario is unlikely.
2024-06-27 15:03:56 +02:00
Patrick Ohly
ecbafb8de5 DRA: fix scheduler/resource claim controller race
There was a race caused by having to update claim finalizer and status in two
different operations:
- Resource claim controller removes allocation, does not yet
  get to remove the finalizer.
- Scheduler prepares an allocation, without adding the finalizer
  because it's there.
- Controller removes finalizer.
- Scheduler adds allocation.

This is an invalid state. Automatic checking found this during the execution of
the "with translated parameters on single node.*supports sharing a claim
sequentially" E2E test, but only when run stand-alone. When running in
parallel (as in the CI), the bad outcome of the race did not occur.

The fix is to check that the finalizer is still set when adding the
allocation. The apiserver doesn't check that because it doesn't know which
finalizer goes with the allocation result. It could check for "some finalizer",
but that is not guaranteed to be correct (could be some unrelated one).

Checking the finalizer can only be done with a JSON patch. Despite the
complications, having the ability to add multiple pods concurrently to
ReservedFor seems worth it (avoids expensive rescheduling or a local retry
loop).
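A JSON Patch (RFC 6902) can make the write conditional with a `test` operation: if the test fails, the whole patch is rejected and nothing is applied. The sketch below builds such a patch in Go. It assumes the finalizer's index comes from the copy of the claim the scheduler last observed. The `allocationPatch` helper and both paths are hypothetical and may differ from what the commit actually sends.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is one RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value any    `json:"value,omitempty"`
}

// allocationPatch builds a patch that adds the allocation only if the given
// finalizer is still at the index where it was observed. The paths are
// illustrative, not necessarily the ones the scheduler uses.
func allocationPatch(observedFinalizers []string, finalizer string, allocation any) ([]byte, error) {
	idx := -1
	for i, f := range observedFinalizers {
		if f == finalizer {
			idx = i
			break
		}
	}
	if idx < 0 {
		return nil, fmt.Errorf("finalizer %q not present in observed claim", finalizer)
	}
	ops := []patchOp{
		// If this test fails, the server rejects the whole patch.
		{Op: "test", Path: fmt.Sprintf("/metadata/finalizers/%d", idx), Value: finalizer},
		{Op: "add", Path: "/status/allocation", Value: allocation},
	}
	return json.Marshal(ops)
}

func main() {
	patch, err := allocationPatch(
		[]string{"other", "example.com/claim-protection"},
		"example.com/claim-protection",
		map[string]string{"node": "node-1"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```

This patch contains no `metadata.resourceVersion`. Two such patches from different scheduling attempts therefore do not conflict merely because the object changed in between, which a full Update would.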

The resource claim controller doesn't need this; it can do a normal update,
which implicitly checks ResourceVersion.
2024-06-27 15:03:06 +02:00
Kubernetes Prow Robot
92e0db2bbf Merge pull request #125640 from googs1025/resourceclaim_controller_log_fix1
added resourceclaim_controller log info
2024-06-27 03:20:10 -07:00
Kubernetes Prow Robot
cd19796316 Merge pull request #125475 from AkihiroSuda/rro
KEP-3857: promote RecursiveReadOnlyMounts feature to beta
2024-06-26 14:13:39 -07:00
Kubernetes Prow Robot
1d51766c7a Merge pull request #125698 from pohly/dra-log-output
DRA: log output
2024-06-26 12:01:03 -07:00
Kubernetes Prow Robot
e57f8ad80b Merge pull request #125439 from Octopusjust/k8s-pr22
pkg/printers: drop deprecated pointer package
2024-06-26 10:58:48 -07:00
Kubernetes Prow Robot
44c1a0eec2 Merge pull request #124667 from linxiulei/trim
controlplane/apiserver: Trim managedFields off self-requested informers
2024-06-26 08:10:20 -07:00
googs1025
5f8fb17652 added resourceclaim_controller log info
Signed-off-by: googs1025 <googs1025@gmail.com>
2024-06-26 18:38:11 +08:00
Kubernetes Prow Robot
084d6c4968 Merge pull request #125699 from pohly/scheduler-framework-logging
scheduler: fix klog.KObjSlice when applied to []*NodeInfo
2024-06-26 01:50:23 -07:00
Kubernetes Prow Robot
01f9712c6f Merge pull request #125419 from benluddy/cbor-byteslice-base64
KEP-4222: Enable JSON-compatible base64 encoding of []byte for CBOR.
2024-06-26 00:02:22 -07:00
Patrick Ohly
719a49cc13 scheduler: fix klog.KObjSlice when applied to []*NodeInfo
The DRA plugin does that. It didn't actually work and only printed an error
message about NodeInfo not implementing klog.KMetadata. This is not checked at
compile time because of limitations with Go generics, so it had been missed earlier.
2024-06-26 08:11:31 +02:00
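The sketch below shows why the compiler cannot catch this. `objSlice` is a simplified stand-in for `klog.KObjSlice`: like it, it takes its argument as `interface{}` and only discovers at run time whether the slice elements implement the metadata interface. The `KMetadata` interface here merely mirrors the shape of klog's. `objSlice`, `nodeInfoBefore` and `nodeInfoAfter` are hypothetical names. Adding `GetName`/`GetNamespace` methods is just one way such a type could be made to work.

```go
package main

import (
	"fmt"
	"reflect"
)

// KMetadata mirrors the shape of klog's KMetadata interface.
type KMetadata interface {
	GetName() string
	GetNamespace() string
}

// objSlice takes `any`, so the compiler cannot verify that the elements
// implement KMetadata; a mismatch surfaces only at run time.
func objSlice(arg any) ([]string, error) {
	v := reflect.ValueOf(arg)
	if v.Kind() != reflect.Slice {
		return nil, fmt.Errorf("%T is not a slice", arg)
	}
	refs := make([]string, 0, v.Len())
	for i := 0; i < v.Len(); i++ {
		elem := v.Index(i).Interface()
		m, ok := elem.(KMetadata)
		if !ok {
			return nil, fmt.Errorf("element %d (%T) does not implement KMetadata", i, elem)
		}
		ref := m.GetName()
		if ns := m.GetNamespace(); ns != "" {
			ref = ns + "/" + ref
		}
		refs = append(refs, ref)
	}
	return refs, nil
}

// nodeInfoBefore lacks the metadata methods (hypothetical).
type nodeInfoBefore struct{ name string }

// nodeInfoAfter provides them; nodes are cluster-scoped, so no namespace.
type nodeInfoAfter struct{ name string }

func (n *nodeInfoAfter) GetName() string      { return n.name }
func (n *nodeInfoAfter) GetNamespace() string { return "" }

func main() {
	// Both calls compile; only the second one works.
	_, err := objSlice([]*nodeInfoBefore{{name: "node-1"}})
	fmt.Println(err)
	refs, _ := objSlice([]*nodeInfoAfter{{name: "node-1"}})
	fmt.Println(refs)
}
```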
Kubernetes Prow Robot
fb0195df11 Merge pull request #123428 from atiratree/UnhealthyPodEvictionPolicy-GA
promote PDBUnhealthyPodEvictionPolicy to GA
2024-06-25 21:56:20 -07:00
Ben Luddy
38f87df0e3 Enable JSON-compatible base64 encoding of []byte for CBOR.
The encoding/json package marshals []byte to a JSON string containing the base64 encoding of the
input slice's bytes, and unmarshals JSON strings to []byte by assuming the JSON string contains
valid base64 text.

As a binary format, CBOR is capable of representing arbitrary byte sequences without converting them
to a text encoding, but it also needs to interoperate with the existing JSON serializer. It does
this using the "expected later encoding" tags defined in RFC 8949, which indicate a specific text
encoding to be used when interoperating with text-based protocols. The actual conversion to or from
a text encoding is deferred until necessary, so no conversion is performed during roundtrips of
[]byte to CBOR.
2024-06-25 21:05:26 -04:00
Siyuan Zhang
379676c4be add DefaultComponentGlobalsRegistry flags in ServerRunOptions
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2024-06-25 22:12:11 +00:00
Siyuan Zhang
4352c4ad27 Add version mapping in ComponentGlobalsRegistry.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2024-06-25 22:12:11 +00:00
Siyuan Zhang
701e5fc374 Add composition flags for emulation version and feature gate.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
2024-06-25 22:12:11 +00:00
Siyuan Zhang
403301bfdf apiserver: Add API emulation versioning.
Co-authored-by: Siyuan Zhang <sizhang@google.com>
Co-authored-by: Joe Betz <jpbetz@google.com>
Co-authored-by: Alex Zielenski <zielenski@google.com>

Signed-off-by: Siyuan Zhang <sizhang@google.com>
2024-06-25 22:12:11 +00:00
Kubernetes Prow Robot
d0579b6f9c Merge pull request #125683 from likakuli/fix-benchmarkupdatesnapshot
clean: add nodeinfo to cache
2024-06-25 14:18:39 -07:00
Kubernetes Prow Robot
59673f0f37 Merge pull request #125578 from nayihz/fix_sche_queue_update
skip update pod that exist in scheduling cycle
2024-06-25 14:18:19 -07:00
Kubernetes Prow Robot
8c478a06d8 Merge pull request #124595 from pohly/dra-scheduler-assume-cache-eventhandlers
DRA: scheduler event handlers via assume cache
2024-06-25 11:56:28 -07:00
Kubernetes Prow Robot
9d9b6fb876 Merge pull request #125261 from bart0sh/PR145-DevicePluginCDIDevices-update-GA-milestone
features: update milestone for DevicePluginCDIDevices
2024-06-25 08:25:59 -07:00
Patrick Ohly
2da9e660e3 resourceclaim controller: add missing log output
The logging was fairly complete about *not* doing something, but the actual
ResourceClaim creation was not logged.
2024-06-25 16:12:31 +02:00
likakuli
ea6ca270b5 clean: add nodeinfo to cache
Signed-off-by: likakuli <1154584512@qq.com>
2024-06-25 21:29:05 +08:00
Patrick Ohly
1b63639d31 DRA scheduler: use assume cache to list claims
This finishes the transition to the assume cache as source of truth for the
current set of claims.

The tests have to be adapted. It's not enough anymore to directly put objects
into the informer store because that doesn't change the assume cache
content. Instead, normal Create/Update calls and waiting for the cache update
are needed.
2024-06-25 14:00:25 +02:00
Patrick Ohly
9a6f3b9388 scheduler: central ResourceClaim assume cache
This enables connecting the event handler for ResourceClaim to the assume
cache, which addresses a theoretical race condition.

It may also be useful for implementing the autoscaler support, because now
the autoscaler can modify the content of the cache.
2024-06-25 14:00:25 +02:00
Patrick Ohly
dea16757ef scheduler: AddEventHandler for assume cache
This enables using the assume cache for cluster events.
2024-06-25 14:00:25 +02:00
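These two commits connect an assume cache to event handlers. Below is a heavily simplified sketch of that idea. It assumes integer resource versions and a single object type. Informer updates and `Assume` calls feed the same handlers. An assumed object shadows older informer data until the informer catches up. `AssumeCache`, `Handler` and the method names are illustrative and do not reproduce the scheduler's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// Object is a hypothetical minimal API object.
type Object struct {
	Name            string
	ResourceVersion int
}

// Handler is notified whenever the visible version of an object changes.
type Handler func(oldObj, newObj *Object)

// AssumeCache is a simplified assume cache (hypothetical API).
type AssumeCache struct {
	mu       sync.Mutex
	objs     map[string]*Object // visible version, possibly assumed
	handlers []Handler
}

func NewAssumeCache() *AssumeCache {
	return &AssumeCache{objs: map[string]*Object{}}
}

func (c *AssumeCache) AddEventHandler(h Handler) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.handlers = append(c.handlers, h)
}

// notify is called without holding c.mu, so handlers may call back in.
func notify(handlers []Handler, oldObj, newObj *Object) {
	for _, h := range handlers {
		h(oldObj, newObj)
	}
}

// OnInformerUpdate ingests an informer event. An assumed object with a
// newer ResourceVersion stays visible until the informer catches up.
func (c *AssumeCache) OnInformerUpdate(obj *Object) {
	c.mu.Lock()
	oldObj := c.objs[obj.Name]
	if oldObj != nil && obj.ResourceVersion < oldObj.ResourceVersion {
		c.mu.Unlock()
		return
	}
	c.objs[obj.Name] = obj
	hs := append([]Handler(nil), c.handlers...)
	c.mu.Unlock()
	notify(hs, oldObj, obj)
}

// Assume makes obj visible immediately, e.g. after a successful API write.
func (c *AssumeCache) Assume(obj *Object) error {
	c.mu.Lock()
	oldObj := c.objs[obj.Name]
	if oldObj == nil {
		c.mu.Unlock()
		return fmt.Errorf("%s: not in cache", obj.Name)
	}
	if obj.ResourceVersion < oldObj.ResourceVersion {
		c.mu.Unlock()
		return fmt.Errorf("%s: version %d is older than %d", obj.Name, obj.ResourceVersion, oldObj.ResourceVersion)
	}
	c.objs[obj.Name] = obj
	hs := append([]Handler(nil), c.handlers...)
	c.mu.Unlock()
	notify(hs, oldObj, obj)
	return nil
}

// Get returns the visible version of an object, or nil.
func (c *AssumeCache) Get(name string) *Object {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.objs[name]
}

func main() {
	c := NewAssumeCache()
	var events []string
	c.AddEventHandler(func(_, newObj *Object) {
		events = append(events, fmt.Sprintf("%s@%d", newObj.Name, newObj.ResourceVersion))
	})
	c.OnInformerUpdate(&Object{Name: "claim-a", ResourceVersion: 1})
	_ = c.Assume(&Object{Name: "claim-a", ResourceVersion: 2})       // our own write
	c.OnInformerUpdate(&Object{Name: "claim-a", ResourceVersion: 1}) // stale event, ignored
	fmt.Println(c.Get("claim-a").ResourceVersion, events)            // 2 [claim-a@1 claim-a@2]
}
```

Handlers see the assumed version as soon as it is recorded. That is what lets an event handler attached to the cache, rather than to the informer alone, avoid acting on stale data.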
Patrick Ohly
639f86915b scheduler: add FIFO queue
This is a basic implementation of a first-in-first-out queue with unbounded
size. It's useful for cases where a channel with fixed size might deadlock.

The caller is responsible for locking.
2024-06-25 13:56:15 +02:00
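Below is a generic sketch of such a queue, assuming Go 1.18+ generics. It is a slice with a moving head index, compacted once at least half of it has been consumed. As in the commit description, it does no locking itself. The `FIFO` name and its methods are hypothetical.

```go
package main

import "fmt"

// FIFO is an unbounded first-in-first-out queue. It does no locking;
// callers must serialize access themselves.
type FIFO[T any] struct {
	elems []T
	head  int // index of the oldest element
}

// Push appends v; it never blocks, unlike a send on a full channel.
func (q *FIFO[T]) Push(v T) { q.elems = append(q.elems, v) }

// Len returns the number of queued elements.
func (q *FIFO[T]) Len() int { return len(q.elems) - q.head }

// Pop removes and returns the oldest element; ok is false if q is empty.
func (q *FIFO[T]) Pop() (v T, ok bool) {
	if q.Len() == 0 {
		return v, false
	}
	var zero T
	v = q.elems[q.head]
	q.elems[q.head] = zero // release the reference for the garbage collector
	q.head++
	// Once at least half the slice is consumed, shift the live elements
	// to the front so the backing array does not grow without bound.
	if q.head*2 >= len(q.elems) {
		n := copy(q.elems, q.elems[q.head:])
		for i := n; i < len(q.elems); i++ {
			q.elems[i] = zero
		}
		q.elems = q.elems[:n]
		q.head = 0
	}
	return v, true
}

func main() {
	var q FIFO[string]
	for _, s := range []string{"a", "b", "c"} {
		q.Push(s)
	}
	var out []string
	for q.Len() > 0 {
		v, _ := q.Pop()
		out = append(out, v)
	}
	fmt.Println(out) // [a b c]
}
```

Because `Push` only appends to a slice, a producer never waits for a consumer. This is the property that avoids the deadlock a fixed-size channel can cause.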
Kubernetes Prow Robot
ba19ecb8c9 Merge pull request #123298 from henry118/spell
Fix func name typo
2024-06-24 21:01:40 -07:00
Kubernetes Prow Robot
0d17892740 Merge pull request #123184 from googs1025/delete_repeat_comments
typo: delete useless comments
2024-06-24 16:48:32 -07:00
Kubernetes Prow Robot
1236f48109 Merge pull request #124770 from uucloud/docs/fix_tls_bootstrapping_link
docs:fix broken link
2024-06-24 13:36:34 -07:00
nayihz
26dcab1146 skip update pod that exist in scheduling cycle 2024-06-24 17:11:09 +08:00
Kubernetes Prow Robot
fb6bbc9781 Merge pull request #125359 from yangjunmyfm192085/fixendpointslicemirroring
fix endpointslicemirroring controller not create endpointslice when the endpoints are recreate
2024-06-23 18:22:55 -07:00
杨军10092085
811bd53ee7 fix endpointslicemirroring controller not create endpointslice when the endpoints are recreate 2024-06-22 10:05:03 +08:00
Kubernetes Prow Robot
7c780186d7 Merge pull request #125473 from liggitt/serviceaccount-cleanup
Clean up service account options completion and fallback
2024-06-21 17:50:55 -07:00
Kubernetes Prow Robot
8c508c5480 Merge pull request #125527 from sanposhiho/gated-pods-filter-out-bug
fix: skip isPodWorthRequeuing only when SchedulingGates gates the pod
2024-06-21 12:22:55 -07:00
Kubernetes Prow Robot
50f27d9ef4 Merge pull request #125613 from mimowo/job-controller-cleanup-tests
Assert on all conditions in the Job unit tests
2024-06-21 11:20:55 -07:00
Kubernetes Prow Robot
bdbd87be2b Merge pull request #125596 from skitt/stretchr-testify-mock
Switch to stretchr/testify / mockery for mocks
2024-06-21 09:23:02 -07:00
Filip Křepinský
68d34580e0 promote PDBUnhealthyPodEvictionPolicy to GA 2024-06-21 16:13:53 +02:00
Michal Wozniak
7b5d3f5bc1 Assert on all conditions in the Pod Failure policy tests 2024-06-21 15:42:14 +02:00
Kensei Nakada
98a3182398 correct comment 2024-06-20 23:48:42 +00:00
Kensei Nakada
2304806cbe elaborate comment more 2024-06-20 23:43:41 +00:00
Kensei Nakada
fa8da84835 remove fixme comment 2024-06-20 23:36:25 +00:00
Kensei Nakada
2c4dc6b65b elaborate comments 2024-06-20 23:36:05 +00:00
Dejan Zele Pejchev
11b6e4c404 count ready pods when deleting active pods for failed jobs 2024-06-21 01:07:40 +02:00
Kubernetes Prow Robot
cc2946e5d1 Merge pull request #125515 from mimowo/refactor-terminating-counter
Refactor tracking of terminating pods in Job controller
2024-06-20 13:01:41 -07:00
Stephen Kitt
3f36c83c68 Switch to stretchr/testify / mockery for mocks
testify is used throughout the codebase; this switches mocks from
gomock to testify with the help of mockery for code generation.

Handlers and mocks in test/utils/oidc are moved to a new package:
mockery operates package by package, and requires packages to build
correctly; test/utils/oidc/testserver.go relies on the mocks and fails
to build when they are removed. Moving the interface and mocks to a
different package allows mockery to process that package without
having to build testserver.go.

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2024-06-20 19:42:53 +02:00
Kubernetes Prow Robot
3ee4d98364 Merge pull request #125576 from alvaroaleman/fix
Corev1.Node: Link to node doc and not PV doc in status.capacity
2024-06-20 10:19:49 -07:00
Kubernetes Prow Robot
a4b8edd0f7 Merge pull request #125104 from enj/enj/i/sa_generics
serviceaccount: use generics to remove runtime type checks during validation
2024-06-20 09:16:49 -07:00
Monis Khan
3da48466d6 serviceaccount: use generics to remove runtime type checks during validation
Signed-off-by: Monis Khan <mok@microsoft.com>
2024-06-20 11:16:15 -04:00
Kubernetes Prow Robot
bb95d084a2 Merge pull request #125603 from mimowo/refactor-enact-finished-job
Refactor enactJobFinished util function for Job controller
2024-06-20 05:04:17 -07:00