Commit Graph

50438 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
f2428d66cc Merge pull request #125163 from pohly/dra-kubelet-api-version-independent-no-rest-proxy
DRA: make kubelet independent of the resource.k8s.io API version
2024-07-18 17:47:48 -07:00
Kubernetes Prow Robot
5fc7032a0e Merge pull request #126156 from pohly/kubelet-test-enhancements
kubelet test enhancements
2024-07-18 14:50:54 -07:00
Kubernetes Prow Robot
fa7fcde5a4 Merge pull request #125813 from aojea/node_csr_ips
Node Request Certificates require to have IPs
2024-07-18 14:50:48 -07:00
Patrick Ohly
7701a48bd6 dra kubelet: bump gRPC API to v1alpha4
The previous changes are an API break, therefore we need a new version.
2024-07-18 23:30:09 +02:00
Monis Khan
6a6771b514 svm: set UID and RV on SSA patch to cause conflict on logical create
When a resource gets deleted during migration, the SVM SSA patch
calls are interpreted as a logical create request.  Since the object
from storage is nil, the merged result is just a type meta object,
which lacks a name in the body.  This fails when the API server
checks that the name from the request URL and the body are the same.
Note that a create request is something that SVM controller should
never do.

Once the UID is set on the patch, the API server will fail the
request at a slightly earlier point with an "uid mismatch" conflict
error, which the SVM controller can handle gracefully.

Setting UID by itself is not sufficient.  When a resource gets
deleted and recreated, if RV is not set but UID is set, we would get
an immutable field validation error for attempting to update the
UID.  To address this, we set the resource version on the SSA patch
as well.  This will cause that update request to also fail with a
conflict error.

Added the create verb on all resources for SVM controller RBAC as
otherwise the API server will reject the request before it fails
with a conflict error.

The change addresses a host of other issues with the SVM controller:

1. Include failure message in SVM resource
2. Do not block forever on unsynced GC monitor
3. Do not immediately fail on GC monitor being missing, allow for
   a grace period since discovery may be out of sync
4. Set higher QPS and burst to handle large migrations

Test changes:

1. Clean up CRD webhook convertor logs
2. Allow SVM tests to be run multiple times to make finding flakes easier
3. Create and delete CRs during CRD test to force out any flakes
4. Add a stress test with multiple parallel migrations
5. Enable RBAC on KAS
6. Run KCM directly to exercise wiring and RBAC
7. Better logs during CRD migration
8. Scan audit logs to confirm SVM controller never creates

Signed-off-by: Monis Khan <mok@microsoft.com>
2024-07-18 17:19:11 -04:00
Tim Hockin
7313990f61 Make ServiceBackendPort an atomic struct
This allows different actors to force ownership of it without having to
explicitly unset the other field.
2024-07-18 13:20:33 -07:00
Harshal Patil
fff2b7f566 Kubelet option to disable cgroup v1 support
Signed-off-by: Harshal Patil <harpatil@redhat.com>
2024-07-18 14:00:21 -04:00
Kubernetes Prow Robot
595927da21 Merge pull request #125660 from saschagrunert/oci-volumesource-api
[KEP-4639] Add `ImageVolumeSource` API
2024-07-18 10:39:15 -07:00
Kubernetes Prow Robot
601eb7e9cf Merge pull request #122922 from marosset/windows-memory-eviction
Add support for Windows memory-pressure eviction
2024-07-18 10:39:06 -07:00
Sascha Grunert
f7ca3131e0 Add ImageVolumeSource API
Adding the required Kubernetes API so that the kubelet can start using
it. This patch also adds the corresponding alpha feature gate as
outlined in KEP 4639.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2024-07-18 17:25:54 +02:00
Dan Winship
30bc1b59d7 Add unit tests to validate "bad IP/CIDR" handling in kube-proxy
Also, fix the handling of bad EndpointSlice IPs!
2024-07-18 10:55:13 -04:00
Dan Winship
f762e5c8de Remove an unnecessary comment in nftables output
(It's redundant with the chain name.)
2024-07-18 10:54:30 -04:00
Dan Winship
11f55eae96 Reduce some duplication in nftables unit tests 2024-07-18 10:53:36 -04:00
Kubernetes Prow Robot
dda657b598 Merge pull request #126191 from p0lyn0mial/upstream-revert-promote-watch-list-to-beta
Revert "Promote WatchList feature to Beta"
2024-07-18 07:39:28 -07:00
Kubernetes Prow Robot
eb58e5e002 Merge pull request #125976 from vrutkovs/apf-typemeta-print-type
flowcontrol: print object type when bootstrapping flowschemas
2024-07-18 07:39:19 -07:00
Kubernetes Prow Robot
7693a7e71a Merge pull request #126190 from mimowo/job-controller-cleanup
Cleanup Job controller isPodFailed function
2024-07-18 02:44:53 -07:00
Antonio Ojea
bc63c412b9 kubelet request certificates if at least one IP exist
A Kubernetes Node requires to have at minimum one IP address
because those are used on the Pods field HostIPs and in some cases,
when pods uses hostNetwork: true, as PodIPs.
Nodes that use IP addresses as Hostname are interpreted as an IP
address, so it is possible that are nodes that don't hane any DNSname.

The feature gate AllowDNSOnlyNodeCSR will allow user to opt-in for
the old behavior.

Change-Id: I094531d87246f1e7a5ef4fe57bd5d9840cb1375d
2024-07-18 09:44:48 +00:00
Kensei Nakada
9ff3227b15 add: implement event_handling_duration_seconds metric 2024-07-18 18:16:57 +09:00
Kubernetes Prow Robot
24fbb13eaf Merge pull request #126113 from googs1025/enqueueExtensions_refactor
scheduler: Add ctx param and error return to EnqueueExtensions.EventsToRegister()
2024-07-18 00:53:25 -07:00
Kubernetes Prow Robot
9196650533 Merge pull request #123819 from fakecore/fc/master
fix: handle socket file detection on Windows
2024-07-18 00:53:16 -07:00
Lukasz Szaszkiewicz
88f47b4b4d Revert "kube-apiserver: promote WatchList feature to beta"
This reverts commit 0b15903b35.
2024-07-18 09:29:24 +02:00
Patrick Ohly
348f94ab55 DRA: read ResourceClaim in DRA drivers
This is the second and final step towards making kubelet independent of the
resource.k8s.io API versioning because it now doesn't need to copy structs
defined by that API from the driver to the API server.
2024-07-18 09:09:20 +02:00
Patrick Ohly
616a014347 DRA: move ResourceSlice publishing into DRA drivers
This is a first step towards making kubelet independent of the resource.k8s.io
API versioning because it now doesn't need to copy structs defined by that API
from the driver to the API server. The next step is removing the other
direction (reading ResourceClaim status and passing the resource handle to
drivers).

The drivers must get deployed so that they have their own connection to the API
server. Securing at least the writes via a validating admission policy should
be possible.

As before, the kubelet removes all ResourceSlices for its node at startup, then
DRA drivers recreate them if (and only if) they start up again. This ensures
that there are no orphaned ResourceSlices when a driver gets removed while the
kubelet was down.

While at it, logging gets cleaned up and updated to use structured, contextual
logging as much as possible. gRPC requests and streams now use a shared,
per-process request ID and streams also get logged.
2024-07-18 09:09:19 +02:00
Michal Wozniak
1be4df6e02 Cleanup Job controller isPodFailed function 2024-07-18 09:08:23 +02:00
googs1025
a3978e8315 scheduler: Add ctx param and error return to EnqueueExtensions.EventsToRegister() 2024-07-18 12:22:17 +08:00
carlory
dae05f3b88 cleanup after JobPodFailurePolicy is promoted to GA 2024-07-18 10:00:56 +08:00
Mark Rossetti
0411a3d565 Add support for memory pressure evictiong on Windows
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
2024-07-17 15:11:30 -07:00
Kubernetes Prow Robot
5d40866fae Merge pull request #125994 from carlory/fix-job-api
clean up codes after PodDisruptionConditions was promoted to GA
2024-07-17 14:37:09 -07:00
Kubernetes Prow Robot
b23f41e192 Merge pull request #125940 from thockin/master
Clarify errors in ProjectedVolume validation
2024-07-17 13:09:51 -07:00
Kubernetes Prow Robot
d879103c28 Merge pull request #125820 from macsko/add_separate_lock_for_pod_nominator_scheduling_queue
Add a separate lock for pod nominator in scheduling queue
2024-07-17 12:06:10 -07:00
Kubernetes Prow Robot
c3bcd4fff0 Merge pull request #126139 from enj/enj/i/revert_list_cache
Revert "Move ConsistentListFromCache to Beta default"
2024-07-17 09:59:14 -07:00
Kubernetes Prow Robot
a8110d7174 Merge pull request #125259 from sohankunkerkar/bump-proc-mount-beta
[KEP-4265] promote ProcMountType feature gate to beta
2024-07-17 09:59:07 -07:00
Patrick Ohly
6604ff94d8 kubelet: enhance podresources tests
The manual deep comparison code is hard to maintain (would need to be updated
in https://github.com/kubernetes/kubernetes/pull/125488) and error prone.

In fact, one test case failed when doing a full automatic comparison with
cmp.Diff because it wasn't setting allMemory.
2024-07-17 17:50:10 +02:00
Patrick Ohly
b9d00841a6 kubelet: improve checkpoint errors
Recording the expected and actual checksum in the error makes it possible to
provide that information, for example in a failed test like the ones for DRA.
Otherwise developers have to manually step through the test with a debugger to
figure out what the new checksum is.
2024-07-17 16:07:31 +02:00
Maciej Skoczeń
5def93b10a Add a separate lock for pod nominator in scheduling queue 2024-07-17 07:58:59 +00:00
bells17
4c3c4128af volumebinding: scheduler queueing hints - StorageClass 2024-07-17 15:03:17 +09:00
Kubernetes Prow Robot
9247a21be6 Merge pull request #124959 from bells17/qhint-volume-binding-pvc
volumebinding: scheduler queueing hints - PersistentVolumeClaim
2024-07-16 21:43:06 -07:00
Monis Khan
aeb51a16e3 Revert "Move ConsistentListFromCache to Beta default"
This reverts commit 0c0e19b343.

During stress test for SVM controller, the controller is unable to
make a list call due to following error:

resourceversion.go:155: I0716 21:49:26.973127] storage-version-migrator-controller: Error syncing SVM resource, retrying svm="crdsvm" err="error getting latest resourceVersion for stable.example.com/v1, Resource=testcrds: Timeout: Too large resource version: 28976, current: 20349"

With the feature disabled, the stress test passes.

Signed-off-by: Monis Khan <mok@microsoft.com>
2024-07-16 23:12:16 -04:00
Kubernetes Prow Robot
8aff9d3192 Merge pull request #126072 from aroradaman/proxy-config-v1alpah2-windows
kube-proxy: internal config: add Linux and Windows section
2024-07-16 19:37:12 -07:00
Kubernetes Prow Robot
52c0ed4673 Merge pull request #124342 from zhifei92/fix-error-check
fix error checking in kl.killPod within SyncPod
2024-07-16 16:05:07 -07:00
Kubernetes Prow Robot
fc3abdaf2d Merge pull request #125470 from everpeace/kep-3619-SupplementalGroupsPolicy-e2e
KEP-3619: Add NodeStatus.Features.SupplementalGroupsPolicy API and e2e
2024-07-16 13:57:06 -07:00
Hemant Kumar
b59c3c5d3d Preserve conditions in case we are retrying expansion in some cases
When marking pvc expansion for failed condition, we should try and
preserve old resizing conditions with different name.
2024-07-16 15:44:08 -04:00
cici37
6a12b87525 Auto updates 2024-07-16 18:56:49 +00:00
Cici Huang
67a171a142 Remove feature gate CustomResourceValidationExpressions. 2024-07-16 10:39:00 -07:00
Kubernetes Prow Robot
9c763a9c9f Merge pull request #126104 from cji/5321
Add funcs in pkg/filesystem/util to set file permissions on Windows and update container log dir perms
2024-07-16 10:33:05 -07:00
Kubernetes Prow Robot
130414950f Merge pull request #124568 from xyz-li/fix_apiserver_output
api: fix ValidatingAdmissionPolicyList json tag
2024-07-16 09:20:48 -07:00
Kubernetes Prow Robot
64528d865e Merge pull request #124268 from SataQiu/fix-20240411
kubelet: adjust the validation logic to treat [none] as the EnforceNodeAllocatable is disabled
2024-07-16 09:20:39 -07:00
Hemant Kumar
b3db0ba04c Fix error about missing volumeSpec for expansion during mount 2024-07-16 12:07:46 -04:00
Peter Schuurman
585971431b Remove StatefulSetStartOrdinal feature gate to target stable in 1.31 2024-07-16 08:05:09 -07:00
Hemant Kumar
099cb71a53 Ensure that all options are correctly set when calling node-expand-during-mount 2024-07-16 10:04:19 -04:00