kubernetes

Author	SHA1	Message	Date
dprotaso	610509fedd	Update standard app protocols Add websocket support - see https://github.com/kubernetes/enhancements/pull/3996	2023-07-12 08:28:50 -04:00
Dr. Stefan Schimanski	f1f2fa9da8	kube-apiserver/corerest: split apart generic code	2023-07-12 14:13:10 +02:00
Francesco Romani	01c3a51a78	node: podresources: getallocatable: move to GA lock the feature gate to GA, and remove the now-redundant code. Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-07-12 14:11:22 +02:00
Patrick Ohly	1b8ddf6b79	podgc controller: convert to contextual logging	2023-07-12 13:45:10 +02:00
TommyStarK	f924bf95df	dynamic resource allocation: Improve code coverage of state checkpoint Signed-off-by: TommyStarK <thomasmilox@gmail.com>	2023-07-12 13:27:18 +02:00
Francesco Romani	c635a7e7d8	node: devicemgr: topomgr: add logs One of the factors that make issues #118559 and #109595 hard to debug and fix is that the devicemanager has very few logs in important flows, so it's unnecessarily hard to reconstruct the state from logs. We add minimal logs to improve troubleshooting. We keep them minimal to be backport-friendly, deferring a more comprehensive review of logging to later PRs. Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-07-12 13:25:36 +02:00
Francesco Romani	3bcf4220ec	kubelet: devices: skip allocation for running pods When kubelet initializes, it runs admission for pods and possibly allocates requested resources. We need to distinguish between node reboot (no containers running) versus kubelet restart (containers potentially running). Running pods should always survive kubelet restart. This means that device allocation on admission should not be attempted, because if a container requires devices and is still running when kubelet is restarting, that container already has devices allocated and working. Thus, we need to properly detect this scenario in the allocation step and handle it explicitly. We need to inform the devicemanager about which pods are already running. Note that if the container runtime is down when kubelet restarts, the approach implemented here won't work. In this scenario, on kubelet restart containers will again fail admission, hitting https://github.com/kubernetes/kubernetes/issues/118559 again. This scenario should however be pretty rare. Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-07-12 13:25:36 +02:00
Kubernetes Prow Robot	745cfa35bd	Merge pull request #119147 from mengjiao-liu/contextual-logging-controller-disruption Migrate /pkg/controller/disruption to structured and contextual logging	2023-07-12 03:35:25 -07:00
Kubernetes Prow Robot	a8093823c3	Merge pull request #119042 from sttts/sttts-restcore-split cmd/kube-apiserver: turn core (legacy) rest storage into standard RESTStorageProvider	2023-07-12 03:35:17 -07:00
Patrick Ohly	6f1a29520f	scheduler/dra: reduce pod scheduling latency This is a combination of two related enhancements: - By implementing a PreEnqueue check, the initial pod scheduling attempt for a pod with a claim template gets avoided when the claim does not exist yet. - By implementing cluster event checks, only those pods get scheduled for which something changed, and they get scheduled immediately without delay.	2023-07-12 11:17:04 +02:00
Patrick Ohly	e01db32573	scheduler util: handle cache.DeletedFinalStateUnknown in As Informer callbacks must be prepared to get cache.DeletedFinalStateUnknown as the deleted object. They can use that as hint that some information may have been missed, but typically they just retrieve the stored object inside it.	2023-07-12 11:07:59 +02:00
Patrick Ohly	ef48efc736	scheduler dynamicresources: minor logging improvements This makes some complex values a bit more readable.	2023-07-12 11:07:59 +02:00
Kubernetes Prow Robot	5130dad2cf	Merge pull request #118408 from danwinship/local-detector kube-proxy local traffic detector single-vs-dual-stack cleanup	2023-07-11 21:19:11 -07:00
Mengjiao Liu	19869478c1	Migrate /pkg/controller/disruption to structured and contextual logging	2023-07-12 11:30:45 +08:00
Maciej Skrocki	7c873327b6	Convert controller name to reconciler variable.	2023-07-11 18:08:25 +00:00
Maciej Skrocki	29fad383da	move endpointslice reconciler to staging endpointslice repo	2023-07-11 18:08:12 +00:00
Kubernetes Prow Robot	a6890b361d	Merge pull request #119193 from mimowo/sync-job-context Introduce syncJobContext to limit the number of function parameters	2023-07-11 10:33:30 -07:00
Kubernetes Prow Robot	e0dafe57a3	Merge pull request #117351 from pohly/dra-generated-resource-claim-names DRA: generated resource claim names	2023-07-11 10:33:11 -07:00
Dr. Stefan Schimanski	a34e06e74c	kube-apiserver/corerest: structure Config	2023-07-11 17:27:20 +02:00
Dr. Stefan Schimanski	75e3576523	kube-apiserver: rewire service controllers: kubernetesservice + IP repair	2023-07-11 17:27:20 +02:00
PiotrProkop	f855a23b45	topologymanager: promote TopologyManagerPolicyOptions feature to beta * Promote TopologyManagerPolicyOptions feature to beta * Promote PreferClosestNUMANodes TopologyManagerPolicyOption to beta Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-07-11 15:06:57 +02:00
PiotrProkop	23833b9c81	topologymanager: Increase TopologyManager test coverage by adding negative test cases around NUMA topology discovery Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-07-11 15:04:32 +02:00
PiotrProkop	998654e044	topologymanager: fix TopologyManagerPolicyBetaOptions not being enabled by default Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-07-11 15:04:32 +02:00
Patrick Ohly	fec25785ee	dra: store generated ResourceClaims in cache This addresses the following bad sequence of events: - controller creates ResourceClaim - updating pod status fails - pod gets retried before the informer receives the created ResourceClaim - another ResourceClaim gets created Storing the generated ResourceClaim in a MutationCache ensures that the controller knows about it during the retry. A positive side effect is that ResourceClaims now get indexed by pod owner and thus iterating over existing ones becomes a bit more efficient.	2023-07-11 14:23:49 +02:00
Patrick Ohly	0fc62d5ded	dra: generated files	2023-07-11 14:23:48 +02:00
Patrick Ohly	444d23bd2f	dra: generated name for ResourceClaim from template Generating the name avoids all potential name collisions. It's not clear how much of a problem that was because users can avoid them and the deterministic names for generic ephemeral volumes have not led to reports from users. But using generated names is not too hard either. What makes it relatively easy is that the new pod.status.resourceClaimStatus map stores the generated name for kubelet and node authorizer, i.e. the information in the pod is sufficient to determine the name of the ResourceClaim. The resource claim controller becomes a bit more complex and now needs permission to modify the pod status. The new failure scenario of "ResourceClaim created, updating pod status fails" is handled with the help of a new special "resource.kubernetes.io/pod-claim-name" annotation that together with the owner reference identifies exactly for what a ResourceClaim was generated, so updating the pod status can be retried for existing ResourceClaims. The transition from deterministic names is handled with a special case for that recovery code path: a ResourceClaim with no annotation and a name that follows the Kubernetes <= 1.27 naming pattern is assumed to be generated for that pod claim and gets added to the pod status. There's no immediate need for it, but just in case that it may become relevant, the name of the generated ResourceClaim may also be left unset to record that no claim was needed. Components processing such a pod can skip whatever they normally would do for the claim. To ensure that they do and also cover other cases properly ("no known field is set", "must check ownership"), resourceclaim.Name gets extended.	2023-07-11 14:23:48 +02:00
Kubernetes Prow Robot	bc01306c98	Merge pull request #116738 from AxeZhan/TopologyManagerPolicy When TopologyManagerPolicy is None, skip checks in NewManager.	2023-07-11 04:53:13 -07:00
Kubernetes Prow Robot	8f1852bb44	Merge pull request #115295 from Namanl2001/pkg/controller/endpointslice Migrated `pkg/controller/endpointslice` and `pkg/controller/endpointslicemirroring` to contextual logging	2023-07-11 03:19:12 -07:00
Evan Lezar	cd14e97ea8	Add a builder for ContainerAllocateResponse objects This change introduces a helper to construct ContainerAllocateResponse instances. Test cases are updated to use a new constructor accepting functional options allowing the response contents to be set based on the test requirements. This can then be extended to also test additional fields in the device plugin API such as annotations which are not currently covered or new fields. Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-07-11 11:48:26 +02:00
Evan Lezar	db2a1edbdd	Generate empty cdi annotations Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-07-11 11:48:24 +02:00
Evan Lezar	f0e3c32fe5	Move CDI annotation code to utils package Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-07-11 11:47:53 +02:00
Jan Safranek	354b6c409f	Rename updateReconstructedFromAPIServer to be in sync with volumesNeedUpdateFromNodeStatus.	2023-07-11 11:25:43 +02:00
Jan Safranek	1903f5aa2a	Rename volumesNeedDevicePath To volumesNeedUpdateFromNodeStatus - because both devicePath and uncertain attach-ability need to be fixed from node status.	2023-07-11 11:15:24 +02:00
Jan Safranek	7cd60df4aa	Update volumesInUse after attachability is confirmed node.status.volumesInUse should report only attachable volumes, therefore it needs to wait for the reconciler to update uncertain attachability of volumes from the API server.	2023-07-11 10:32:22 +02:00
Jan Safranek	0a2272dc68	Add uncertain state of volume attach-ability During CSI volume reconstruction it's not possible to tell, if the volume is attachable or not - CSIDriver instance may not be available, because kubelet may not have connection to the API server at that time. Adding uncertain state during reconstruction + adding a correct state when the API server is available.	2023-07-11 10:32:22 +02:00
Michal Wozniak	bf48165232	Remarks to syncJobCtx	2023-07-11 09:44:08 +02:00
Michal Wozniak	990339d4c3	Introduce syncJobContext to limit the number of function parameters	2023-07-11 09:27:21 +02:00
carlory	f443c458af	move non-graceful node shutdown to GA	2023-07-11 13:51:51 +08:00
Kubernetes Prow Robot	986171d388	Merge pull request #119185 from xing-yang/metrics_attach Add reason to force detach metric	2023-07-10 14:03:18 -07:00
Naman	645cb90732	migrated pkg/controller/endpointslicemirroring to contextual logging Signed-off-by: Naman <namanlakhwani@gmail.com>	2023-07-11 01:43:30 +05:30
Daniel Vega-Myhre	98c6e25c37	update name of pod index label	2023-07-10 20:11:52 +00:00
Naman	09849b09cf	migrated pkg/controller/endpointslice to contextual logging Signed-off-by: Naman <namanlakhwani@gmail.com>	2023-07-11 01:28:22 +05:30
Kubernetes Prow Robot	c95b16b280	Merge pull request #118608 from utam0k/podtopologyspread-prescore-skip Return Skip in PodTopologySpread#PreScore under specific conditions	2023-07-10 09:27:07 -07:00
Kubernetes Prow Robot	10a12165de	Merge pull request #116755 from my-git9/feat/endpoint/logging Migrated `pkg/controller/endpoint` to contextual logging	2023-07-10 05:37:05 -07:00
twelcon	70f979c8da	Alert message improved according to standards Signed-off-by: twelcon <mastermind12210@gmail.com>	2023-07-10 17:13:35 +05:30
Kubernetes Prow Robot	64939b66c6	Merge pull request #119146 from xuexu6666/xuexu6666/ControllerUtilUseCmpDiff Use cmp diff in controller_util_test.go	2023-07-10 02:41:18 -07:00
Sascha Grunert	a6554b9d5d	Make kubelet label types public We use the label definitions in CRI-O, so we now make them public to stop vendoring/copying this part of Kubernetes. Signed-off-by: Sascha Grunert <sgrunert@redhat.com>	2023-07-10 10:58:44 +02:00
Alexander Constantinescu	08dd657a71	Implement metrics agreed on the KEP	2023-07-10 10:32:02 +02:00
Alexander Constantinescu	9b1c4c7b57	Implement KEP-3836 TL;DR: we want to start failing the LB HC if a node is tainted with ToBeDeletedByClusterAutoscaler. This field might need refinement, but currently is deemed our best way of understanding if a node is about to get deleted. We want to do this only for eTP:Cluster services. The goal is to enable connection draining for terminating nodes	2023-07-10 10:30:54 +02:00
xing-yang	cca6601106	Add reason to force detach metric	2023-07-10 06:30:05 +00:00

49698 Commits