kubernetes

Author	SHA1	Message	Date
Shiming Zhang	e6bdd224c1	Add HostIPs for kubelet	2023-07-14 09:35:30 +08:00
Kubernetes Prow Robot	d37c62dcbf	Merge pull request #117800 from cyclinder/loggin_format Add '--logging-format' flag to kube-proxy	2023-07-13 08:40:37 -07:00
cyclinder	c550c17f7f	accept int or string flush frequency	2023-07-13 14:33:33 +08:00
Kubernetes Prow Robot	70370d0210	Merge pull request #117731 from jongwooo/refactor/use-early-return-pattern Use early return pattern to avoid nested conditions	2023-07-12 17:59:41 -07:00
Kubernetes Prow Robot	0086712926	Merge pull request #116922 from sourcelliu/checkpoint Improve the performance of map usage	2023-07-12 17:59:30 -07:00
Kubernetes Prow Robot	047d040ce7	Merge pull request #119012 from pohly/dra-batch-node-prepare kubelet: support batched prepare/unprepare in v1alpha3 DRA plugin API	2023-07-12 10:57:37 -07:00
Kubernetes Prow Robot	ac07b4612e	Merge pull request #117804 from jsafrane/fix-csi-attachable-reconstruction Fix reconstruction of CSI volumes	2023-07-12 10:57:15 -07:00
Kubernetes Prow Robot	be222f38f0	Merge pull request #119058 from TommyStarK/dra-state-checkpoint-unit-test dynamic resource allocation: Improve code coverage of state checkpoint	2023-07-12 07:49:14 -07:00
Patrick Ohly	d743c50bb9	kubelet: support batched prepare/unprepare in v1alpha3 DRA plugin API Combining all prepare/unprepare operations for a pod enables plugins to optimize the execution. Plugins can continue to use the v1alpha2 API for now, but should switch. The new API is designed so that plugins which want to work on each claim one-by-one can do so and then report errors for each claim separately, i.e. partial success is supported.	2023-07-12 14:50:30 +02:00
Francesco Romani	01c3a51a78	node: podresources: getallocatable: move to GA lock the feature gate to GA, and remove the now-redundant code. Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-07-12 14:11:22 +02:00
TommyStarK	f924bf95df	dynamic resource allocation: Improve code coverage of state checkpoint Signed-off-by: TommyStarK <thomasmilox@gmail.com>	2023-07-12 13:27:18 +02:00
Francesco Romani	c635a7e7d8	node: devicemgr: topomgr: add logs One of the factors that makes issues #118559 and #109595 hard to debug and fix is that the devicemanager has very few logs in important flows, so it's unnecessarily hard to reconstruct the state from logs. We add minimal logs to be able to improve troubleshooting. We add minimal logs to be backport-friendly, deferring a more comprehensive review of logging to later PRs. Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-07-12 13:25:36 +02:00
Francesco Romani	3bcf4220ec	kubelet: devices: skip allocation for running pods When kubelet initializes, it runs admission for pods and possibly allocates requested resources. We need to distinguish between node reboot (no containers running) versus kubelet restart (containers potentially running). Running pods should always survive kubelet restart. This means that device allocation on admission should not be attempted, because if a container requires devices and is still running when kubelet is restarting, that container already has devices allocated and working. Thus, we need to properly detect this scenario in the allocation step and handle it explicitly. We need to inform the devicemanager about which pods are already running. Note that if container runtime is down when kubelet restarts, the approach implemented here won't work. In this scenario, on kubelet restart containers will again fail admission, hitting https://github.com/kubernetes/kubernetes/issues/118559 again. This scenario should however be pretty rare. Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-07-12 13:25:36 +02:00
Kubernetes Prow Robot	e0dafe57a3	Merge pull request #117351 from pohly/dra-generated-resource-claim-names DRA: generated resource claim names	2023-07-11 10:33:11 -07:00
PiotrProkop	f855a23b45	topologymanager: promote TopologyManagerPolicyOptions feature to beta * Promote TopologyManagerPolicyOptions feature to beta * Promote PreferClosestNUMANodes TopologyManagerPolicyOption to beta Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-07-11 15:06:57 +02:00
PiotrProkop	23833b9c81	topologymanager: Increase TopologyManager test coverage by adding negative test cases around NUMA topology discovery Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-07-11 15:04:32 +02:00
Patrick Ohly	444d23bd2f	dra: generated name for ResourceClaim from template Generating the name avoids all potential name collisions. It's not clear how much of a problem that was because users can avoid them and the deterministic names for generic ephemeral volumes have not led to reports from users. But using generated names is not too hard either. What makes it relatively easy is that the new pod.status.resourceClaimStatus map stores the generated name for kubelet and node authorizer, i.e. the information in the pod is sufficient to determine the name of the ResourceClaim. The resource claim controller becomes a bit more complex and now needs permission to modify the pod status. The new failure scenario of "ResourceClaim created, updating pod status fails" is handled with the help of a new special "resource.kubernetes.io/pod-claim-name" annotation that together with the owner reference identifies exactly for what a ResourceClaim was generated, so updating the pod status can be retried for existing ResourceClaims. The transition from deterministic names is handled with a special case for that recovery code path: a ResourceClaim with no annotation and a name that follows the Kubernetes <= 1.27 naming pattern is assumed to be generated for that pod claim and gets added to the pod status. There's no immediate need for it, but just in case that it may become relevant, the name of the generated ResourceClaim may also be left unset to record that no claim was needed. Components processing such a pod can skip whatever they normally would do for the claim. To ensure that they do and also cover other cases properly ("no known field is set", "must check ownership"), resourceclaim.Name gets extended.	2023-07-11 14:23:48 +02:00
Kubernetes Prow Robot	bc01306c98	Merge pull request #116738 from AxeZhan/TopologyManagerPolicy When TopologyManagerPolicy is None, skip checks in NewManager.	2023-07-11 04:53:13 -07:00
Evan Lezar	cd14e97ea8	Add a builder for ContainerAllocateResponse objects This change introduces a helper to construct ContainerAllocateResponse instances. Test cases are updated to use a new constructor accepting functional options allowing the response contents to be set based on the test requirements. This can then be extended to also test additional fields in the device plugin API such as annotations which are not currently covered or new fields. Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-07-11 11:48:26 +02:00
Evan Lezar	db2a1edbdd	Generate empty cdi annotations Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-07-11 11:48:24 +02:00
Evan Lezar	f0e3c32fe5	Move CDI annotation code to utils package Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-07-11 11:47:53 +02:00
Jan Safranek	354b6c409f	Rename updateReconstructedFromAPIServer to be in sync with volumesNeedUpdateFromNodeStatus.	2023-07-11 11:25:43 +02:00
Jan Safranek	1903f5aa2a	Rename volumesNeedDevicePath To volumesNeedUpdateFromNodeStatus - because both devicePath and uncertain attach-ability needs to be fixed from node status.	2023-07-11 11:15:24 +02:00
Jan Safranek	7cd60df4aa	Update volumesInUse after attachability is confirmed node.status.volumesInUse should report only attachable volumes, therefore it needs to wait for the reconciler to update uncertain attachability of volumes from the API server.	2023-07-11 10:32:22 +02:00
Jan Safranek	0a2272dc68	Add uncertain state of volume attach-ability During CSI volume reconstruction it's not possible to tell, if the volume is attachable or not - CSIDriver instance may not be available, because kubelet may not have connection to the API server at that time. Adding uncertain state during reconstruction + adding a correct state when the API server is available.	2023-07-11 10:32:22 +02:00
Sascha Grunert	a6554b9d5d	Make kubelet label types public We use the label definitions in CRI-O, so we now make them public to stop vendoring/copying this part of Kubernetes. Signed-off-by: Sascha Grunert <sgrunert@redhat.com>	2023-07-10 10:58:44 +02:00
Kubernetes Prow Robot	1e0b4c84cf	Merge pull request #116879 from lzhecheng/fix-generateAPIPodStatus-dualstack [Dual-stack] Fix generateAPIPodStatus() of kubelet handling Secondary IP	2023-07-07 20:37:04 -07:00
Todd Neal	ea1eb7f8f7	implement sidecar resource calculation	2023-07-08 07:26:13 +09:00
Gunju Kim	b94fa250c2	Sidecar: Implement lifecycle of the restartable init container - Implement `computeInitContainerActions` to set the actions for the init containers, including those with `RestartPolicyAlways`. - Allow StartupProbe on the restartable init containers. - Update PodPhase considering the restartable init containers. - Update PodInitialized status and status manager considering the restartable init containers. Co-authored-by: Matthias Bertschy <matthias.bertschy@gmail.com>	2023-07-08 07:26:12 +09:00
Kubernetes Prow Robot	7581ae8123	Merge pull request #116739 from moshe010/clone-cdi-devices kubelet dra: lock before getting claimInfo CDIDevices and annotations fields	2023-07-07 06:31:04 -07:00
Sascha Grunert	20a25cbfcf	Add user specified image to CRI `ContainerConfig` The container config image references either an image ID or a digest, but not the original image from the container config. We require the image for signature verification to ensure that we actually verify the correct image. Signed-off-by: Sascha Grunert <sgrunert@redhat.com>	2023-07-06 08:40:09 +02:00
Zhecheng Li	985cf718a4	[Dual-stack] Fix generateAPIPodStatus() of kubelet handling Secondary IP hostIPs order may not be consistent. If secondary IP is before primary one, current logic adds primary IP twice into PodIPs, which leads to error: "may specify no more than one IP for each IP family". In this case, the second IP shouldn't be added. Co-authored-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>	2023-07-03 06:47:15 +00:00
Kubernetes Prow Robot	01171e8250	Merge pull request #118976 from ctripcloud/fix-typo fix kubelet podWorkers.SyncKnownPods() typo	2023-07-01 06:06:48 -07:00
Kubernetes Prow Robot	c2b7d25ff8	Merge pull request #118691 from giuseppe/drop-check-for-volumes apis: drop check for volumes with user namespaces	2023-06-29 16:23:56 -07:00
zach593	f7cf9effa3	fix kubelet podWorkers.SyncKnownPods() typo Signed-off-by: zach593 <montenukem@outlook.com>	2023-06-29 22:32:03 +08:00
guoguangwu	0da37d8c54	chore: omit comparison to bool constant	2023-06-29 10:41:50 +08:00
Kubernetes Prow Robot	c3c731890c	Merge pull request #117927 from kaisoz/add-FailedToRetrieveImagePullSecret-event Log a warning if a ImagePullSecrets does not exist	2023-06-28 11:14:31 -07:00
Kubernetes Prow Robot	52457842d1	Merge pull request #117055 from cyclinder/csi_migration remove CSI-migration gate	2023-06-28 04:28:31 -07:00
Kubernetes Prow Robot	b3d94ae74f	Merge pull request #118786 from pohly/dra-test-skip-prepare dra: kubelet must skip NodePrepareResource if not used by any container	2023-06-27 09:58:32 -07:00
Patrick Ohly	bde66bfb55	kubelet dra: restore skipping of unused resource claims `1aeec10efb` removed iterating over containers in favor of iterating over pod claims. This had the unintended consequence that NodePrepareResource gets called unnecessarily when no container needs the claim. The more natural behavior is to skip unused resources. This enables (theoretic, at this time) use cases where some DRA driver relies on the controller part to influence scheduling, but then doesn't use CDI with containers.	2023-06-27 16:02:31 +02:00
Patrick Ohly	874daa8b52	kubelet dra: fix checking of second pod which uses a claim When a second pod wanted to use a claim, the obligatory sanity check whether the pod is really allowed to use the claim ("reserved for") was skipped.	2023-06-27 16:01:11 +02:00
Davanum Srinivas	f7239e4095	Better back off delays and connection timeout to talk to containerd Set up params similar to what we do in cadvisor: `e9068e3273/container/containerd/client.go (L59-L61)` Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2023-06-25 16:25:28 -04:00
Jan Safranek	45aa59946a	Refactor FindAttachablePluginBySpec out of CSI code path reconstructVolume() is called when kubelet may not have connection to the API server yet, therefore it cannot get CSIDriver instances to figure out if a CSI volume is attachable or not. Refactor reconstructVolume(), so it does not need FindAttachablePluginBySpec for CSI volumes, because all of them are deviceMountable (i.e. FindDeviceMountablePluginBySpec always returns the CSI volume plugin).	2023-06-23 12:28:15 +02:00
Giuseppe Scrivano	531d38e323	features: rename UserNamespacesStatelessPodsSupport now it is called UserNamespacesSupport since all kinds of volumes are supported. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2023-06-22 15:19:50 +02:00
Michal Wozniak	17013d3960	Review remarks to improve HandlePodCleanups in kubelet	2023-06-22 10:55:39 +02:00
Michal Wozniak	e3ee9b9adc	Fix the deletion of rejected pods	2023-06-22 09:18:34 +02:00
Davanum Srinivas	c98e72841b	Add a connection backoff to talk to CRI impls We can add backoff for connection like we do in cadvisor: https://github.com/google/cadvisor/blob/master/container/containerd/client.go#L76-L80 for now, don't tune it, just use the default: https://github.com/grpc/grpc/blob/master/doc/connection-backoff.md Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2023-06-20 21:32:50 -04:00
Kubernetes Prow Robot	6a79a8a57c	Merge pull request #115835 from HirazawaUi/fix-terminationGracePeriod-bug fix terminationGracePeriod blocked by preStop	2023-06-14 10:34:18 -07:00
carlory	5e048041e4	remove helper function for unused storage feature in pkg/proxy/util	2023-06-13 09:22:59 +08:00
Kubernetes Prow Robot	86d786090a	Merge pull request #117793 from tzneal/memory-oom-group-support use the cgroup aware OOM killer if available	2023-06-12 14:45:58 -07:00

11283 Commits