kubernetes

Author	SHA1	Message	Date
Kubernetes Prow Robot	3219564cf3	Merge pull request #116296 from SataQiu/clean-kubelet-20230306 Remove unused resize.go from pkg/kubelet/container	2023-03-09 22:43:48 -08:00
Kubernetes Prow Robot	45b96eae98	Merge pull request #113145 from smarterclayton/zombie_terminating_pods kubelet: Force deleted pods can fail to move out of terminating	2023-03-09 15:32:30 -08:00
Clayton Coleman	6b9a381185	kubelet: Force deleted pods can fail to move out of terminating
If a CRI error occurs during the terminating phase after a pod is force deleted (API or static), then the housekeeping loop will not deliver updates to the pod worker, which prevents the pod's state machine from progressing. The pod will remain in the terminating phase, but no further attempts to terminate or clean up will occur until the kubelet is restarted.
The pod worker now maintains a store of the pod states that it is attempting to reconcile and uses that store to resync unknown pods when SyncKnownPods() is invoked, so that failures in sync methods for unknown pods no longer hang forever.
The pod worker's store tracks desired updates and the last update applied on podSyncStatuses. Each goroutine now synchronizes to acquire the next work item, context, and whether the pod can start. This synchronization moves the pending update to the stored last update, which ensures third parties accessing pod worker state don't see updates before the pod worker begins synchronizing them. As a consequence, the update channel becomes a simple notifier (struct{}) so that SyncKnownPods can coordinate with the pod worker to create a synthetic pending update for unknown pods (i.e. no one besides the pod worker has data about those pods). Otherwise the pending update info would be hidden inside the channel.
In order to properly track pending updates, we have to be very careful not to mix RunningPods (which are calculated from the container runtime and are missing all spec info) and config-sourced pods. Update the pod worker to avoid using ToAPIPod() and instead require the pod worker to directly use update.Options.Pod or update.Options.RunningPod for the correct methods. Add a new SyncTerminatingRuntimePod to prevent accidental invocations of runtime-only pod data.
Finally, fix SyncKnownPods to replay the last valid update for undesired pods, which drives the pod state machine towards termination, and alter HandlePodCleanups to:
- terminate runtime pods that aren't known to the pod worker
- launch admitted pods that aren't known to the pod worker
Any started pods receive a replay until they reach the finished state, and then are removed from the pod worker. When a desired pod is detected as not being in the worker, the usual cause is that the pod was deleted and recreated with the same UID (almost always a static pod, since API UID reuse is statistically unlikely). This simplifies the previous restartable pod support. We are careful to filter for active pods (excluding those already terminal or previously rejected by admission). We also force a refresh of the runtime cache to ensure we don't see an older version of the state.
Future changes will allow other components that need to view the pod worker's actual state (not the desired state the podManager represents) to retrieve that info from the pod worker.
Several bugs in pod lifecycle have been undetectable at runtime because the kubelet does not clearly describe the number of pods in use. To improve reporting, add the following metrics:
- kubelet_desired_pods: Pods the pod manager sees
- kubelet_active_pods: "Admitted" pods that gate new pods
- kubelet_mirror_pods: Mirror pods the kubelet is tracking
- kubelet_working_pods: Breakdown of pods from the last sync in each phase, orphaned state, and static or not
- kubelet_restarted_pods_total: A counter for pods that saw a CREATE before the previous pod with the same UID was finished
- kubelet_orphaned_runtime_pods_total: A counter for pods detected at runtime that were not known to the kubelet. Will be populated at kubelet startup and should never be incremented after.
Add a metric check to our e2e tests that verifies the values are captured correctly during a serial test, and then verify them in detail in unit tests.
Adds 23 series to the kubelet /metrics endpoint.	2023-03-08 22:03:51 -06:00
SataQiu	528a471302	remove unused resize.go from pkg/kubelet/container	2023-03-06 18:33:13 +08:00
Kubernetes Prow Robot	b8aaaf380a	Merge pull request #116083 from SataQiu/clean-20230227 kubelet: remove unused DockerID type	2023-03-06 02:22:58 -08:00
ruiwen-zhao	572e6e0ffb	Add MaxParallelImagePulls support Signed-off-by: ruiwen-zhao <ruiwen@google.com>	2023-03-02 03:57:59 +00:00
Ed Bartosh	5a86895070	DRA: pass CDI devices through CRI CDIDevice field	2023-02-28 19:21:20 +02:00
SataQiu	ed2caf17e0	kubelet: remove unused DockerID type	2023-02-27 16:02:59 +08:00
Chen Wang	7db339dba2	This commit contains the following: 1. Scheduler bug-fix + scheduler-focussed E2E tests 2. Add cgroup v2 support for in-place pod resize 3. Enable full E2E pod resize test for containerd>=1.6.9 and EventedPLEG related changes. Co-Authored-By: Vinay Kulkarni <vskibum@gmail.com>	2023-02-24 18:21:21 +00:00
Vinay Kulkarni	f2bd94a0de	In-place Pod Vertical Scaling - core implementation 1. Core Kubelet changes to implement In-place Pod Vertical Scaling. 2. E2E tests for In-place Pod Vertical Scaling. 3. Refactor kubelet code and add missing tests (Derek's kubelet review) 4. Add a new hash over container fields without Resources field to allow feature gate toggling without restarting containers not using the feature. 5. Fix corner-case where resize A->B->A gets ignored 6. Add cgroup v2 support to pod resize E2E test. KEP: /enhancements/keps/sig-node/1287-in-place-update-pod-resources Co-authored-by: Chen Wang <Chen.Wang1@ibm.com>	2023-02-24 18:21:21 +00:00
Ed Bartosh	4f88332ab4	kubelet: prepare DRA resources before CNI setup	2023-02-06 20:40:11 +02:00
HirazawaUi	a8173eded3	delete unused functions in pkg/kubelet directory	2023-01-16 20:00:49 +08:00
Peter Hunt	6298ce68e2	kubelet: wire ListPodSandboxMetrics Signed-off-by: Peter Hunt <pehunt@redhat.com>	2022-11-08 14:47:08 -05:00
Harshal Patil	86284d42f8	Add support for Evented PLEG Signed-off-by: Harshal Patil <harpatil@redhat.com> Co-authored-by: Swarup Ghosh <swghosh@redhat.com>	2022-11-08 20:06:16 +05:30
David Ashpole	64af1adace	Second attempt: Plumb context to Kubelet CRI calls (#113591) * plumb context from CRI calls through kubelet * clean up extra timeouts * try fixing incorrectly cancelled context	2022-11-05 06:02:13 -07:00
Antonio Ojea	9c2b333925	Revert "plumb context from CRI calls through kubelet" This reverts commit `f43b4f1b95`.	2022-11-02 13:37:23 +00:00
David Ashpole	f43b4f1b95	plumb context from CRI calls through kubelet	2022-10-28 02:55:28 +00:00
Sascha Grunert	b296f82c69	Sort kubelet pods by their creation time
There is a corner case when blocking Pod termination via a lifecycle preStop hook, for example by using this StatefulSet:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: ubi
  serviceName: "ubi"
  replicas: 1
  template:
    metadata:
      labels:
        app: ubi
    spec:
      terminationGracePeriodSeconds: 1000
      containers:
      - name: ubi
        image: ubuntu:22.04
        command: ['sh', '-c', 'echo The app is running! && sleep 360000']
        ports:
        - containerPort: 80
          name: web
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - 'echo aaa; trap : TERM INT; sleep infinity & wait'
```
After creating the StatefulSet, scaling it down, force-deleting the pod and scaling it back up like this:
```
> kubectl apply -f sts.yml
> kubectl scale sts web --replicas=0
> kubectl delete pod web-0 --grace-period=0 --force
> kubectl scale sts web --replicas=1
```
We end up with two pods running in the container runtime, while the API only reports one:
```
> kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          92s
```
```
> sudo crictl pods
POD ID          CREATED          STATE   NAME    NAMESPACE   ATTEMPT   RUNTIME
e05bb7dbb7e44   12 minutes ago   Ready   web-0   default     0         (default)
d90088614c73b   12 minutes ago   Ready   web-0   default     0         (default)
```
When now running `kubectl exec -it web-0 -- ps -ef`, there is a random chance that we hit the wrong container reporting the lifecycle command `/bin/sh -c echo aaa; trap : TERM INT; sleep infinity & wait`.
This is caused by the container lookup via its name (and no podUID) at:
`02109414e8/pkg/kubelet/kubelet_pods.go (L1905-L1914)`
More specifically, it is caused by the conversion of the pod result map to a slice in `GetPods`:
`02109414e8/pkg/kubelet/kuberuntime/kuberuntime_manager.go (L407-L411)`
We now solve that unexpected behavior by tracking the creation time of the pod and sorting the result based on that. This ensures that we always match the most recently created pod.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>	2022-10-13 16:32:44 +02:00
Kubernetes Prow Robot	127f33f63d	Merge pull request #111221 from inosato/remove-ioutil-from-kubelet Remove ioutil in kubelet/kubeadm and its tests	2022-09-17 21:56:28 -07:00
Giuseppe Scrivano	9b2fc639a0	kubelet: add GetUserNamespaceMappings to RuntimeHelper Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2022-08-03 19:53:22 +02:00
inosato	3b95d3b076	Remove ioutil in kubelet and its tests Signed-off-by: inosato <si17_21@yahoo.co.jp>	2022-07-30 12:35:26 +09:00
Kubernetes Prow Robot	cf2800b812	Merge pull request #111402 from verb/111030-ec-ga Promote EphemeralContainers feature to GA	2022-07-29 19:29:20 -07:00
Davanum Srinivas	a9593d634c	Generate and format files - Run hack/update-codegen.sh - Run hack/update-generated-device-plugin.sh - Run hack/update-generated-protobuf.sh - Run hack/update-generated-runtime.sh - Run hack/update-generated-swagger-docs.sh - Run hack/update-openapi-spec.sh - Run hack/update-gofmt.sh Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2022-07-26 13:14:05 -04:00
Lee Verberne	d238e67ba6	Remove EphemeralContainers feature-gate checks	2022-07-26 02:55:30 +02:00
Adrian Reber	8c24857ba3	kubelet: add CheckpointContainer() to the runtime Signed-off-by: Adrian Reber <areber@redhat.com>	2022-07-14 10:27:41 +00:00
Mark Rossetti	0c6088861b	Fixing issue in generatePodSandboxWindowsConfig for hostProcess containers where the pod sandbox won't have the HostProcess bit set if the pod does not have a security context but its containers specify HostProcess. Signed-off-by: Mark Rossetti <marosset@microsoft.com>	2022-06-02 12:10:10 -07:00
Kir Kolyshkin	4513de06a8	Regen mocks using go 1.18 Generated by ./hack/update-mocks.sh using go 1.18 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2022-03-23 10:19:38 -07:00
Ciprian Hacman	0819451ea6	Clean up logic for deprecated flag --container-runtime in kubelet Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>	2022-02-10 13:26:59 +02:00
Wojciech Tyczyński	6088fe4221	Remove no-longer used selflink code from kubelet	2022-01-14 10:38:23 +01:00
xuweiwei	21238c2593	code cleanup for container/helpers.go	2021-12-01 11:17:33 +08:00
Sascha Grunert	de37b9d293	Make CRI `v1` the default and allow a fallback to `v1alpha2` This patch makes the CRI `v1` API the new project-wide default version. To allow backwards compatibility, a fallback to `v1alpha2` has been added as well. The kubelet automatically determines whether to use this fallback. Signed-off-by: Sascha Grunert <sgrunert@redhat.com>	2021-11-17 11:05:05 -08:00
Skyler Clark	d3ae0a381a	prevents garbage collection from removing pinned images	2021-11-02 14:43:02 -04:00
Tim Hockin	11a25bfeb6	De-share the Handler struct in core API (#105979) * De-share the Handler struct in core API An upcoming PR adds a handler that only applies on one of these paths. Having fields that don't work seems bad. This never should have been shared. Lifecycle hooks are like a "write" while probes are more like a "read". HTTPGet and TCPSocket don't really make sense as lifecycle hooks (but I can't take that back). When we add gRPC, it is EXPLICITLY a health check (defined by gRPC) not an arbitrary RPC - so a probe makes sense but a hook does not. In the future I can also see adding lifecycle hooks that don't make sense as probes. E.g. 'sleep' is a common lifecycle request. The only option is `exec`, which requires having a sleep binary in your image. * Run update scripts	2021-10-29 13:15:11 -07:00
vikram Jadhav	0de4397490	mockery to mockgen conversion	2021-09-25 16:15:08 +00:00
Clayton Coleman	3eadd1a9ea	Keep pod worker running until pod is truly complete A number of race conditions exist when pods are terminated early in their lifecycle because components in the kubelet need to know "no running containers" or "containers can't be started from now on" but were relying on outdated state. Only the pod worker knows whether containers are being started for a given pod, which is required to know when a pod is "terminated" (no running containers, none coming). Move that responsibility and podKiller function into the pod workers, and have everything that was killing the pod go into the UpdatePod loop. Split syncPod into three phases - setup, terminate containers, and cleanup pod - and have transitions between those methods be visible to other components. After this change, to kill a pod you tell the pod worker to UpdatePod({UpdateType: SyncPodKill, Pod: pod}). Several places in the kubelet were incorrect about whether they were handling terminating (should stop running, might have containers) or terminated (no running containers) pods. The pod worker exposes methods that allow other loops to know when to set up or tear down resources based on the state of the pod - these methods remove the possibility of race conditions by ensuring a single component is responsible for knowing each pod's allowed state and other components simply delegate to checking whether they are in the window by UID. Removing containers no longer blocks final pod deletion in the API server; container removal is handled as background cleanup. Node shutdown no longer marks pods as failed, as they can be restarted in the next step. See https://docs.google.com/document/d/1Pic5TPntdJnYfIpBeZndDelM-AbS4FN9H2GTLFhoJ04/edit# for details	2021-07-06 15:55:22 -04:00
Kubernetes Prow Robot	13cafd5cb0	Merge pull request #101480 from yuzhiquan/little-nit-for-kubelet Fix some nit for kubelet	2021-05-24 21:49:05 -07:00
marosset	fd94032b21	Kubelet updates for Windows HostProcess Containers	2021-05-19 16:24:14 -07:00
yuzhiquan	bebca30309	comment should have function name as prefix	2021-04-28 15:26:46 +08:00
JunYang	01a4e4face	Structured Logging migration: modify volume and container part logs of kubelet. Signed-off-by: JunYang <yang.jun22@zte.com.cn>	2021-03-17 08:59:03 +08:00
Kubernetes Prow Robot	a4025a8462	Merge pull request #98986 from gjkim42/fix-runtime-assert kubelet: Make the test fail if (*FakeRuntime).Assert fails	2021-03-04 18:34:33 -08:00
Benjamin Elder	56e092e382	hack/update-bazel.sh	2021-02-28 15:17:29 -08:00
Geonju Kim	fc4a29da2c	kubelet: Make the test fail if (*FakeRuntime).Assert fails	2021-02-26 06:31:54 +09:00
Sergey Kanzhelev	4c9e96c238	Revert "Merge pull request #92817 from kmala/kubelet" This reverts commit `88512be213`, reversing changes made to `c3b888f647`.	2021-01-12 22:27:22 +00:00
Sergey Kanzhelev	6c2556c5c4	The function shouldRecordEvent will panic when the value of input object is nil	2020-10-16 21:13:49 +00:00
Kubernetes Prow Robot	e6444e01ba	Merge pull request #94494 from SergeyKanzhelev/hostportConflicts Allow to map the same container port to different host ports	2020-09-22 12:23:40 -07:00
Kubernetes Prow Robot	6ac2930ef0	Merge pull request #94574 from auxten/pkg-kubelet-staticchecks Fix pkg/kubelet static checks	2020-09-21 21:22:47 -07:00
Kubernetes Prow Robot	88512be213	Merge pull request #92817 from kmala/kubelet Check for sandboxes before deleting the pod from apiserver	2020-09-10 07:27:45 -07:00
auxten	a9c1acc044	Fix staticchecks ST1005,S1002,S1008,S1039 in pkg/kubelet	2020-09-07 10:53:43 +08:00
Sergey Kanzhelev	1c379b1281	allow to map the same container port to different host ports	2020-09-03 22:21:18 +00:00
Kubernetes Prow Robot	274e33b691	Merge pull request #93581 from SergeyKanzhelev/nameOfPortMappingIsNotNeeded Clean up in port mapping functionality	2020-08-27 16:06:11 -07:00
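Commit 6b9a381185 describes turning the pod worker's update channel into a bare struct{} notifier, while desired (pending) and last-applied updates live in a lock-protected store that the worker moves between when it picks up work. The Go sketch below illustrates that general pattern only. The types update, podStatus and podWorkers and the methods updatePod, startWork and replayLastUpdate are hypothetical simplifications, not the kubelet's actual pod worker API.

```go
package main

import "sync"

// update is a hypothetical work item for a pod worker.
type update struct {
	UID  string
	Kind string // e.g. "sync" or "terminate"
}

// podStatus holds the desired (pending) and last started update for one pod.
// Field names are illustrative, not the kubelet's podSyncStatus fields.
type podStatus struct {
	pending *update
	last    *update
}

// podWorkers is a hypothetical, heavily simplified pod worker store.
type podWorkers struct {
	mu       sync.Mutex
	statuses map[string]*podStatus
	notify   map[string]chan struct{} // per-pod wakeup signal; carries no data
}

func newPodWorkers() *podWorkers {
	return &podWorkers{
		statuses: make(map[string]*podStatus),
		notify:   make(map[string]chan struct{}),
	}
}

// signal does a non-blocking send on a buffer-of-one channel, so repeated
// wakeups coalesce into one. Callers must not hold w.mu.
func (w *podWorkers) signal(uid string) {
	w.mu.Lock()
	ch, ok := w.notify[uid]
	if !ok {
		ch = make(chan struct{}, 1)
		w.notify[uid] = ch
	}
	w.mu.Unlock()
	select {
	case ch <- struct{}{}:
	default: // a wakeup is already queued
	}
}

// updatePod records the desired update in the store, replacing any older
// pending update, and then wakes the worker.
func (w *podWorkers) updatePod(u update) {
	w.mu.Lock()
	s, ok := w.statuses[u.UID]
	if !ok {
		s = &podStatus{}
		w.statuses[u.UID] = s
	}
	s.pending = &u
	w.mu.Unlock()
	w.signal(u.UID)
}

// startWork is called by the worker goroutine after a wakeup. It moves the
// pending update to last under the lock, so other readers never observe an
// update as started before the worker actually picks it up.
func (w *podWorkers) startWork(uid string) (update, bool) {
	w.mu.Lock()
	defer w.mu.Unlock()
	s, ok := w.statuses[uid]
	if !ok || s.pending == nil {
		return update{}, false
	}
	s.last, s.pending = s.pending, nil
	return *s.last, true
}

// replayLastUpdate creates a synthetic pending update from the last started
// one, for pods only the worker knows about. It returns false if there is
// nothing to replay or an update is already pending.
func (w *podWorkers) replayLastUpdate(uid string) bool {
	w.mu.Lock()
	s, ok := w.statuses[uid]
	if !ok || s.last == nil || s.pending != nil {
		w.mu.Unlock()
		return false
	}
	replay := *s.last
	s.pending = &replay
	w.mu.Unlock()
	w.signal(uid)
	return true
}
```

Because the channel carries no payload, the newest pending update is always read from the store rather than from a channel buffer, which is what lets a resync inspect and synthesize pending work for a pod.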


494 Commits
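Commit b296f82c69 attributes the wrong-container lookup to converting a map of pods into a slice, whose order follows Go's unspecified map iteration order, and fixes it by sorting on creation time so the newest pod matches first. A minimal Go sketch of that idea follows, assuming a hypothetical Pod type with an integer CreatedAt field and hypothetical helpers podsFromMap and findPodByName; none of these are the kubelet's real types or functions.

```go
package main

import "sort"

// Pod is a hypothetical, trimmed-down stand-in for a runtime pod; it is not
// the kubelet's real type. CreatedAt is assumed to be Unix nanoseconds.
type Pod struct {
	UID       string
	Name      string
	Namespace string
	CreatedAt int64
}

// podsFromMap flattens a UID-keyed map into a slice. Map iteration order in
// Go is unspecified, so the slice is sorted newest-first before returning.
func podsFromMap(m map[string]*Pod) []*Pod {
	pods := make([]*Pod, 0, len(m))
	for _, p := range m {
		pods = append(pods, p)
	}
	sort.Slice(pods, func(i, j int) bool {
		return pods[i].CreatedAt > pods[j].CreatedAt
	})
	return pods
}

// findPodByName returns the first pod with the given namespace and name, or
// nil. Because the input is sorted newest-first, a name shared by an old and
// a new pod resolves to the newer one.
func findPodByName(pods []*Pod, namespace, name string) *Pod {
	for _, p := range pods {
		if p.Namespace == namespace && p.Name == name {
			return p
		}
	}
	return nil
}
```

In the StatefulSet scenario from that commit, both sandboxes are named web-0, so a name-only lookup over an unsorted slice could return either one; sorting makes the result deterministic.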