kubernetes

Author	SHA1	Message	Date
Giuseppe Scrivano	024146f705	KEP-127: the kubelet stores runtime helpers as they are received from the ResponseStatus request to the runtime. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2024-02-27 11:07:35 +01:00
Kubernetes Prow Robot	ac6f707155	Merge pull request #120620 from tzneal/sidecar-termination-ordering sidecars: terminate sidecars after main containers	2023-10-31 19:16:11 +01:00
Kubernetes Prow Robot	441d4b54ae	Merge pull request #120397 from ty-dc/StaticCheck cleanup: omit comparison with bool constants	2023-10-24 05:25:52 +02:00
Todd Neal	7bcc98c46b	sidecars: terminate sidecars after main containers Sidecars should terminate: - after all main containers have exited - serialized and in reverse order	2023-10-17 19:07:21 -05:00
tao.yang	b35357b6c0	cleanup: omit comparison with bool constants Signed-off-by: tao.yang <tao.yang@daocloud.io>	2023-09-05 10:24:38 +08:00
Gunju Kim	696f84aeb0	Feature-gate SidecarContainers code in pkg/kubelet/kuberuntime	2023-09-01 00:13:47 +09:00
Antonio Ojea	f355b22f5f	implement Stringer for podActions klog prints an internal error when trying to log the podActions struct. > I0505 14:12:12.827065 190662 kuberuntime_manager.go:1014] "computePodActions got for pod" podActions="<internal error: json: unsupported type: map[container.ContainerID]kuberuntime.containerToKillInfo>" pod="kube-system/coredns-8f5847b64-mzw46" Implement the stringer interface on the struct to avoid the json error. Change-Id: I22444524a78a0ecec9490b9240def371a4129434	2023-08-07 22:48:28 +00:00
Ed Bartosh	229eb93a83	DRA: report NodePrepareResource errors Log an error and submit an event when NodePrepareResource fails.	2023-07-17 12:56:28 +03:00
Gunju Kim	b94fa250c2	Sidecar: Implement lifecycle of the restartable init container - Implement `computeInitContainerActions` to set the actions for the init containers, including those with `RestartPolicyAlways`. - Allow StartupProbe on the restartable init containers. - Update PodPhase considering the restartable init containers. - Update PodInitialized status and status manager considering the restartable init containers. Co-authored-by: Matthias Bertschy <matthias.bertschy@gmail.com>	2023-07-08 07:26:12 +09:00
Tim Hockin	dd7af241c1	Replace diff.ObjectDiff with cmp.Equal More obvious and cheaper, and ObjectDiff is already written in terms of cmp.	2023-04-12 08:45:32 -07:00
Kubernetes Prow Robot	9ddf1a02bd	Merge pull request #116504 from vinaykul/restart-free-pod-vertical-scaling-kubeletonly-fix Fix null pointer access in doPodResizeAction for kubeletonly mode	2023-03-14 19:26:59 -07:00
Kubernetes Prow Robot	9053b5dc2c	Merge pull request #116119 from vinaykul/restart-free-pod-vertical-scaling-fixes Restructure resize policy naming and set default resize policy values	2023-03-14 19:26:42 -07:00
vinay kulkarni	86efc8bd79	Add isInPlacePodVerticalScalingAllowed for restart check block	2023-03-14 20:30:02 +00:00
vinay kulkarni	5b2682ac04	Make in-place resize exclusion conditions (such as static pods) very obvious	2023-03-14 19:37:35 +00:00
Kubernetes Prow Robot	c8f001d798	Merge pull request #114504 from vrutkovs/tracing-kubelet-toplevel kubelet: create top-level traces for pod sync and GC	2023-03-14 03:12:16 -07:00
vinay kulkarni	8b23497ae7	Restructure naming of resource resize restart policy	2023-03-12 23:11:32 +00:00
vinay kulkarni	1c7850c355	Fix null pointer access in doPodResizeAction for kubeletonly mode	2023-03-12 05:59:14 +00:00
Vadim Rutkovsky	556d774945	kubelet: create top-level traces for pod sync and GC This starts new top level OpenTelemetry spans every time syncPod or image / container GC is invoked	2023-03-11 10:42:14 +01:00
vinay kulkarni	01b96e7704	Rename ContainerStatus.ResourcesAllocated to ContainerStatus.AllocatedResources	2023-03-10 14:49:26 +00:00
Clayton Coleman	6b9a381185	kubelet: Force deleted pods can fail to move out of terminating If a CRI error occurs during the terminating phase after a pod is force deleted (API or static) then the housekeeping loop will not deliver updates to the pod worker which prevents the pod's state machine from progressing. The pod will remain in the terminating phase but no further attempts to terminate or clean up will occur until the kubelet is restarted. The pod worker now maintains a store of the pod state that it is attempting to reconcile and uses that to resync unknown pods when SyncKnownPods() is invoked, so that failures in sync methods for unknown pods no longer hang forever. The pod worker's store tracks desired updates and the last update applied on podSyncStatuses. Each goroutine now synchronizes to acquire the next work item, context, and whether the pod can start. This synchronization moves the pending update to the stored last update, which will ensure third parties accessing pod worker state don't see updates before the pod worker begins synchronizing them. As a consequence, the update channel becomes a simple notifier (struct{}) so that SyncKnownPods can coordinate with the pod worker to create a synthetic pending update for unknown pods (i.e. no one besides the pod worker has data about those pods). Otherwise the pending update info would be hidden inside the channel. In order to properly track pending updates, we have to be very careful not to mix RunningPods (which are calculated from the container runtime and are missing all spec info) and config-sourced pods. Update the pod worker to avoid using ToAPIPod() and instead require the pod worker to directly use update.Options.Pod or update.Options.RunningPod for the correct methods. Add a new SyncTerminatingRuntimePod to prevent accidental invocations of runtime only pod data. 
Finally, fix SyncKnownPods to replay the last valid update for undesired pods which drives the pod state machine towards termination, and alter HandlePodCleanups to: - terminate runtime pods that aren't known to the pod worker - launch admitted pods that aren't known to the pod worker Any started pods receive a replay until they reach the finished state, and then are removed from the pod worker. When a desired pod is detected as not being in the worker, the usual cause is that the pod was deleted and recreated with the same UID (almost always a static pod since API UID reuse is statistically unlikely). This simplifies the previous restartable pod support. We are careful to filter for active pods (those not already terminal or those which have been previously rejected by admission). We also force a refresh of the runtime cache to ensure we don't see an older version of the state. Future changes will allow other components that need to view the pod worker's actual state (not the desired state the podManager represents) to retrieve that info from the pod worker. Several bugs in pod lifecycle have been undetectable at runtime because the kubelet does not clearly describe the number of pods in use. To better report, add the following metrics: kubelet_desired_pods: Pods the pod manager sees kubelet_active_pods: "Admitted" pods that gate new pods kubelet_mirror_pods: Mirror pods the kubelet is tracking kubelet_working_pods: Breakdown of pods from the last sync in each phase, orphaned state, and static or not kubelet_restarted_pods_total: A counter for pods that saw a CREATE before the previous pod with the same UID was finished kubelet_orphaned_runtime_pods_total: A counter for pods detected at runtime that were not known to the kubelet. Will be populated at Kubelet startup and should never be incremented after. Add a metric check to our e2e tests that verifies the values are captured correctly during a serial test, and then verify them in detail in unit tests. 
Adds 23 series to the kubelet /metrics endpoint.	2023-03-08 22:03:51 -06:00
ruiwen-zhao	572e6e0ffb	Add MaxParallelImagePulls support Signed-off-by: ruiwen-zhao <ruiwen@google.com>	2023-03-02 03:57:59 +00:00
Chen Wang	7db339dba2	This commit contains the following: 1. Scheduler bug-fix + scheduler-focussed E2E tests 2. Add cgroup v2 support for in-place pod resize 3. Enable full E2E pod resize test for containerd>=1.6.9 and EventedPLEG related changes. Co-Authored-By: Vinay Kulkarni <vskibum@gmail.com>	2023-02-24 18:21:21 +00:00
Vinay Kulkarni	f2bd94a0de	In-place Pod Vertical Scaling - core implementation 1. Core Kubelet changes to implement In-place Pod Vertical Scaling. 2. E2E tests for In-place Pod Vertical Scaling. 3. Refactor kubelet code and add missing tests (Derek's kubelet review) 4. Add a new hash over container fields without Resources field to allow feature gate toggling without restarting containers not using the feature. 5. Fix corner-case where resize A->B->A gets ignored 6. Add cgroup v2 support to pod resize E2E test. KEP: /enhancements/keps/sig-node/1287-in-place-update-pod-resources Co-authored-by: Chen Wang <Chen.Wang1@ibm.com>	2023-02-24 18:21:21 +00:00
Ed Bartosh	4f88332ab4	kubelet: prepare DRA resources before CNI setup	2023-02-06 20:40:11 +02:00
Peter Hunt	6298ce68e2	kubelet: wire ListPodSandboxMetrics Signed-off-by: Peter Hunt <pehunt@redhat.com>	2022-11-08 14:47:08 -05:00
Harshal Patil	86284d42f8	Add support for Evented PLEG Signed-off-by: Harshal Patil <harpatil@redhat.com> Co-authored-by: Swarup Ghosh <swghosh@redhat.com>	2022-11-08 20:06:16 +05:30
David Ashpole	64af1adace	Second attempt: Plumb context to Kubelet CRI calls (#113591) * plumb context from CRI calls through kubelet * clean up extra timeouts * try fixing incorrectly cancelled context	2022-11-05 06:02:13 -07:00
Kubernetes Prow Robot	1bf4af4584	Merge pull request #111930 from azylinski/new-histogram-pod_start_sli_duration_seconds New histogram: Pod start SLI duration	2022-11-04 07:28:14 -07:00
astraw99	244598af80	Add back-off restarting failed container name	2022-11-02 20:46:32 +08:00
Artur Żyliński	b0fac15cd6	Make the interface local to each package	2022-10-26 11:28:18 +02:00
Artur Żyliński	9f31669a53	New histogram: Pod start SLI duration	2022-10-26 11:28:17 +02:00
Jordan Liggitt	122b43037e	Record event for lifecycle fallback to http	2022-10-19 14:11:36 -04:00
Jason Simmons	5a6acf85fa	Align lifecycle handlers and probes Align the behavior of HTTP-based lifecycle handlers and HTTP-based probers, converging on the probers implementation. This fixes multiple deficiencies in the current implementation of lifecycle handlers surrounding what functionality is available. The functionality is gated by the features.ConsistentHTTPGetHandlers feature gate.	2022-10-19 09:51:52 -07:00
Kubernetes Prow Robot	843ad71cac	Merge pull request #113041 from saschagrunert/kubelet-pods-creation-time Sort kubelet pods by their creation time	2022-10-18 09:17:19 -07:00
Sascha Grunert	b296f82c69	Sort kubelet pods by their creation time There is a corner case when blocking Pod termination via a lifecycle preStop hook, for example by using this StatefulSet: ```yaml apiVersion: apps/v1 kind: StatefulSet metadata: name: web spec: selector: matchLabels: app: ubi serviceName: "ubi" replicas: 1 template: metadata: labels: app: ubi spec: terminationGracePeriodSeconds: 1000 containers: - name: ubi image: ubuntu:22.04 command: ['sh', '-c', 'echo The app is running! && sleep 360000'] ports: - containerPort: 80 name: web lifecycle: preStop: exec: command: - /bin/sh - -c - 'echo aaa; trap : TERM INT; sleep infinity & wait' ``` After creation, downscaling, forced deletion and upscaling of the replica like this: ``` > kubectl apply -f sts.yml > kubectl scale sts web --replicas=0 > kubectl delete pod web-0 --grace-period=0 --force > kubectl scale sts web --replicas=1 ``` We will end up having two pods run by the container runtime, while the API only reports one: ``` > kubectl get pods NAME READY STATUS RESTARTS AGE web-0 1/1 Running 0 92s ``` ``` > sudo crictl pods POD ID CREATED STATE NAME NAMESPACE ATTEMPT RUNTIME e05bb7dbb7e44 12 minutes ago Ready web-0 default 0 (default) d90088614c73b 12 minutes ago Ready web-0 default 0 (default) ``` When now running `kubectl exec -it web-0 -- ps -ef`, there is a random chance that we hit the wrong container reporting the lifecycle command `/bin/sh -c echo aaa; trap : TERM INT; sleep infinity & wait`. This is caused by the container lookup via its name (and no podUID) at: `02109414e8/pkg/kubelet/kubelet_pods.go (L1905-L1914)` And more specifically by the conversion of the pod result map to a slice in `GetPods`: `02109414e8/pkg/kubelet/kuberuntime/kuberuntime_manager.go (L407-L411)` We now solve that unexpected behavior by tracking the creation time of the pod and sorting the result based on that. This causes the lookup to always match the most recently created pod. 
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>	2022-10-13 16:32:44 +02:00
Dixita Narang	ff1f525511	Setting LockToDefault as true for KubeletCredentialProviders feature, and removing conditions that check if the feature is enabled since now the feature is enabled by default	2022-09-29 16:42:48 +00:00
Antonio Ojea	d434c588d7	Revert "change CPUCFSQuotaPeriod default value to 100us to match Linux default" This reverts commit `f2d591fae6`.	2022-08-26 23:51:04 +02:00
Dmitry Verkhoturov	f2d591fae6	change CPUCFSQuotaPeriod default value to 100us to match Linux default cpu.cfs_period_us is 100μs by default despite having an "ms" unit for some unfortunate reason. Documentation: https://www.kernel.org/doc/html/latest/scheduler/sched-bwc.html#management The desired effect of that change is to match k8s default `CPUCFSQuotaPeriod` value (100ms before that change) with one used in k8s without the `CustomCPUCFSQuotaPeriod` flag enabled and Linux CFS (100us, 1000x smaller than 100ms).	2022-08-10 03:25:05 +02:00
Kubernetes Prow Robot	2e1a4da8df	Merge pull request #111358 from ddebroy/hasnet1 Introduce PodHasNetwork condition for pods	2022-08-01 15:04:52 -07:00
Deep Debroy	dfdf8245bb	Introduce PodHasNetwork condition for pods Signed-off-by: Deep Debroy <ddebroy@gmail.com>	2022-08-01 09:51:43 -07:00
Lee Verberne	d238e67ba6	Remove EphemeralContainers feature-gate checks	2022-07-26 02:55:30 +02:00
Adrian Reber	8c24857ba3	kubelet: add CheckpointContainer() to the runtime Signed-off-by: Adrian Reber <areber@redhat.com>	2022-07-14 10:27:41 +00:00
Kubernetes Prow Robot	1b2de5cf01	Merge pull request #109042 from bjorand/network_panic_kubelet kubelet: fix panic triggered when playing with a wip CRI	2022-05-03 18:24:20 -07:00
Benjamin Jorand	3c65728ede	kubelet: fix panic triggered when playing with a wip CRI	2022-03-26 00:23:35 +01:00
Deep Debroy	023d6fb8f4	Pass instrumented runtime service to containergc Signed-off-by: Deep Debroy <ddebroy@gmail.com>	2022-03-08 14:33:37 +00:00
KeZhang	3946d99904	Ignore container notfound error while getPodstatuses	2022-02-16 08:55:19 +08:00
Kubernetes Prow Robot	64e83a7e43	Merge pull request #107945 from saschagrunert/cri-verbose Add support for CRI `verbose` fields	2022-02-14 17:58:12 -08:00
Sascha Grunert	effbcd3a0a	Add support for CRI `verbose` fields The remote runtime implementation now supports the `verbose` fields, which are required for consumers like cri-tools to enable multi CRI version support. Signed-off-by: Sascha Grunert <sgrunert@redhat.com>	2022-02-10 17:12:26 +01:00
Ciprian Hacman	0819451ea6	Clean up logic for deprecated flag --container-runtime in kubelet Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>	2022-02-10 13:26:59 +02:00
cyclinder	07999dac70	Clean up dockershim flags in the kubelet Signed-off-by: cyclinder <qifeng.guo@daocloud.io> Co-authored-by: Ciprian Hacman <ciprian@hakman.dev> Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>	2022-01-14 16:02:50 +02:00


235 Commits