If a CRI error occurs during the terminating phase after a pod is
force deleted (API or static), the housekeeping loop will not
deliver updates to the pod worker, which prevents the pod's state
machine from progressing. The pod will remain in the terminating
phase, but no further attempts to terminate or clean up will occur
until the kubelet is restarted.
The pod worker now maintains a store of the state it is attempting
to reconcile for each pod and uses it to resync unknown pods when
SyncKnownPods() is invoked, so that failures in sync methods for
unknown pods no longer hang forever.
The pod worker's store tracks desired updates and the last update
applied on podSyncStatuses. Each goroutine now synchronizes to
acquire the next work item, context, and whether the pod can start.
This synchronization moves the pending update to the stored last
update, which will ensure third parties accessing pod worker state
don't see updates before the pod worker begins synchronizing them.
As a consequence, the update channel becomes a simple notifier
(struct{}) so that SyncKnownPods can coordinate with the pod worker
to create a synthetic pending update for unknown pods (i.e. no one
besides the pod worker has data about those pods). Otherwise the
pending update info would be hidden inside the channel.
In order to properly track pending updates, we have to be very
careful not to mix RunningPods (which are calculated from the
container runtime and are missing all spec info) with config-sourced
pods. Update the pod worker to avoid using ToAPIPod() and instead
use update.Options.Pod or update.Options.RunningPod directly in the
appropriate methods. Add a new SyncTerminatingRuntimePod to prevent
accidental invocations of sync methods with runtime-only pod data.
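A sketch of the dispatch this implies, with hypothetical stand-in types and
method names modeled on the description above rather than the real pod
worker code:
```go
package sketch

import "errors"

// APIPod stands in for a config-sourced pod with a full spec; RuntimePod
// stands in for a pod reconstructed from the container runtime, which has
// no spec information at all.
type APIPod struct{ Name string }
type RuntimePod struct{ ID string }

// UpdateOptions carries exactly one of Pod or RunningPod.
type UpdateOptions struct {
	Pod        *APIPod
	RunningPod *RuntimePod
}

type podSyncer interface {
	SyncTerminatingPod(pod *APIPod) error
	SyncTerminatingRuntimePod(runningPod *RuntimePod) error
}

// terminate routes runtime-only pods to the runtime-only method so that
// spec-based sync methods can never be invoked with partial data.
func terminate(s podSyncer, opts UpdateOptions) error {
	switch {
	case opts.RunningPod != nil:
		return s.SyncTerminatingRuntimePod(opts.RunningPod)
	case opts.Pod != nil:
		return s.SyncTerminatingPod(opts.Pod)
	default:
		return errors.New("update carries neither a config pod nor a runtime pod")
	}
}
```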
Finally, fix SyncKnownPods to replay the last valid update for
undesired pods, which drives the pod state machine towards
termination, and alter HandlePodCleanups to:
- terminate runtime pods that aren't known to the pod worker
- launch admitted pods that aren't known to the pod worker
Any started pods receive a replay until they reach the finished
state, and then are removed from the pod worker. When a desired
pod is detected as not being in the worker, the usual cause is
that the pod was deleted and recreated with the same UID (almost
always a static pod since API UID reuse is statistically
unlikely). This simplifies the previous restartable pod support.
We are careful to filter for active pods (excluding those that are
already terminal or that were previously rejected by admission). We
also force a refresh of the runtime cache to ensure we don't see an
older version of the state.
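For illustration, the active-pod filter could look roughly like this (a
sketch over the public API types; the rejected-by-admission set is a
hypothetical stand-in for the kubelet's internal bookkeeping):
```go
package sketch

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
)

// filterActivePods drops pods that can no longer run: pods already in a
// terminal phase and pods that were previously rejected by admission.
func filterActivePods(pods []*v1.Pod, rejectedByAdmission map[types.UID]bool) []*v1.Pod {
	var active []*v1.Pod
	for _, pod := range pods {
		if pod.Status.Phase == v1.PodSucceeded || pod.Status.Phase == v1.PodFailed {
			continue
		}
		if rejectedByAdmission[pod.UID] {
			continue
		}
		active = append(active, pod)
	}
	return active
}
```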
Future changes will allow other components that need to view the
pod worker's actual state (not the desired state the podManager
represents) to retrieve that info from the pod worker.
Several bugs in pod lifecycle have been undetectable at runtime
because the kubelet does not clearly report the number of pods
in use. To improve reporting, add the following metrics:
- kubelet_desired_pods: Pods the pod manager sees
- kubelet_active_pods: "Admitted" pods that gate new pods
- kubelet_mirror_pods: Mirror pods the kubelet is tracking
- kubelet_working_pods: Breakdown of pods from the last sync in
  each phase, orphaned state, and static or not
- kubelet_restarted_pods_total: A counter for pods that saw a
  CREATE before the previous pod with the same UID was finished
- kubelet_orphaned_runtime_pods_total: A counter for pods detected
  at runtime that were not known to the kubelet. Will be populated
  at Kubelet startup and should never be incremented after.
Add a metric check to our e2e tests that verifies the values are
captured correctly during a serial test, and then verify them in
detail in unit tests.
Adds 23 series to the kubelet /metrics endpoint.
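For a sense of what these series look like, here is a sketch of two of them
using plain client_golang; the kubelet's real definitions use its own
metrics framework, and the exact label set shown here (a "static" label) is
an assumption:
```go
package sketch

import "github.com/prometheus/client_golang/prometheus"

var (
	// desiredPods mirrors the intent of kubelet_desired_pods: the pods the
	// pod manager currently sees, labeled by whether they are static.
	desiredPods = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Namespace: "kubelet",
			Name:      "desired_pods",
			Help:      "The number of pods the kubelet is being instructed to run.",
		},
		[]string{"static"},
	)

	// restartedPodsTotal mirrors kubelet_restarted_pods_total: pods that saw
	// a CREATE before the previous pod with the same UID finished.
	restartedPodsTotal = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Namespace: "kubelet",
			Name:      "restarted_pods_total",
			Help:      "Pods created again before the previous pod with the same UID finished.",
		},
		[]string{"static"},
	)
)

func init() {
	prometheus.MustRegister(desiredPods, restartedPodsTotal)
}
```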
1. Scheduler bug-fix + scheduler-focused E2E tests
2. Add cgroup v2 support for in-place pod resize
3. Enable full E2E pod resize test for containerd>=1.6.9 and EventedPLEG related changes.
Co-Authored-By: Vinay Kulkarni <vskibum@gmail.com>
1. Core Kubelet changes to implement In-place Pod Vertical Scaling.
2. E2E tests for In-place Pod Vertical Scaling.
3. Refactor kubelet code and add missing tests (Derek's kubelet review)
4. Add a new hash over container fields without Resources field to allow feature gate toggling without restarting containers not using the feature.
5. Fix corner-case where resize A->B->A gets ignored
6. Add cgroup v2 support to pod resize E2E test.
KEP: /enhancements/keps/sig-node/1287-in-place-update-pod-resources
Co-authored-by: Chen Wang <Chen.Wang1@ibm.com>
Align the behavior of HTTP-based lifecycle handlers and HTTP-based
probers, converging on the probers' implementation. This fixes
multiple deficiencies in the current implementation of lifecycle
handlers around which functionality is available.
The functionality is gated by the features.ConsistentHTTPGetHandlers feature gate.
There is a corner case when blocking Pod termination via a lifecycle
preStop hook, for example by using this StatefulSet:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: ubi
  serviceName: "ubi"
  replicas: 1
  template:
    metadata:
      labels:
        app: ubi
    spec:
      terminationGracePeriodSeconds: 1000
      containers:
      - name: ubi
        image: ubuntu:22.04
        command: ['sh', '-c', 'echo The app is running! && sleep 360000']
        ports:
        - containerPort: 80
          name: web
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - 'echo aaa; trap : TERM INT; sleep infinity & wait'
```
After creation, downscaling, forced deletion and upscaling of the
replica like this:
```
> kubectl apply -f sts.yml
> kubectl scale sts web --replicas=0
> kubectl delete pod web-0 --grace-period=0 --force
> kubectl scale sts web --replicas=1
```
We will end up with two pods running in the container runtime, while
the API only reports one:
```
> kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          92s
```
```
> sudo crictl pods
POD ID          CREATED          STATE   NAME    NAMESPACE   ATTEMPT   RUNTIME
e05bb7dbb7e44   12 minutes ago   Ready   web-0   default     0         (default)
d90088614c73b   12 minutes ago   Ready   web-0   default     0         (default)
```
When running `kubectl exec -it web-0 -- ps -ef` now, there is a random chance that we hit the wrong
container, which reports the lifecycle command `/bin/sh -c echo aaa; trap : TERM INT; sleep infinity & wait`.
This is caused by the container lookup via its name (and no podUID) at:
02109414e8/pkg/kubelet/kubelet_pods.go (L1905-L1914)
And more specifically by the conversion of the pod result map to a slice in `GetPods`:
02109414e8/pkg/kubelet/kuberuntime/kuberuntime_manager.go (L407-L411)
We now solve that unexpected behavior by tracking the creation time
of the pod and sorting the result based on it, so that we always
match the most recently created pod.
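A minimal sketch of the sorting idea (hypothetical Pod type carrying only
the fields needed here; the real change sorts the runtime-provided pods by
their creation timestamps):
```go
package sketch

import (
	"sort"
	"time"
)

// Pod is a hypothetical stand-in carrying only the fields needed here.
type Pod struct {
	Name      string
	CreatedAt time.Time
}

// newestFirst sorts pods so that the most recently created pod with a
// given name is found first during lookups by name.
func newestFirst(pods []*Pod) {
	sort.SliceStable(pods, func(i, j int) bool {
		return pods[i].CreatedAt.After(pods[j].CreatedAt)
	})
}
```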
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
cpu.cfs_period_us defaults to 100μs, even though the kernel
documentation unfortunately lists the default with an "ms" unit.
Documentation:
https://www.kernel.org/doc/html/latest/scheduler/sched-bwc.html#management
The desired effect of this change is to align the k8s default
`CPUCFSQuotaPeriod` value (100ms before this change) with the value
used in k8s when the `CustomCPUCFSQuotaPeriod` flag is not enabled
and with Linux CFS (100us, 1000x smaller than 100ms).
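Why the period matters: the CFS quota written for a container is derived
from its CPU limit and the period, so a smaller period means shorter
throttling windows. A sketch of the arithmetic (the helper name is ours,
mirroring the usual quota = cores × period relationship):
```go
package sketch

// milliCPUToQuota converts a CPU limit in millicores into a CFS quota for
// the given period, both in microseconds:
//
//	quota_us = (milliCPU / 1000) * period_us
//
// For example, a 500m limit yields a 50ms quota with a 100ms period and a
// 50us quota with a 100us period.
func milliCPUToQuota(milliCPU, periodUS int64) int64 {
	if milliCPU == 0 {
		return 0
	}
	return milliCPU * periodUS / 1000
}
```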
The remote runtime implementation now supports the `verbose` fields,
which are required for consumers like cri-tools to enable multi CRI
version support.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
This patch makes the CRI `v1` API the new project-wide default version.
To allow backwards compatibility, a fallback to `v1alpha2` has been added
as well. This fallback is determined automatically by the kubelet.
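A rough sketch of the fallback idea (the probe functions are hypothetical
stand-ins for issuing a Version call against each CRI service on an
established gRPC connection; the real logic lives in the kubelet's remote
runtime client): prefer v1 and fall back to v1alpha2 only when the runtime
reports the v1 service as unimplemented.
```go
package sketch

import (
	"context"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// probe is a hypothetical stand-in for checking one CRI API version.
type probe func(ctx context.Context) error

// chooseCRIVersion prefers the v1 API and falls back to v1alpha2 only when
// the runtime reports the v1 service as unimplemented.
func chooseCRIVersion(ctx context.Context, probeV1, probeV1alpha2 probe) (string, error) {
	err := probeV1(ctx)
	if err == nil {
		return "v1", nil
	}
	if status.Code(err) != codes.Unimplemented {
		return "", err
	}
	if err := probeV1alpha2(ctx); err != nil {
		return "", err
	}
	return "v1alpha2", nil
}
```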
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
The configuration is deprecated and targeted for removal in v1.23.
Test cases have been changed as well.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Prevent Kubelet from incorrectly interpreting "not yet started" pods as "ready to terminate" pods by unifying responsibility for pod lifecycle into the pod worker
A number of race conditions exist when pods are terminated early in
their lifecycle because components in the kubelet need to know "no
running containers" or "containers can't be started from now on" but
were relying on outdated state.
Only the pod worker knows whether containers are being started for
a given pod, which is required to know when a pod is "terminated"
(no running containers, none coming). Move that responsibility and
podKiller function into the pod workers, and have everything that
was killing the pod go into the UpdatePod loop. Split syncPod into
three phases - setup, terminate containers, and cleanup pod - and
have transitions between those methods be visible to other
components. After this change, to kill a pod you tell the pod worker
to UpdatePod({UpdateType: SyncPodKill, Pod: pod}).
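In caller terms, killing a pod reduces to the single UpdatePod entry point.
A hedged sketch with stand-in types (the real UpdatePodOptions, sync phases,
and update-type constants live in the kubelet's pod worker and types
packages):
```go
package sketch

// SyncPodType and the constants below are illustrative stand-ins for the
// kubelet's update types; SyncPodKill is the one used to stop a pod.
type SyncPodType int

const (
	SyncPodCreate SyncPodType = iota
	SyncPodUpdate
	SyncPodSync
	SyncPodKill
)

type Pod struct{ Name string }

type UpdatePodOptions struct {
	UpdateType SyncPodType
	Pod        *Pod
}

// PodWorkers is the single entry point: creation, updates, and kills all
// flow through UpdatePod.
type PodWorkers interface {
	UpdatePod(options UpdatePodOptions)
}

// killPod no longer stops containers itself; it asks the pod worker to
// drive the pod through its terminating and terminated phases.
func killPod(workers PodWorkers, pod *Pod) {
	workers.UpdatePod(UpdatePodOptions{UpdateType: SyncPodKill, Pod: pod})
}
```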
Several places in the kubelet were incorrect about whether they
were handling terminating (should stop running, might have
containers) or terminated (no running containers) pods. The pod worker
exposes methods that allow other loops to know when to set up or tear
down resources based on the state of the pod - these methods remove
the possibility of race conditions by ensuring a single component is
responsible for knowing each pod's allowed state and other components
simply delegate to checking whether they are in the window by UID.
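For example, a component that previously tracked pod liveness itself would
instead delegate to the pod worker by UID; the method names below
illustrate the kind of queries exposed and are written as assumptions, not
an exact list:
```go
package sketch

type UID string

// PodStateProvider is a sketch of the read-only view other kubelet loops
// use to stay inside the window in which their resources may exist.
type PodStateProvider interface {
	// CouldHaveRunningContainers is true until the pod worker has seen the
	// pod reach "no running containers, none coming".
	CouldHaveRunningContainers(uid UID) bool
	// ShouldPodContentBeRemoved is true once it is safe to delete the pod's
	// on-disk content (volumes, cgroups, directories).
	ShouldPodContentBeRemoved(uid UID) bool
}

// unmountVolumesIfAllowed shows the delegation pattern: the caller never
// decides pod liveness itself, it only asks the single owner of that state.
func unmountVolumesIfAllowed(p PodStateProvider, uid UID, unmount func() error) error {
	if p.CouldHaveRunningContainers(uid) {
		return nil // too early: containers may still be running or starting
	}
	return unmount()
}
```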
Removing containers no longer blocks final pod deletion in the
API server and is handled as background cleanup. Node shutdown
no longer marks pods as failed, as they can be restarted in the
next step.
See https://docs.google.com/document/d/1Pic5TPntdJnYfIpBeZndDelM-AbS4FN9H2GTLFhoJ04/edit# for details
This adds the gate `SeccompDefault` as a new alpha feature. Seccomp path
and field fallbacks are now passed to the helper functions, and unit
tests covering those code paths have been added as well.
Besides enabling the feature gate, the feature has to be enabled by the
`SeccompDefault` kubelet configuration option or its corresponding
`--seccomp-default` CLI flag.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Apply suggestions from code review
Co-authored-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>