kubernetes

Author	SHA1	Message	Date
Clayton Coleman	1d518adb76	kubelet: Pod probes should be handled by pod worker The pod worker is the owner of when a container is running or not, and the start and stop of the probes for a given pod should be handled during the pod sync loop. This ensures that probes do not continue running even after eviction. Because the pod semantics allow lifecycle probes to shorten grace period, the probe is removed after the containers in a pod are terminated successfully. As an optimization, if the pod will have a very short grace period (0 or 1 seconds) we stop the probes immediately to reduce resource usage during eviction slightly. After this change, the probe manager is only called by the pod worker or by the reconcile loop.	2022-06-06 17:00:54 -05:00
Kubernetes Prow Robot	d796dd7d0f	Merge pull request #108193 from utkarsh348/myfeature Fixed race condition in test manager shutdown	2022-03-27 05:55:21 -07:00
Hemant Kumar	13b34d9c77	Use tempdir for shutdown tests	2022-03-24 11:58:49 -04:00
Shiming Zhang	ced991cb00	Emit Metrics in the shutdown process	2022-03-16 10:14:55 +08:00
Shiming Zhang	a1fadab4b0	Atomic write status file	2022-03-11 17:50:33 +08:00
Shiming Zhang	4aed18935e	Add test for storage	2022-03-11 17:31:10 +08:00
Shiming Zhang	5eb3e88f6b	Support metrics for node shutdown	2022-03-11 17:31:10 +08:00
utkarsh348	eaee96efd3	Fixed race condition test manager shutdown	2022-02-18 11:20:02 +05:30
calvin	d9ab5e18d3	fix: data race when hijack klog Signed-off-by: calvin <wen.chen@daocloud.io>	2022-01-24 15:01:49 +08:00
Kubernetes Prow Robot	09fccc3533	Merge pull request #106796 from jonyhy96/fix-timer kubelet: use newtimer instead in nodeshutdown manager	2022-01-06 11:47:12 -08:00
haoyun	92fa957dd1	feat: use clock instead Signed-off-by: haoyun <yun.hao@daocloud.io>	2021-12-10 13:59:12 +08:00
David Porter	95264a418d	kubelet: set failed phase during graceful shutdown Revert to previous behavior in 1.21/1.20 of setting pod phase to failed during graceful node shutdown. Setting pods to failed phase will ensure that external controllers that manage pods like deployments will create new pods to replace those that are shutdown. Many customers have taken a dependency on this behavior and it was breaking change in 1.22, so this change reverts back to the previous behavior. Signed-off-by: David Porter <david@porter.me>	2021-12-09 13:17:40 -08:00
Shiming Zhang	545313bdc7	Implement graceful shutdown based on Pod priority	2021-11-17 11:47:12 +08:00
Shiming Zhang	e47c78a354	Add log for creating node shutdown manager	2021-10-15 11:16:21 +08:00
Shiming Zhang	b468c24e85	Refactor to use structure to pass parameters	2021-10-15 11:16:21 +08:00
Ryan Phillips	e2e938066d	kubelet: add probe termination to graceful shutdowns	2021-09-22 14:13:25 -05:00
Kubernetes Prow Robot	7c71e06cd1	Merge pull request #104959 from calvin0327/issue-test-dataRace fix the test issue of node shutdown manager	2021-09-21 11:56:30 -07:00
calvin0327	db82e282fc	fix the test issue of data race to node shutdown manager	2021-09-13 18:12:19 +08:00
wojtekt	53ce79a18a	Migrate to k8s.io/utils/clock in pkg/kubelet	2021-09-10 12:20:09 +02:00
Stephen Augustus	481cf6fbe7	generated: Run hack/update-gofmt.sh Signed-off-by: Stephen Augustus <foo@auggie.dev>	2021-08-24 15:47:49 -04:00
Kubernetes Prow Robot	8dbc33d649	Merge pull request #101081 from rphillips/add_graceful_shutdown_event kubelet: add graceful shutdown events	2021-08-17 22:08:08 -07:00
Kubernetes Prow Robot	d7c1663556	Merge pull request #103137 from wzshiming/fix/expected_inhibit_delay Allow the actual inhibit delay to be greater than the expected inhibit delay	2021-08-17 11:41:49 -07:00
Kubernetes Prow Robot	a6c2cd7d18	Merge pull request #103291 from wzshiming/fix/nodeshutdown-restart Fix Data Race in nodeshutdown restart	2021-07-09 08:43:14 -07:00
Clayton Coleman	3eadd1a9ea	Keep pod worker running until pod is truly complete A number of race conditions exist when pods are terminated early in their lifecycle because components in the kubelet need to know "no running containers" or "containers can't be started from now on" but were relying on outdated state. Only the pod worker knows whether containers are being started for a given pod, which is required to know when a pod is "terminated" (no running containers, none coming). Move that responsibility and podKiller function into the pod workers, and have everything that was killing the pod go into the UpdatePod loop. Split syncPod into three phases - setup, terminate containers, and cleanup pod - and have transitions between those methods be visible to other components. After this change, to kill a pod you tell the pod worker to UpdatePod({UpdateType: SyncPodKill, Pod: pod}). Several places in the kubelet were incorrect about whether they were handling terminating (should stop running, might have containers) or terminated (no running containers) pods. The pod worker exposes methods that allow other loops to know when to set up or tear down resources based on the state of the pod - these methods remove the possibility of race conditions by ensuring a single component is responsible for knowing each pod's allowed state and other components simply delegate to checking whether they are in the window by UID. Removing containers now no longer blocks final pod deletion in the API server and are handled as background cleanup. Node shutdown no longer marks pods as failed as they can be restarted in the next step. See https://docs.google.com/document/d/1Pic5TPntdJnYfIpBeZndDelM-AbS4FN9H2GTLFhoJ04/edit# for details	2021-07-06 15:55:22 -04:00
Shiming Zhang	212ce7c287	Shorten test time	2021-06-30 09:48:26 +08:00
Shiming Zhang	a42c066af7	Fix Data Race in nodeshutdown restart	2021-06-29 16:23:45 +08:00
Shiming Zhang	97bcfbd674	Allow the actual inhibit delay to be greater than the expected inhibit delay	2021-06-24 14:11:58 +08:00
Ryan Phillips	d9be5abc37	kubelet: add shutdown events	2021-06-23 16:44:19 -05:00
Kubernetes Prow Robot	62fdaabe82	Merge pull request #102635 from charlesxsh/fix-linux-test fix a potential deadlock in graceful node shutdown unit tests	2021-06-21 16:27:45 -07:00
Guillaume Le Biller	f1de598233	Improve terminated pod message when node is shutting down Signed-off-by: Guillaume Le Biller <glebiller@Traveldoo.com>	2021-06-15 18:29:54 +02:00
Kubernetes Prow Robot	4e7fc6df63	Merge pull request #100369 from wzshiming/fix/restart-dbus-for-graceful-node-shutdown After DBus restarts, make GracefulNodeShutdown work again	2021-06-14 20:50:00 -07:00
Shihao Xia	a2a4b50bc1	fixed deadlock	2021-06-03 18:03:17 -04:00
Shiming Zhang	202a012093	Add restart unit test	2021-05-23 00:47:36 +08:00
Kir Kolyshkin	029e6b6e3a	pkg/kubelet/nodeshutdown/systemd: fix for dbus 5.0.4 dbus 5.0.4 adds StoreProperty method which needs to be implemented for the mock. Fixes the errors like > pkg/kubelet/nodeshutdown/systemd/inhibit_linux_test.go:88:9: cannot use f.fakeDBusObject (variable of type *fakeDBusObject) as dbus.BusObject value in return statement: missing method StoreProperty Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-05-19 23:51:57 -07:00
Shiming Zhang	9c59e6c85f	After dbus restarts, make GracefulNodeShutdown work again	2021-05-19 10:05:38 +08:00
Jordan Liggitt	4b45d0d921	Revert "Merge pull request 101888 from kolyshkin/update-runc-rc94" This reverts commit `b1b06fe0a4`, reversing changes made to `382a33986b`.	2021-05-18 09:13:47 -04:00
Kir Kolyshkin	8167f83437	pkg/kubelet/nodeshutdown/systemd: fix for dbus 5.0.4 dbus 5.0.4 adds StoreProperty method which needs to be implemented for the mock. Fixes the errors like > pkg/kubelet/nodeshutdown/systemd/inhibit_linux_test.go:88:9: cannot use f.fakeDBusObject (variable of type *fakeDBusObject) as dbus.BusObject value in return statement: missing method StoreProperty Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-05-11 11:11:02 -07:00
wangyx1992	31d449bf57	cleanup: use plain channel send or receive instead of single-case select Signed-off-by: wangyx1992 <wang.yixiang@zte.com.cn>	2021-04-23 11:17:12 +08:00
Kubernetes Prow Robot	3c20c5aa2f	Merge pull request #100177 from wangyx1992/wrapped-error fix errors in wrapped format	2021-04-13 23:24:42 -07:00
wangyx1992	34c2b2360b	fix errors in wrapped format Signed-off-by: wangyx1992 <wang.yixiang@zte.com.cn>	2021-03-26 14:57:55 +08:00
JUN YANG	90bfd38b83	Structured Logging migration: modify node and pod part logs of kubelet. Signed-off-by: JunYang <yang.jun22@zte.com.cn>	2021-03-13 12:31:09 +08:00
David Porter	893f5fd4f0	Promote kubelet graceful node shutdown to beta - Change the feature gate from alpha to beta and enable it by default - Update a few of the unit tests due to feature gate being enabled by default - Small refactor in `nodeshutdown_manager` which adds `featureEnabled` function (which checks that feature gate and that `kubeletConfig.ShutdownGracePeriod > 0`). - Use `featureEnabled()` to exit early from shutdown manager in the case that the feature is disabled - Update kubelet config defaulting to be explicit that `ShutdownGracePeriod` and `ShutdownGracePeriodCriticalPods` default to zero and update the godoc comments. - Update defaults and add featureGate tag in api config godoc. With this feature now in beta and the feature gate enabled by default, to enable graceful shutdown all that will be required is to configure `ShutdownGracePeriod` and `ShutdownGracePeriodCriticalPods` in the kubelet config. If not configured, they will be defaulted to zero, and graceful shutdown will effectively be disabled.	2021-03-05 15:21:37 -08:00
Benjamin Elder	56e092e382	hack/update-bazel.sh	2021-02-28 15:17:29 -08:00
Kubernetes Prow Robot	9ec1e23e41	Merge pull request #98005 from wzshiming/fix-rescheduling-to-the-shutdown-node Sync node status during kubelet node shutdown	2021-01-28 17:51:53 -08:00
Kubernetes Prow Robot	82ebcd1719	Merge pull request #98088 from wzshiming/fix-inhibit-lock Fix repeatedly aquire the inhibit lock	2021-01-22 00:37:26 -08:00
wzshiming	d9df265af0	Sync node status during kubelet node shutdown	2021-01-21 11:01:13 +08:00
Kubernetes Prow Robot	737858cd7c	Merge pull request #98200 from wzshiming/fix-node-shutdown-events Fix kubelet from panic after getting the wrong signal	2021-01-20 10:38:47 -08:00
wzshiming	4e17e58552	Fix repeatedly aquire the inhibit lock	2021-01-15 10:49:11 +08:00
wzshiming	0413529b62	Fix dbus shutdown events not continuing if they are not valid	2021-01-12 14:33:39 +08:00
wzshiming	0911b5ec79	remove executable permission bits	2021-01-12 13:32:23 +08:00

52 Commits