kubernetes

Author	SHA1	Message	Date
Kubernetes Prow Robot	a6c2cd7d18	Merge pull request #103291 from wzshiming/fix/nodeshutdown-restart Fix Data Race in nodeshutdown restart	2021-07-09 08:43:14 -07:00
Clayton Coleman	3eadd1a9ea	Keep pod worker running until pod is truly complete A number of race conditions exist when pods are terminated early in their lifecycle because components in the kubelet need to know "no running containers" or "containers can't be started from now on" but were relying on outdated state. Only the pod worker knows whether containers are being started for a given pod, which is required to know when a pod is "terminated" (no running containers, none coming). Move that responsibility and podKiller function into the pod workers, and have everything that was killing the pod go into the UpdatePod loop. Split syncPod into three phases - setup, terminate containers, and cleanup pod - and have transitions between those methods be visible to other components. After this change, to kill a pod you tell the pod worker to UpdatePod({UpdateType: SyncPodKill, Pod: pod}). Several places in the kubelet were incorrect about whether they were handling terminating (should stop running, might have containers) or terminated (no running containers) pods. The pod worker exposes methods that allow other loops to know when to set up or tear down resources based on the state of the pod - these methods remove the possibility of race conditions by ensuring a single component is responsible for knowing each pod's allowed state and other components simply delegate to checking whether they are in the window by UID. Removing containers now no longer blocks final pod deletion in the API server and are handled as background cleanup. Node shutdown no longer marks pods as failed as they can be restarted in the next step. See https://docs.google.com/document/d/1Pic5TPntdJnYfIpBeZndDelM-AbS4FN9H2GTLFhoJ04/edit# for details	2021-07-06 15:55:22 -04:00
Shiming Zhang	212ce7c287	Shorten test time	2021-06-30 09:48:26 +08:00
Shiming Zhang	a42c066af7	Fix Data Race in nodeshutdown restart	2021-06-29 16:23:45 +08:00
Kubernetes Prow Robot	62fdaabe82	Merge pull request #102635 from charlesxsh/fix-linux-test fix a potential deadlock in graceful node shutdown unit tests	2021-06-21 16:27:45 -07:00
Shihao Xia	a2a4b50bc1	fixed deadlock	2021-06-03 18:03:17 -04:00
Shiming Zhang	202a012093	Add restart unit test	2021-05-23 00:47:36 +08:00
David Porter	893f5fd4f0	Promote kubelet graceful node shutdown to beta - Change the feature gate from alpha to beta and enable it by default - Update a few of the unit tests due to feature gate being enabled by default - Small refactor in `nodeshutdown_manager` which adds `featureEnabled` function (which checks that feature gate and that `kubeletConfig.ShutdownGracePeriod > 0`). - Use `featureEnabled()` to exit early from shutdown manager in the case that the feature is disabled - Update kubelet config defaulting to be explicit that `ShutdownGracePeriod` and `ShutdownGracePeriodCriticalPods` default to zero and update the godoc comments. - Update defaults and add featureGate tag in api config godoc. With this feature now in beta and the feature gate enabled by default, to enable graceful shutdown all that will be required is to configure `ShutdownGracePeriod` and `ShutdownGracePeriodCriticalPods` in the kubelet config. If not configured, they will be defaulted to zero, and graceful shutdown will effectively be disabled.	2021-03-05 15:21:37 -08:00
wzshiming	d9df265af0	Sync node status during kubelet node shutdown	2021-01-21 11:01:13 +08:00
David Porter	16f71c6d47	Implement shutdown manager in kubelet Implements KEP 2000, Graceful Node Shutdown: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2000-graceful-node-shutdown * Add new FeatureGate `GracefulNodeShutdown` to control enabling/disabling the feature * Add two new KubeletConfiguration options * `ShutdownGracePeriod` and `ShutdownGracePeriodCriticalPods` * Add new package, `nodeshutdown` that implements the Node shutdown manager * The node shutdown manager uses the systemd inhibit package, to create a system inhibitor, monitor for node shutdown events, and gracefully terminate pods upon a node shutdown.	2020-11-12 21:47:55 +00:00

10 Commits