Commit Graph

844 Commits

Author SHA1 Message Date
Clayton Coleman
3eadd1a9ea
Keep pod worker running until pod is truly complete
A number of race conditions exist when pods are terminated early in
their lifecycle because components in the kubelet need to know "no
running containers" or "containers can't be started from now on" but
were relying on outdated state.

Only the pod worker knows whether containers are being started for
a given pod, which is required to know when a pod is "terminated"
(no running containers, none coming). Move that responsibility and
podKiller function into the pod workers, and have everything that
was killing the pod go into the UpdatePod loop. Split syncPod into
three phases - setup, terminate containers, and cleanup pod - and
have transitions between those methods be visible to other
components. After this change, to kill a pod you tell the pod worker
to UpdatePod({UpdateType: SyncPodKill, Pod: pod}).

Several places in the kubelet were incorrect about whether they
were handling terminating (should stop running, might have
containers) or terminated (no running containers) pods. The pod worker
exposes methods that allow other loops to know when to set up or tear
down resources based on the state of the pod - these methods remove
the possibility of race conditions by ensuring a single component is
responsible for knowing each pod's allowed state and other components
simply delegate to checking whether they are in the window by UID.

Removing containers now no longer blocks final pod deletion in the
API server and are handled as background cleanup. Node shutdown
no longer marks pods as failed as they can be restarted in the
next step.

See https://docs.google.com/document/d/1Pic5TPntdJnYfIpBeZndDelM-AbS4FN9H2GTLFhoJ04/edit# for details
2021-07-06 15:55:22 -04:00
Elana Hashman
cc2e9394be
kubelet: Fix test order in verifyContainerStatuses
Per https://pkg.go.dev/github.com/stretchr/testify/assert#Equal
expected goes before actual.
2021-06-04 16:04:10 -07:00
Ryan Phillips
224a4db269 cleanup podkiller close 2021-04-29 11:49:58 -05:00
Ryan Phillips
1f81b44cc7 kubelet: do not cleanup volumes if pod is being killed 2021-04-29 11:49:58 -05:00
fengzixu
edc1c62471 feature: add CSIVolumeHealth feature and gate
1. add EventRecorder to ResourceAnalyzer
2. add CSIVolumeHealth feature and gate
2021-03-10 01:16:37 +09:00
Kubernetes Prow Robot
c193c1b234
Merge pull request #98376 from matthyx/mega
Make all health checks probing consistent
2021-03-06 11:45:41 -08:00
Kubernetes Prow Robot
7125496e66
Merge pull request #99735 from bobbypage/beta-graceful-shutdown
Promote kubelet graceful node shutdown to beta
2021-03-05 17:23:42 -08:00
David Porter
893f5fd4f0 Promote kubelet graceful node shutdown to beta
- Change the feature gate from alpha to beta and enable it by default

- Update a few of the unit tests due to feature gate being enabled by
  default

- Small refactor in `nodeshutdown_manager` which adds `featureEnabled`
  function (which checks that feature gate and that
  `kubeletConfig.ShutdownGracePeriod > 0`).

- Use `featureEnabled()` to exit early from shutdown manager in the case
  that the feature is disabled

- Update kubelet config defaulting to be explicit that
  `ShutdownGracePeriod` and `ShutdownGracePeriodCriticalPods` default to
  zero and update the godoc comments.

- Update defaults and add featureGate tag in api config godoc.

With this feature now in beta and the feature gate enabled by default,
to enable graceful shutdown all that will be required is to configure
`ShutdownGracePeriod` and `ShutdownGracePeriodCriticalPods` in the
kubelet config. If not configured, they will be defaulted to zero, and
graceful shutdown will effectively be disabled.
2021-03-05 15:21:37 -08:00
Matthias Bertschy
431e6a7044 Move readinessManager updates handling to kubelet 2021-03-05 07:02:25 +01:00
Geonju Kim
b4b7cea413 kubelet_test: Add TestHandlePodRemovesWhenSourcesAreReady 2021-02-26 06:34:27 +09:00
Geonju Kim
256447a349 kubelet_test: Fix TestHandlePodCleanups 2021-02-26 06:34:17 +09:00
Geonju Kim
fc4a29da2c kubelet: Make the test fail if (*FakeRuntime).Assert fails 2021-02-26 06:31:54 +09:00
Kubernetes Prow Robot
17c3ee8708
Merge pull request #98742 from gjkim42/sync-until-terminate-containers
kubelet: Sync completed pods until their containers have been terminated
2021-02-24 15:29:26 -08:00
Geonju Kim
5e752968c3 Simulate KillPod in TestSyncPodsDeletesWhenSourcesAreReadyPerQOS 2021-02-14 07:17:48 +09:00
Geonju Kim
b451c15bf7 kubelet: Fix race when KillPod followed by IsPodPendingTermination
Ensures the pod to be pending termination or be killed, after
(*podKillerWithChannel).KillPod has been returned, by limiting
one request per pod in (*podKillerWithChannel).KillPod.
2021-02-14 07:16:49 +09:00
Ryan Phillips
ee8ea1b2c1 kubelet_test: fixes race in TestSyncPodsDeletesWhenSourcesAreReadyPerQOS 2021-02-10 09:36:32 -06:00
Kubernetes Prow Robot
45d9a13b94
Merge pull request #96451 from ping035627/k8s-201112
Extract the const for ContainerStateReason
2021-02-09 10:25:00 -08:00
Geonju Kim
321ca8af52 kubelet: Sync completed pods until their containers have been terminated 2021-02-06 14:06:50 +09:00
Ryan Phillips
f918e11e3a register all pending pod deletions and check for kill
do not delete the cgroup from a pod when it is being killed
2021-02-04 11:45:42 -06:00
PingWang
4103ff490f Extract the const for ContainerStateReason
Signed-off-by: PingWang <wang.ping5@zte.com.cn>

update fmt

Signed-off-by: PingWang <wang.ping5@zte.com.cn>

update test

Signed-off-by: PingWang <wang.ping5@zte.com.cn>
2021-02-04 08:52:13 +08:00
Kubernetes Prow Robot
4e93dbcd0d
Merge pull request #94087 from derekwaynecarr/node-sync-once
kubelet waits for node lister to sync at least once
2021-01-12 15:06:35 -08:00
Derek Carr
acb43c7c4a Rework hostfs metrics
Ephemeral storage usage should be calculated by the metrics code,
not the eviction code.
2020-12-03 13:04:25 -07:00
Joel Smith
39a11744ce Partially revert "Include pod /etc/hosts in ephemeral storage calculation for eviction"
This reverts (most of) commit f34b586d01.
2020-12-03 04:47:16 -07:00
Kubernetes Prow Robot
d21815cb4e
Merge pull request #95569 from oomichi/remove-kubelet-util
Move dirExists() to kubelet_test
2020-11-06 11:28:51 -08:00
Ali
bfdeda58b7 Delete framework/v1alpha1 folder and change remaining import paths 2020-10-23 13:16:13 +11:00
David Eads
ff7d1444f0 kubelet container status calculation doesn't handle suddenly missing data properly 2020-10-15 12:26:16 -04:00
Kenichi Omichi
c0795782e0 Move dirExists() to kubelet_test
dirExists() is called from kubelet_test only.
This moves the function to kubelet_test for cleanup.
2020-10-14 17:50:00 +00:00
Srini Brahmaroutu
fbe5daed73 Change code to use staging/k8s.io/mount-utils 2020-09-16 21:51:24 -07:00
David Dymko
cda0070f28 fix golint for pkg/volume/azure_dd 2020-09-05 09:44:27 -04:00
Derek Carr
752135242e WIP: node sync at least once 2020-09-01 12:17:26 -04:00
Kubernetes Prow Robot
fe1aeff2d2
Merge pull request #92013 from MHBauer/dockershim-error-test
basic regression test of runDockershim
2020-08-27 17:55:55 -07:00
Jordan Liggitt
b181c76cbd Deflake TestUpdateNodeStatusWithLease - guard cached machineInfo 2020-08-05 10:00:36 -04:00
Jordan Liggitt
124a5ddf72 Fix int->string casts 2020-07-24 16:23:12 -04:00
Kubernetes Prow Robot
8398bc3b53
Merge pull request #92916 from joelsmith/count-etc-hosts
Include pod /etc/hosts in ephemeral storage calculation for eviction
2020-07-12 06:59:36 -07:00
Kubernetes Prow Robot
93e76f5081
Merge pull request #92442 from tedyu/grace-period-with-map
Respect grace period when removing mirror pod
2020-07-10 17:49:23 -07:00
Kubernetes Prow Robot
1e3eeba9fa
Merge pull request #91577 from knabben/kubelet-bootstrap
kubelet: remove the --bootstrap-checkpoint-path feature
2020-07-09 00:03:41 -07:00
Ted Yu
a76a959294 Respect grace period when removing mirror pod
Signed-off-by: Ted Yu <yuzhihong@gmail.com>
2020-07-08 13:38:24 -07:00
Joel Smith
f34b586d01 Include pod /etc/hosts in ephemeral storage calculation for eviction 2020-07-08 12:58:11 -06:00
Sergey Kanzhelev
ee53488f19 fix golint issues in pkg/kubelet/container 2020-06-19 15:48:08 +00:00
Morgan Bauer
cb4b67a886
basic regression test of runDockershim
- added basic regression test to ensure an error is raised in the
   case of an unconfigured runtime, and the case of asking for a docker
   runtime when compiled dockerless

Signed-off-by: Morgan Bauer <mbauer@us.ibm.com>
2020-06-16 10:08:22 -07:00
Amim Knabben
0ed41c3f10 Deprecating --bootstrap-checkpoint-path flag 2020-06-09 15:27:01 -04:00
David Eads
4da0e64bc1 reduce race risk in kubelet for missing KUBERNETES_SERVICE_HOST 2020-05-29 17:11:19 -04:00
Abdullah Gharaibeh
d6522e0e74 rename framework pkg with schedulerframework for all instances under pkg/kubelet 2020-04-14 14:24:07 -04:00
Abdullah Gharaibeh
bed9b2f23b Cleanup obsolete NodeInfo methods 2020-04-12 18:13:46 -04:00
louisgong
7f5076d8ee slim down some lister expansions 2019-12-07 08:27:06 +08:00
Travis Rhoden
0c5c3d8bb9
Remove pkg/util/mount (moved out of tree)
This patch removes pkg/util/mount completely, and replaces it with the
mount package now located at k8s.io/utils/mount. The code found at
k8s.io/utils/mount was moved there from pkg/util/mount, so the code is
identical, just no longer in-tree to k/k.
2019-11-15 08:29:12 -07:00
Kubernetes Prow Robot
a08b09d52f
Merge pull request #84279 from matthyx/kuberuntime-startupprobe
Add startupProbe result handling to kuberuntime
2019-11-13 13:01:53 -08:00
Matthias Bertschy
66595d54a0 Add startupProbe result handling to kuberuntime 2019-11-13 08:12:54 +01:00
Kubernetes Prow Robot
897ce3073c
Merge pull request #84533 from davidz627/fix/deprecatedPath
Remove plugin watching of deprecated directory and CSI v0 support in accordance with deprecation policy
2019-11-12 04:48:20 -08:00
David Zhu
802fe12803 Remove plugin watching of deprecated directory {kubelet_root_dir}/plugins and support for CSI V0 in accordance with deprecation announcement in https://v1-13.docs.kubernetes.io/docs/setup/release/notes/ 2019-11-11 11:42:58 -08:00