Commit Graph

855 Commits

Author SHA1 Message Date
Hemant Kumar
7989f27044 use node informer to check volumes attachment status before backoff
fix unit tests
2021-12-20 11:57:05 -05:00
haoyun
65ac99eef5 fix: npe in kubelet test
Signed-off-by: haoyun <yun.hao@daocloud.io>
Co-authored-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
2021-11-19 17:44:05 +08:00
ravisantoshgudimetla
696abecada [test][kubelet]: Fix out of bounds in TestSyncLabels unit 2021-11-10 16:53:59 -05:00
ravisantoshgudimetla
02c1bac0b6 [kubelet]: Sync label periodically 2021-11-05 18:47:43 -04:00
Shiming Zhang
b468c24e85 Refactor to use structure to pass parameters 2021-10-15 11:16:21 +08:00
Ryan Phillips
e2e938066d kubelet: add probe termination to graceful shutdowns 2021-09-22 14:13:25 -05:00
Kubernetes Prow Robot
047a6b9f86
Merge pull request #104874 from wojtek-t/migrate_clock_1
Unify towards k8s.io/utils/clock - part 1
2021-09-13 19:09:20 -07:00
wojtekt
53ce79a18a Migrate to k8s.io/utils/clock in pkg/kubelet 2021-09-10 12:20:09 +02:00
Clayton Coleman
17d32ed0b8
kubelet: Rejected pods should be filtered from admission
A pod that has been rejected by admission will have status manager
set the phase to Failed locally, which make take some time to
propagate to the apiserver. The rejected pod will be included in
admission until the apiserver propagates the change back, which
was an unintended regression when checking pod worker state as
authoritative.

A pod that is terminal in the API may still be consuming resources
on the system, so it should still be included in admission.
2021-09-08 10:23:45 -04:00
Kubernetes Prow Robot
8dbc33d649
Merge pull request #101081 from rphillips/add_graceful_shutdown_event
kubelet: add graceful shutdown events
2021-08-17 22:08:08 -07:00
Clayton Coleman
3eadd1a9ea
Keep pod worker running until pod is truly complete
A number of race conditions exist when pods are terminated early in
their lifecycle because components in the kubelet need to know "no
running containers" or "containers can't be started from now on" but
were relying on outdated state.

Only the pod worker knows whether containers are being started for
a given pod, which is required to know when a pod is "terminated"
(no running containers, none coming). Move that responsibility and
podKiller function into the pod workers, and have everything that
was killing the pod go into the UpdatePod loop. Split syncPod into
three phases - setup, terminate containers, and cleanup pod - and
have transitions between those methods be visible to other
components. After this change, to kill a pod you tell the pod worker
to UpdatePod({UpdateType: SyncPodKill, Pod: pod}).

Several places in the kubelet were incorrect about whether they
were handling terminating (should stop running, might have
containers) or terminated (no running containers) pods. The pod worker
exposes methods that allow other loops to know when to set up or tear
down resources based on the state of the pod - these methods remove
the possibility of race conditions by ensuring a single component is
responsible for knowing each pod's allowed state and other components
simply delegate to checking whether they are in the window by UID.

Removing containers now no longer blocks final pod deletion in the
API server and are handled as background cleanup. Node shutdown
no longer marks pods as failed as they can be restarted in the
next step.

See https://docs.google.com/document/d/1Pic5TPntdJnYfIpBeZndDelM-AbS4FN9H2GTLFhoJ04/edit# for details
2021-07-06 15:55:22 -04:00
Ryan Phillips
d9be5abc37 kubelet: add shutdown events 2021-06-23 16:44:19 -05:00
Elana Hashman
cc2e9394be
kubelet: Fix test order in verifyContainerStatuses
Per https://pkg.go.dev/github.com/stretchr/testify/assert#Equal
expected goes before actual.
2021-06-04 16:04:10 -07:00
Ryan Phillips
224a4db269 cleanup podkiller close 2021-04-29 11:49:58 -05:00
Ryan Phillips
1f81b44cc7 kubelet: do not cleanup volumes if pod is being killed 2021-04-29 11:49:58 -05:00
fengzixu
edc1c62471 feature: add CSIVolumeHealth feature and gate
1. add EventRecorder to ResourceAnalyzer
2. add CSIVolumeHealth feature and gate
2021-03-10 01:16:37 +09:00
Kubernetes Prow Robot
c193c1b234
Merge pull request #98376 from matthyx/mega
Make all health checks probing consistent
2021-03-06 11:45:41 -08:00
Kubernetes Prow Robot
7125496e66
Merge pull request #99735 from bobbypage/beta-graceful-shutdown
Promote kubelet graceful node shutdown to beta
2021-03-05 17:23:42 -08:00
David Porter
893f5fd4f0 Promote kubelet graceful node shutdown to beta
- Change the feature gate from alpha to beta and enable it by default

- Update a few of the unit tests due to feature gate being enabled by
  default

- Small refactor in `nodeshutdown_manager` which adds `featureEnabled`
  function (which checks that feature gate and that
  `kubeletConfig.ShutdownGracePeriod > 0`).

- Use `featureEnabled()` to exit early from shutdown manager in the case
  that the feature is disabled

- Update kubelet config defaulting to be explicit that
  `ShutdownGracePeriod` and `ShutdownGracePeriodCriticalPods` default to
  zero and update the godoc comments.

- Update defaults and add featureGate tag in api config godoc.

With this feature now in beta and the feature gate enabled by default,
to enable graceful shutdown all that will be required is to configure
`ShutdownGracePeriod` and `ShutdownGracePeriodCriticalPods` in the
kubelet config. If not configured, they will be defaulted to zero, and
graceful shutdown will effectively be disabled.
2021-03-05 15:21:37 -08:00
Matthias Bertschy
431e6a7044 Move readinessManager updates handling to kubelet 2021-03-05 07:02:25 +01:00
Geonju Kim
b4b7cea413 kubelet_test: Add TestHandlePodRemovesWhenSourcesAreReady 2021-02-26 06:34:27 +09:00
Geonju Kim
256447a349 kubelet_test: Fix TestHandlePodCleanups 2021-02-26 06:34:17 +09:00
Geonju Kim
fc4a29da2c kubelet: Make the test fail if (*FakeRuntime).Assert fails 2021-02-26 06:31:54 +09:00
Kubernetes Prow Robot
17c3ee8708
Merge pull request #98742 from gjkim42/sync-until-terminate-containers
kubelet: Sync completed pods until their containers have been terminated
2021-02-24 15:29:26 -08:00
Geonju Kim
5e752968c3 Simulate KillPod in TestSyncPodsDeletesWhenSourcesAreReadyPerQOS 2021-02-14 07:17:48 +09:00
Geonju Kim
b451c15bf7 kubelet: Fix race when KillPod followed by IsPodPendingTermination
Ensures the pod to be pending termination or be killed, after
(*podKillerWithChannel).KillPod has been returned, by limiting
one request per pod in (*podKillerWithChannel).KillPod.
2021-02-14 07:16:49 +09:00
Ryan Phillips
ee8ea1b2c1 kubelet_test: fixes race in TestSyncPodsDeletesWhenSourcesAreReadyPerQOS 2021-02-10 09:36:32 -06:00
Kubernetes Prow Robot
45d9a13b94
Merge pull request #96451 from ping035627/k8s-201112
Extract the const for ContainerStateReason
2021-02-09 10:25:00 -08:00
Geonju Kim
321ca8af52 kubelet: Sync completed pods until their containers have been terminated 2021-02-06 14:06:50 +09:00
Ryan Phillips
f918e11e3a register all pending pod deletions and check for kill
do not delete the cgroup from a pod when it is being killed
2021-02-04 11:45:42 -06:00
PingWang
4103ff490f Extract the const for ContainerStateReason
Signed-off-by: PingWang <wang.ping5@zte.com.cn>

update fmt

Signed-off-by: PingWang <wang.ping5@zte.com.cn>

update test

Signed-off-by: PingWang <wang.ping5@zte.com.cn>
2021-02-04 08:52:13 +08:00
Kubernetes Prow Robot
4e93dbcd0d
Merge pull request #94087 from derekwaynecarr/node-sync-once
kubelet waits for node lister to sync at least once
2021-01-12 15:06:35 -08:00
Derek Carr
acb43c7c4a Rework hostfs metrics
Ephemeral storage usage should be calculated by the metrics code,
not the eviction code.
2020-12-03 13:04:25 -07:00
Joel Smith
39a11744ce Partially revert "Include pod /etc/hosts in ephemeral storage calculation for eviction"
This reverts (most of) commit f34b586d01.
2020-12-03 04:47:16 -07:00
Kubernetes Prow Robot
d21815cb4e
Merge pull request #95569 from oomichi/remove-kubelet-util
Move dirExists() to kubelet_test
2020-11-06 11:28:51 -08:00
Ali
bfdeda58b7 Delete framework/v1alpha1 folder and change remaining import paths 2020-10-23 13:16:13 +11:00
David Eads
ff7d1444f0 kubelet container status calculation doesn't handle suddenly missing data properly 2020-10-15 12:26:16 -04:00
Kenichi Omichi
c0795782e0 Move dirExists() to kubelet_test
dirExists() is called from kubelet_test only.
This moves the function to kubelet_test for cleanup.
2020-10-14 17:50:00 +00:00
Srini Brahmaroutu
fbe5daed73 Change code to use staging/k8s.io/mount-utils 2020-09-16 21:51:24 -07:00
David Dymko
cda0070f28 fix golint for pkg/volume/azure_dd 2020-09-05 09:44:27 -04:00
Derek Carr
752135242e WIP: node sync at least once 2020-09-01 12:17:26 -04:00
Kubernetes Prow Robot
fe1aeff2d2
Merge pull request #92013 from MHBauer/dockershim-error-test
basic regression test of runDockershim
2020-08-27 17:55:55 -07:00
Jordan Liggitt
b181c76cbd Deflake TestUpdateNodeStatusWithLease - guard cached machineInfo 2020-08-05 10:00:36 -04:00
Jordan Liggitt
124a5ddf72 Fix int->string casts 2020-07-24 16:23:12 -04:00
Kubernetes Prow Robot
8398bc3b53
Merge pull request #92916 from joelsmith/count-etc-hosts
Include pod /etc/hosts in ephemeral storage calculation for eviction
2020-07-12 06:59:36 -07:00
Kubernetes Prow Robot
93e76f5081
Merge pull request #92442 from tedyu/grace-period-with-map
Respect grace period when removing mirror pod
2020-07-10 17:49:23 -07:00
Kubernetes Prow Robot
1e3eeba9fa
Merge pull request #91577 from knabben/kubelet-bootstrap
kubelet: remove the --bootstrap-checkpoint-path feature
2020-07-09 00:03:41 -07:00
Ted Yu
a76a959294 Respect grace period when removing mirror pod
Signed-off-by: Ted Yu <yuzhihong@gmail.com>
2020-07-08 13:38:24 -07:00
Joel Smith
f34b586d01 Include pod /etc/hosts in ephemeral storage calculation for eviction 2020-07-08 12:58:11 -06:00
Sergey Kanzhelev
ee53488f19 fix golint issues in pkg/kubelet/container 2020-06-19 15:48:08 +00:00