Commit Graph

865 Commits

Author SHA1 Message Date
Patrick Ohly
65385fec20 kubelet: convert node shutdown manager to contextual logging
This will make output checking easier (done in a separate commit). kubelet
itself still uses the global logger.
2022-06-24 11:20:34 +02:00
Davanum Srinivas
50bea1dad8
Move from k8s.gcr.io to registry.k8s.io
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2022-05-31 10:16:53 -04:00
AllenZMC
bedd0839a1 Optimize test cases for kubelet 2022-05-05 23:07:09 +08:00
Clayton Coleman
69a3820214
kubelet: Delay writing a terminal phase until the pod is terminated
Other components must know when the Kubelet has released critical
resources for terminal pods. Do not set the phase in the apiserver
to terminal until all containers are stopped and cannot restart.

As a consequence of this change, the Kubelet must explicitly transition
a terminal pod to the terminating state in the pod worker which is
handled by returning a new isTerminal boolean from syncPod.

Finally, if a pod with init containers hasn't been initialized yet,
don't default container statuses or not yet attempted init containers
to the unknown failure state.
2022-03-16 13:15:00 -04:00
Kubernetes Prow Robot
06e107081e
Merge pull request #104732 from mengjiao-liu/remove-flag-experimental-check-node-capabilities-before-mount
kubelet: Remove the deprecated flag `--experimental-check-node-capabilities-before-mount`
2022-02-24 07:56:30 -08:00
Ciprian Hacman
0819451ea6 Clean up logic for deprecated flag --container-runtime in kubelet
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
2022-02-10 13:26:59 +02:00
Ciprian Hacman
21809043b5 Remove deprecated flag --non-masquerade-cidr in kubelet
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
2022-01-19 09:17:26 +02:00
Mengjiao Liu
beda4cafb6 kubelet: Remove the deprecated flag --experimental-check-node-capabilities-before-mount 2022-01-06 11:47:11 +08:00
Kubernetes Prow Robot
f0dbc32ed9
Merge pull request #106853 from gnufied/disable-exp-backoff-volume-not-inuse
When volume is not marked in-use, do not backoff
2021-12-22 19:46:37 -08:00
Hemant Kumar
7989f27044 use node informer to check volumes attachment status before backoff
fix unit tests
2021-12-20 11:57:05 -05:00
Sergey Kanzhelev
a11453efbc remove ReallyCrashForTesting and cleaned up some references to HandleCrash behavior 2021-11-29 20:00:10 +00:00
haoyun
65ac99eef5 fix: npe in kubelet test
Signed-off-by: haoyun <yun.hao@daocloud.io>
Co-authored-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
2021-11-19 17:44:05 +08:00
ravisantoshgudimetla
696abecada [test][kubelet]: Fix out of bounds in TestSyncLabels unit 2021-11-10 16:53:59 -05:00
ravisantoshgudimetla
02c1bac0b6 [kubelet]: Sync label periodically 2021-11-05 18:47:43 -04:00
Shiming Zhang
b468c24e85 Refactor to use structure to pass parameters 2021-10-15 11:16:21 +08:00
Ryan Phillips
e2e938066d kubelet: add probe termination to graceful shutdowns 2021-09-22 14:13:25 -05:00
Kubernetes Prow Robot
047a6b9f86
Merge pull request #104874 from wojtek-t/migrate_clock_1
Unify towards k8s.io/utils/clock - part 1
2021-09-13 19:09:20 -07:00
wojtekt
53ce79a18a Migrate to k8s.io/utils/clock in pkg/kubelet 2021-09-10 12:20:09 +02:00
Clayton Coleman
17d32ed0b8
kubelet: Rejected pods should be filtered from admission
A pod that has been rejected by admission will have status manager
set the phase to Failed locally, which may take some time to
propagate to the apiserver. The rejected pod will be included in
admission until the apiserver propagates the change back, which
was an unintended regression when checking pod worker state as
authoritative.

A pod that is terminal in the API may still be consuming resources
on the system, so it should still be included in admission.
2021-09-08 10:23:45 -04:00
Kubernetes Prow Robot
8dbc33d649
Merge pull request #101081 from rphillips/add_graceful_shutdown_event
kubelet: add graceful shutdown events
2021-08-17 22:08:08 -07:00
Clayton Coleman
3eadd1a9ea
Keep pod worker running until pod is truly complete
A number of race conditions exist when pods are terminated early in
their lifecycle because components in the kubelet need to know "no
running containers" or "containers can't be started from now on" but
were relying on outdated state.

Only the pod worker knows whether containers are being started for
a given pod, which is required to know when a pod is "terminated"
(no running containers, none coming). Move that responsibility and
podKiller function into the pod workers, and have everything that
was killing the pod go into the UpdatePod loop. Split syncPod into
three phases - setup, terminate containers, and cleanup pod - and
have transitions between those methods be visible to other
components. After this change, to kill a pod you tell the pod worker
to UpdatePod({UpdateType: SyncPodKill, Pod: pod}).

Several places in the kubelet were incorrect about whether they
were handling terminating (should stop running, might have
containers) or terminated (no running containers) pods. The pod worker
exposes methods that allow other loops to know when to set up or tear
down resources based on the state of the pod - these methods remove
the possibility of race conditions by ensuring a single component is
responsible for knowing each pod's allowed state and other components
simply delegate to checking whether they are in the window by UID.

Removing containers no longer blocks final pod deletion in the
API server and is now handled as background cleanup. Node shutdown
no longer marks pods as failed, as they can be restarted in the
next step.

See https://docs.google.com/document/d/1Pic5TPntdJnYfIpBeZndDelM-AbS4FN9H2GTLFhoJ04/edit# for details
2021-07-06 15:55:22 -04:00
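The three-phase split described in 3eadd1a9ea (setup, terminate containers, cleanup pod) can be pictured as a small per-UID state machine. The following is only a minimal sketch under stated assumptions, not the kubelet's pod worker: the names `phase`, `podWorkers`, `updatePod`, `containersStopped`, and `couldHaveRunningContainers` are hypothetical, and the real implementation runs a goroutine per pod and tracks far more state.

```go
package main

import "fmt"

// phase is a hypothetical stand-in for the pod worker's view of a pod.
type phase int

const (
	phaseSetup       phase = iota // containers may be started
	phaseTerminating              // should stop running, might still have containers
	phaseTerminated               // no running containers, none coming
)

// updateType mirrors the idea of UpdateType in the commit message (names are illustrative).
type updateType int

const (
	updateSync updateType = iota
	updateKill
)

// podWorkers is the single owner of each pod's allowed state, keyed by UID.
type podWorkers struct {
	phases map[string]phase
}

func newPodWorkers() *podWorkers {
	return &podWorkers{phases: map[string]phase{}}
}

// updatePod applies an update; a pod that is terminating or terminated never moves back to setup.
func (w *podWorkers) updatePod(uid string, t updateType) {
	cur, known := w.phases[uid]
	switch {
	case t == updateKill && (!known || cur == phaseSetup):
		w.phases[uid] = phaseTerminating
	case t == updateSync && !known:
		w.phases[uid] = phaseSetup
	}
}

// containersStopped records that the terminate phase finished for a terminating pod.
func (w *podWorkers) containersStopped(uid string) {
	if p, known := w.phases[uid]; known && p == phaseTerminating {
		w.phases[uid] = phaseTerminated
	}
}

// couldHaveRunningContainers lets other loops ask, by UID, whether resources must stay in place.
func (w *podWorkers) couldHaveRunningContainers(uid string) bool {
	p, known := w.phases[uid]
	return known && p != phaseTerminated
}

func main() {
	w := newPodWorkers()
	w.updatePod("pod-a", updateSync)
	w.updatePod("pod-a", updateKill)
	fmt.Println(w.couldHaveRunningContainers("pod-a"))
	w.containersStopped("pod-a")
	fmt.Println(w.couldHaveRunningContainers("pod-a"))
}
```

Keeping the transitions in one place is the point of the sketch: other components only query by UID and never decide the state themselves.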
Ryan Phillips
d9be5abc37 kubelet: add shutdown events 2021-06-23 16:44:19 -05:00
Elana Hashman
cc2e9394be
kubelet: Fix test order in verifyContainerStatuses
Per https://pkg.go.dev/github.com/stretchr/testify/assert#Equal
expected goes before actual.
2021-06-04 16:04:10 -07:00
Ryan Phillips
224a4db269 cleanup podkiller close 2021-04-29 11:49:58 -05:00
Ryan Phillips
1f81b44cc7 kubelet: do not cleanup volumes if pod is being killed 2021-04-29 11:49:58 -05:00
fengzixu
edc1c62471 feature: add CSIVolumeHealth feature and gate
1. add EventRecorder to ResourceAnalyzer
2. add CSIVolumeHealth feature and gate
2021-03-10 01:16:37 +09:00
Kubernetes Prow Robot
c193c1b234
Merge pull request #98376 from matthyx/mega
Make all health checks probing consistent
2021-03-06 11:45:41 -08:00
Kubernetes Prow Robot
7125496e66
Merge pull request #99735 from bobbypage/beta-graceful-shutdown
Promote kubelet graceful node shutdown to beta
2021-03-05 17:23:42 -08:00
David Porter
893f5fd4f0 Promote kubelet graceful node shutdown to beta
- Change the feature gate from alpha to beta and enable it by default

- Update a few of the unit tests due to feature gate being enabled by
  default

- Small refactor in `nodeshutdown_manager` which adds `featureEnabled`
  function (which checks that feature gate and that
  `kubeletConfig.ShutdownGracePeriod > 0`).

- Use `featureEnabled()` to exit early from shutdown manager in the case
  that the feature is disabled

- Update kubelet config defaulting to be explicit that
  `ShutdownGracePeriod` and `ShutdownGracePeriodCriticalPods` default to
  zero and update the godoc comments.

- Update defaults and add featureGate tag in api config godoc.

With this feature now in beta and the feature gate enabled by default,
to enable graceful shutdown all that will be required is to configure
`ShutdownGracePeriod` and `ShutdownGracePeriodCriticalPods` in the
kubelet config. If not configured, they will be defaulted to zero, and
graceful shutdown will effectively be disabled.
2021-03-05 15:21:37 -08:00
Matthias Bertschy
431e6a7044 Move readinessManager updates handling to kubelet 2021-03-05 07:02:25 +01:00
Geonju Kim
b4b7cea413 kubelet_test: Add TestHandlePodRemovesWhenSourcesAreReady 2021-02-26 06:34:27 +09:00
Geonju Kim
256447a349 kubelet_test: Fix TestHandlePodCleanups 2021-02-26 06:34:17 +09:00
Geonju Kim
fc4a29da2c kubelet: Make the test fail if (*FakeRuntime).Assert fails 2021-02-26 06:31:54 +09:00
Kubernetes Prow Robot
17c3ee8708
Merge pull request #98742 from gjkim42/sync-until-terminate-containers
kubelet: Sync completed pods until their containers have been terminated
2021-02-24 15:29:26 -08:00
Geonju Kim
5e752968c3 Simulate KillPod in TestSyncPodsDeletesWhenSourcesAreReadyPerQOS 2021-02-14 07:17:48 +09:00
Geonju Kim
b451c15bf7 kubelet: Fix race when KillPod followed by IsPodPendingTermination
Ensure the pod is pending termination or has been killed once
(*podKillerWithChannel).KillPod returns, by limiting
(*podKillerWithChannel).KillPod to one request per pod.
2021-02-14 07:16:49 +09:00
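The idea in b451c15bf7, one in-flight kill request per pod, recorded before KillPod returns, can be sketched with a mutex-guarded set. This is an illustration under assumptions rather than the actual `podKillerWithChannel`: the type `podKiller` and the methods `killPod`, `isPodPendingTermination`, and `markKilled` are hypothetical, and the channel handoff to a worker is omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// podKiller tracks which pod UIDs have a kill request outstanding.
type podKiller struct {
	mu      sync.Mutex
	pending map[string]bool
}

func newPodKiller() *podKiller {
	return &podKiller{pending: map[string]bool{}}
}

// killPod registers a kill request for uid and reports whether it was accepted.
// The pending mark is set before the function returns, so a caller that checks
// isPodPendingTermination right afterwards cannot observe a gap.
func (k *podKiller) killPod(uid string) bool {
	k.mu.Lock()
	defer k.mu.Unlock()
	if k.pending[uid] {
		return false // a request for this pod is already in flight
	}
	k.pending[uid] = true
	return true
}

// isPodPendingTermination reports whether a kill request is outstanding for uid.
func (k *podKiller) isPodPendingTermination(uid string) bool {
	k.mu.Lock()
	defer k.mu.Unlock()
	return k.pending[uid]
}

// markKilled clears the pending mark once the pod has actually been killed.
func (k *podKiller) markKilled(uid string) {
	k.mu.Lock()
	defer k.mu.Unlock()
	delete(k.pending, uid)
}

func main() {
	k := newPodKiller()
	fmt.Println(k.killPod("pod-a"))
	fmt.Println(k.killPod("pod-a"))
	fmt.Println(k.isPodPendingTermination("pod-a"))
	k.markKilled("pod-a")
	fmt.Println(k.isPodPendingTermination("pod-a"))
}
```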
Ryan Phillips
ee8ea1b2c1 kubelet_test: fixes race in TestSyncPodsDeletesWhenSourcesAreReadyPerQOS 2021-02-10 09:36:32 -06:00
Kubernetes Prow Robot
45d9a13b94
Merge pull request #96451 from ping035627/k8s-201112
Extract the const for ContainerStateReason
2021-02-09 10:25:00 -08:00
Geonju Kim
321ca8af52 kubelet: Sync completed pods until their containers have been terminated 2021-02-06 14:06:50 +09:00
Ryan Phillips
f918e11e3a register all pending pod deletions and check for kill
do not delete the cgroup from a pod when it is being killed
2021-02-04 11:45:42 -06:00
PingWang
4103ff490f Extract the const for ContainerStateReason
Signed-off-by: PingWang <wang.ping5@zte.com.cn>

update fmt

Signed-off-by: PingWang <wang.ping5@zte.com.cn>

update test

Signed-off-by: PingWang <wang.ping5@zte.com.cn>
2021-02-04 08:52:13 +08:00
Kubernetes Prow Robot
4e93dbcd0d
Merge pull request #94087 from derekwaynecarr/node-sync-once
kubelet waits for node lister to sync at least once
2021-01-12 15:06:35 -08:00
Derek Carr
acb43c7c4a Rework hostfs metrics
Ephemeral storage usage should be calculated by the metrics code,
not the eviction code.
2020-12-03 13:04:25 -07:00
Joel Smith
39a11744ce Partially revert "Include pod /etc/hosts in ephemeral storage calculation for eviction"
This reverts (most of) commit f34b586d01.
2020-12-03 04:47:16 -07:00
Kubernetes Prow Robot
d21815cb4e
Merge pull request #95569 from oomichi/remove-kubelet-util
Move dirExists() to kubelet_test
2020-11-06 11:28:51 -08:00
Ali
bfdeda58b7 Delete framework/v1alpha1 folder and change remaining import paths 2020-10-23 13:16:13 +11:00
David Eads
ff7d1444f0 kubelet container status calculation doesn't handle suddenly missing data properly 2020-10-15 12:26:16 -04:00
Kenichi Omichi
c0795782e0 Move dirExists() to kubelet_test
dirExists() is called from kubelet_test only.
This moves the function to kubelet_test for cleanup.
2020-10-14 17:50:00 +00:00
Srini Brahmaroutu
fbe5daed73 Change code to use staging/k8s.io/mount-utils 2020-09-16 21:51:24 -07:00
David Dymko
cda0070f28 fix golint for pkg/volume/azure_dd 2020-09-05 09:44:27 -04:00