Commit Graph

140 Commits

Author SHA1 Message Date
Antonio Ojea
9c2b333925 Revert "plumb context from CRI calls through kubelet"
This reverts commit f43b4f1b95.
2022-11-02 13:37:23 +00:00
David Ashpole
f43b4f1b95
plumb context from CRI calls through kubelet 2022-10-28 02:55:28 +00:00
jinxu
0064010cdd Promote Local storage capacity isolation feature to GA
This change is to promote local storage capacity isolation feature to GA

At the same time, to allow rootless system disable this feature due to
unable to get root fs, this change introduced a new kubelet config
"localStorageCapacityIsolation". By default it is set to true. For
rootless systems, they can set this configuration to false to disable
the feature. Once it is set, user cannot set ephemeral-storage
request/limit because capacity and allocatable will not be set.

Change-Id: I48a52e737c6a09e9131454db6ad31247b56c000a
2022-08-02 23:45:48 -07:00
menglong.qi
b886b9b108 fix: typo 2021-11-17 09:22:57 +08:00
yuzhiquanlong
be9e1fda5e remove format pods func, instead with klog.Kobjs 2021-10-15 18:26:02 +08:00
wojtekt
53ce79a18a Migrate to k8s.io/utils/clock in pkg/kubelet 2021-09-10 12:20:09 +02:00
Clayton Coleman
3eadd1a9ea
Keep pod worker running until pod is truly complete
A number of race conditions exist when pods are terminated early in
their lifecycle because components in the kubelet need to know "no
running containers" or "containers can't be started from now on" but
were relying on outdated state.

Only the pod worker knows whether containers are being started for
a given pod, which is required to know when a pod is "terminated"
(no running containers, none coming). Move that responsibility and
podKiller function into the pod workers, and have everything that
was killing the pod go into the UpdatePod loop. Split syncPod into
three phases - setup, terminate containers, and cleanup pod - and
have transitions between those methods be visible to other
components. After this change, to kill a pod you tell the pod worker
to UpdatePod({UpdateType: SyncPodKill, Pod: pod}).

Several places in the kubelet were incorrect about whether they
were handling terminating (should stop running, might have
containers) or terminated (no running containers) pods. The pod worker
exposes methods that allow other loops to know when to set up or tear
down resources based on the state of the pod - these methods remove
the possibility of race conditions by ensuring a single component is
responsible for knowing each pod's allowed state and other components
simply delegate to checking whether they are in the window by UID.

Removing containers now no longer blocks final pod deletion in the
API server and are handled as background cleanup. Node shutdown
no longer marks pods as failed as they can be restarted in the
next step.

See https://docs.google.com/document/d/1Pic5TPntdJnYfIpBeZndDelM-AbS4FN9H2GTLFhoJ04/edit# for details
2021-07-06 15:55:22 -04:00
Kubernetes Prow Robot
8b057cdfa4
Merge pull request #99095 from maxlaverse/fix_kubelet_stuck_in_diskpressure
Prevent Kubelet from getting stuck in DiskPressure when imagefs minReclaim is set
2021-04-23 18:23:14 -07:00
Maxime Lagresle
63cba062eb
more sensible comment 2021-04-12 21:20:20 +02:00
JunYang
af0b4c9031 Structured Logging migration: modify eviction part logs of kubelet.
Signed-off-by: JunYang <yang.jun22@zte.com.cn>
2021-02-21 08:36:14 +08:00
Maxime Lagresle
7e97601023
Prevent Kubelet stuck in DiskPressure when imagefs minReclaim is set
Fixes https://github.com/kubernetes/kubernetes/issues/99094
2021-02-15 14:31:06 +01:00
Mike Dame
578ff3ec34 Move Taint/Toleration helpers to component-helpers repo
This is part of the goal for scheduling to remove dependencies on internal
packages for the scheduling framework. It also provides these functions in an
external location for other components and projects to import.
2021-02-01 11:06:03 -05:00
wzshiming
29808eaf24 Fix panic 2021-01-21 19:47:28 +08:00
Derek Carr
acb43c7c4a Rework hostfs metrics
Ephemeral storage usage should be calculated by the metrics code,
not the eviction code.
2020-12-03 13:04:25 -07:00
Joel Smith
39a11744ce Partially revert "Include pod /etc/hosts in ephemeral storage calculation for eviction"
This reverts (most of) commit f34b586d01.
2020-12-03 04:47:16 -07:00
Marek Siarkowicz
7d309e0104 Move Kubelet Summary API to staging repo 2020-09-22 18:23:28 +02:00
Joel Smith
f34b586d01 Include pod /etc/hosts in ephemeral storage calculation for eviction 2020-07-08 12:58:11 -06:00
Davanum Srinivas
442a69c3bd
switch over k/k to use klog v2
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
Wei Fu
a809aaf03d eviction: use previous statsFunc
No need to use summary to create statsFunc for localStorageEviction.
Just use vals from makeSignalObservations.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2020-03-23 19:11:17 +08:00
Clayton Coleman
af9e0be163
kubelet: Record kubelet_evictions when limits are hit
The pod, container, and emptyDir volumes can all trigger evictions
when their limits are breached. To ensure that administrators can
alert on these type of evictions, update kubelet_evictions to include
the following signal types:

* ephemeralcontainerfs.limit - container ephemeral storage breaches its limit
* ephemeralpodfs.limit - pod ephemeral storage breaches its limit
* emptydirfs.limit - pod emptyDir storage breaches its limit
2020-02-18 15:08:30 -05:00
danielqsj
1a9b121764 remove deprecated metrics of kubelet 2020-01-10 16:46:52 +08:00
Kubernetes Prow Robot
46472773cb
Merge pull request #84836 from yuxiaobo96/k8s-checks
Correct spelling mistakes
2019-11-06 12:21:11 -08:00
yuxiaobo
81e9f21f83 Correct spelling mistakes
Signed-off-by: yuxiaobo <yuxiaobogo@163.com>
2019-11-06 20:25:19 +08:00
Wei Huang
dd74205bcf
Move out const strings in pkg/scheduler/api/well_known_labels.go 2019-11-05 20:56:21 -08:00
ianlang
372bf95a4f reject pods when under disk pressure 2019-10-27 23:27:00 +08:00
draveness
1163a1d51e feat: update taint nodes by condition to GA 2019-10-19 09:17:41 +08:00
Ted Yu
0939f90103 Check whether mirror pod is ciritical in managerImpl#evictPod 2019-10-01 11:12:18 -07:00
Tim Allclair
6510d26b6a Fix misc static check issues 2019-08-21 10:40:21 -07:00
Seth Jennings
23b69cf02d kubelet: add eviction counter to metrics 2019-08-13 15:21:38 -05:00
Himanshu Pandey
c05d506019 changed IsCriticalPod to return true in case of static pods 2019-08-07 15:47:43 -07:00
Pingan2017
e94d7b3802 clean up redundant conditiontype OutOfDisk 2019-07-03 14:34:52 +08:00
Ted Yu
19c91a59ab Iterate through thresholds in managerImpl#synchronize 2019-06-03 13:16:09 -07:00
Wei Huang
d67e7fd47f
kubelet: updated logic of verifying a static critical pod
- check if a pod is static by its static pod info
- meanwhile, check if a pod is critical by its corresponding mirror pod info
2019-03-12 23:40:20 -07:00
danielqsj
9fd99a48f5 Change kubelet metrics to conform guideline 2019-02-18 14:01:58 +08:00
David Ashpole
8b440c6424 Fix PidPressure, make it evict by priority, and add fork-bomb node e2e test 2019-01-14 09:41:36 -08:00
Davanum Srinivas
954996e231
Move from glog to klog
- Move from the old github.com/golang/glog to k8s.io/klog
- klog as explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
  * github.com/kubernetes/repo-infra
  * k8s.io/gengo/
  * k8s.io/kube-openapi/
  * github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods

Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
2018-11-10 07:50:31 -05:00
k8s-ci-robot
30a06af453
Merge pull request #69671 from mooncak/fix_kubelet
Delete duplicated words in logs
2018-10-17 11:57:12 -07:00
tanshanshan
b7c7966b9f Move pkg/scheduler/algorithm/well_known_labels.go out 2018-10-13 09:10:00 +08:00
mooncake
1e6602d6d8 Fixup log
Signed-off-by: mooncake <xcoder@tenxcloud.com>
2018-10-11 19:14:36 +08:00
Christoph Blecker
97b2992dc1
Update gofmt for go1.11 2018-10-05 12:59:38 -07:00
Seth Jennings
f2a7654978 move feature gate checks inside IsCriticalPod 2018-07-11 16:10:05 -05:00
David Ashpole
b7deb6d9e0 fix eviction event formatting 2018-06-11 11:38:00 -07:00
David Ashpole
93b6d026d9 fix memcg fd leak 2018-06-11 11:37:50 -07:00
David Ashpole
fd1f19fc42 add metadata to kubelet eviction event annotations 2018-05-23 16:12:54 -07:00
Kubernetes Submit Queue
2accf11f1a
Merge pull request #57849 from dashpole/eviction_test_event
Automatic merge from submit-queue (batch tested with PRs 63865, 57849, 63932, 63930, 63936). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Eviction Node e2e test checks for eviction reason

**What this PR does / why we need it**:
Currently, the eviction test simply ensures that pods are marked `Failed`.  However, this could occur because of an OOM, rather than an eviction.
To ensure that pods are actually being evicted, check for the Reason in the pod status to ensure it is evicted.

**Release note**:
```release-note
NONE
```

cc @kubernetes/sig-node-pr-reviews
2018-05-17 00:28:19 -07:00
Kubernetes Submit Queue
c7bfc2a14e
Merge pull request #63220 from dashpole/fix_memcg_format
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix formatting for kubelet memcg notification threshold

/kind bug
**What this PR does / why we need it**:
This fixes the following errors (found in [this node_e2e serial test log](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet-serial/4118/artifacts/tmp-node-e2e-49baaf8a-cos-stable-63-10032-71-0/kubelet.log)):
`eviction_manager.go:256] eviction manager attempting to integrate with kernel memcg notification api`
`threshold_notifier_linux.go:70] eviction: setting notification threshold to 4828488Ki`
`eviction_manager.go:272] eviction manager: failed to create hard memory threshold notifier: invalid argument`

**Special notes for your reviewer**:
This needs to be cherrypicked back to 1.10.
This regression was added in https://github.com/kubernetes/kubernetes/pull/60531, because the `quantity` being used was changed from a DecimalSI to BinarySI, which changes how it is printed out in the String() method.  To make it more explicit that we want the value, just convert Value() to a string.

**Release note**:
```release-note
Fix memory cgroup notifications, and reduce associated log spam.
```
2018-05-16 15:25:06 -07:00
Kubernetes Submit Queue
6934c4f599
Merge pull request #63521 from dashpole/allocatable_memcg
Automatic merge from submit-queue (batch tested with PRs 63314, 63884, 63799, 63521, 62242). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add memcg notifications for allocatable cgroup

**What this PR does / why we need it**:
Use memory cgroup notifications to trigger the eviction manager when the allocatable eviction threshold is crossed.  This allows the eviction manager to respond more quickly when the allocatable cgroup's available memory becomes low.  Evictions are preferable to OOMs in the cgroup since the kubelet can enforce its priorities on which pod is killed.

**Which issue(s) this PR fixes**:
Fixes https://github.com/kubernetes/kubernetes/issues/57901

**Special notes for your reviewer**:
This adds the alloctable cgroup from the container manager to the eviction config.

**Release note**:
```release-note
NONE
```
/sig node
/priority important-soon
/kind feature

I would like this to be included in the 1.11 release.
2018-05-15 19:55:15 -07:00
Kubernetes Submit Queue
d42df4561a
Merge pull request #61976 from atlassian/ticker-with-stop
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Stop() for Ticker to enable leak-free code

**What this PR does / why we need it**:
I wanted to use the clock package but the `Ticker` without a `Stop()` method is a deal breaker for me.

**Release note**:
```release-note
NONE
```
/kind enhancement
/sig api-machinery
2018-05-09 19:06:56 -07:00
Kubernetes Submit Queue
ba3176d94c
Merge pull request #58580 from k82cn/k8s_58505
Automatic merge from submit-queue (batch tested with PRs 58580, 63120). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Admit BestEffort if it tolerates memory pressure.

Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #58505 

**Release note**:
```release-note
None
```
2018-05-08 21:45:10 -07:00
David Ashpole
a5df208866 eviction test ensures failed pods are evicted 2018-05-08 16:08:35 -07:00