Commit Graph

314 Commits

Author SHA1 Message Date
Derek Carr
acb43c7c4a Rework hostfs metrics
Ephemeral storage usage should be calculated by the metrics code,
not the eviction code.
2020-12-03 13:04:25 -07:00
Joel Smith
39a11744ce Partially revert "Include pod /etc/hosts in ephemeral storage calculation for eviction"
This reverts (most of) commit f34b586d01.
2020-12-03 04:47:16 -07:00
Masashi Honma
4c12900643 kube-eviction: Fix SI of process quantity
Use DecimalSI instead of BinarySI because process count is decimal.

Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
2020-10-13 18:24:43 +09:00
Jan Chaloupka
274c536da3 Removing GetPodPriority from pkg/api and importing PodPriority from k8s.io/component-helpers 2020-10-11 21:40:11 +02:00
Marek Siarkowicz
7d309e0104 Move Kubelet Summary API to staging repo 2020-09-22 18:23:28 +02:00
Seth Jennings
a4f043a980 kubelet: eviction: remove noise from TestGetReclaimableThreshold test output 2020-07-27 13:53:55 -05:00
Joel Smith
f34b586d01 Include pod /etc/hosts in ephemeral storage calculation for eviction 2020-07-08 12:58:11 -06:00
Kubernetes Prow Robot
86ad0df820
Merge pull request #92203 from sjenning/add-sjenning-node-approver
Add sjenning as kubelet approver
2020-06-19 21:52:02 -07:00
Kubernetes Prow Robot
86ab25f038
Merge pull request #91716 from kadisi/append_mutations_kubelet
fix unexpected append mutations about pkg/kubelet package
2020-06-19 21:51:08 -07:00
Seth Jennings
45d2b98aa8 add sjenning as kubelet approver 2020-06-19 13:00:55 -05:00
Kubernetes Prow Robot
677e8d6871
Merge pull request #86223 from dashpole/owners_changes
Add dashpole as kubelet approver
2020-06-18 22:59:58 -07:00
kadisi
a75323c76b fix unexpected append mutations about pkg/kubelet package
Signed-off-by: kadisi <iamkadisi@163.com>
Co-authored-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com>
2020-06-03 13:36:57 +08:00
Davanum Srinivas
07d88617e5
Run hack/update-vendor.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:33 -04:00
Davanum Srinivas
442a69c3bd
switch over k/k to use klog v2
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
Kubernetes Prow Robot
89dfebb214
Merge pull request #89359 from gongguan/process
eviction by process number
2020-03-24 15:27:25 -07:00
louisgong
0efb70c0a2 eviction by process number 2020-03-24 09:25:04 +08:00
Wei Fu
a809aaf03d eviction: use previous statsFunc
No need to use summary to create statsFunc for localStorageEviction.
Just use vals from makeSignalObservations.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2020-03-23 19:11:17 +08:00
Clayton Coleman
af9e0be163
kubelet: Record kubelet_evictions when limits are hit
The pod, container, and emptyDir volumes can all trigger evictions
when their limits are breached. To ensure that administrators can
alert on these type of evictions, update kubelet_evictions to include
the following signal types:

* ephemeralcontainerfs.limit - container ephemeral storage breaches its limit
* ephemeralpodfs.limit - pod ephemeral storage breaches its limit
* emptydirfs.limit - pod emptyDir storage breaches its limit
2020-02-18 15:08:30 -05:00
danielqsj
1a9b121764 remove deprecated metrics of kubelet 2020-01-10 16:46:52 +08:00
ianlang
c9418412d1 fix misspelling in comment 2019-12-16 17:27:08 +08:00
David Ashpole
fca84c02bb add dashpole as kubelet approver 2019-12-12 11:10:24 -08:00
Jordan Liggitt
297570e06a hack/update-vendor.sh 2019-11-06 17:42:34 -05:00
Kubernetes Prow Robot
46472773cb
Merge pull request #84836 from yuxiaobo96/k8s-checks
Correct spelling mistakes
2019-11-06 12:21:11 -08:00
yuxiaobo
81e9f21f83 Correct spelling mistakes
Signed-off-by: yuxiaobo <yuxiaobogo@163.com>
2019-11-06 20:25:19 +08:00
Wei Huang
019d7497a5
bazel files 2019-11-05 20:57:21 -08:00
Wei Huang
dd74205bcf
Move out const strings in pkg/scheduler/api/well_known_labels.go 2019-11-05 20:56:21 -08:00
ianlang
22d8e054bc unit test: TestAdmitUnderNodeConditions 2019-10-28 11:37:18 +08:00
ianlang
372bf95a4f reject pods when under disk pressure 2019-10-27 23:27:00 +08:00
draveness
1163a1d51e feat: update taint nodes by condition to GA 2019-10-19 09:17:41 +08:00
Ted Yu
0939f90103 Check whether mirror pod is ciritical in managerImpl#evictPod 2019-10-01 11:12:18 -07:00
Harsh Singh
6a9ef7f04f Move GetPodPriority from /scheduler/util to /api/pod 2019-09-24 22:02:13 +05:30
Kubernetes Prow Robot
a3488b4cee
Merge pull request #81206 from tallclair/staticcheck-kubelet-push
Cleanup Kubelet static analysis issues
2019-08-22 15:09:43 -07:00
Kubernetes Prow Robot
6b47754740
Merge pull request #81627 from tallclair/copy
Delete duplicate resource.Quantity.Copy()
2019-08-22 11:13:13 -07:00
Tim Allclair
a2c51674cf Cleanup more static check issues (S1*,ST*) 2019-08-21 10:40:21 -07:00
Tim Allclair
6510d26b6a Fix misc static check issues 2019-08-21 10:40:21 -07:00
Kubernetes Prow Robot
8cf05f514c
Merge pull request #79247 from egernst/kubelet-PodOverhead
Kubelet enabling to support pod-overhead
2019-08-20 13:27:15 -07:00
Tim Allclair
49f50484b8 Delete duplicate resource.Quantity.Copy() 2019-08-19 17:23:14 -07:00
Kubernetes Prow Robot
273e9262bb
Merge pull request #80342 from draveness/feature/remove-critical-pod-annotation
feat: cleanup pod critical pod annotations feature
2019-08-15 07:20:34 -07:00
Eric Ernst
476c1c7a2b kube-eviction: use common resource summation functions
Utilize resource helpers' GetResourceRequestQuantity instead of
duplicating the logic here.

Signed-off-by: Eric Ernst <eric.ernst@intel.com>
2019-08-13 16:23:28 -07:00
Seth Jennings
23b69cf02d kubelet: add eviction counter to metrics 2019-08-13 15:21:38 -05:00
draveness
495faa22db feat: cleanup pod critical pod annotations feature 2019-08-09 08:41:23 +08:00
Himanshu Pandey
c05d506019 changed IsCriticalPod to return true in case of static pods 2019-08-07 15:47:43 -07:00
draveness
d83526d253 Revert "feat: cleanup pod critical pod annotations feature"
This reverts commit b6d41ee5cc.
2019-07-18 13:31:12 +08:00
Kubernetes Prow Robot
642a06e552
Merge pull request #79554 from draveness/feature/remove-critical-pod-annotation
feat: cleanup pod critical pod annotations feature
2019-07-11 22:03:04 -07:00
draveness
b6d41ee5cc feat: cleanup pod critical pod annotations feature 2019-07-11 08:54:19 +08:00
Brian Goff
45b0261290 Use EPOLL/O_CLOEXEC in evicition notifier
This prevents fd's from leaking to subprocesses.
2019-07-09 10:03:31 -07:00
Pingan2017
e94d7b3802 clean up redundant conditiontype OutOfDisk 2019-07-03 14:34:52 +08:00
Kubernetes Prow Robot
c64f81d082
Merge pull request #78653 from sjenning/add-sjenning-owners
kubelet: add sjenning to kubelet subdirectory owners files
2019-06-25 14:47:15 -07:00
draveness
ca6003bc75 feat: cleanup PodPriority features gate 2019-06-23 11:57:24 +08:00
Kubernetes Prow Robot
145232c1a0
Merge pull request #78673 from tedyu/threshold-min-reclaim
Remove inner loop for finding MinReclaim in ParseThresholdConfig
2019-06-14 13:27:02 -07:00
Kubernetes Prow Robot
3fc21aff76
Merge pull request #78624 from tedyu/evict-mgr-threshold
Iterate through thresholds in managerImpl#synchronize
2019-06-14 07:59:05 -07:00
Ted Yu
f7d9e037d9 Remove inner loop for finding MinReclaim in ParseThresholdConfig 2019-06-03 19:20:19 -07:00
Ted Yu
19c91a59ab Iterate through thresholds in managerImpl#synchronize 2019-06-03 13:16:09 -07:00
Seth Jennings
89dc2c65e4 kubelet: add sjenning to kubelet subdirectory owners files 2019-06-03 08:26:24 -05:00
Kubernetes Prow Robot
162912e12a
Merge pull request #78496 from dashpole/dashpole_owners
Add dashpole to kubelet subdirectory owners files
2019-06-01 02:55:07 -07:00
David Ashpole
a95cf017e1 add dashpole to kubelet owners files 2019-05-29 13:33:48 -07:00
Robert Krawitz
f8661d6240 Use xfs_quota command to apply quotas 2019-05-29 15:12:28 -04:00
Robert Krawitz
448e0c44c6 Apply quotas via syscalls using cgo. 2019-05-29 15:12:28 -04:00
Andrew Kim
c919139245 update import of generic featuregate code from k8s.io/apiserver/pkg/util/feature -> k8s.io/component-base/featuregate 2019-05-08 10:01:50 -04:00
Davanum Srinivas
7b8c9acc09
remove unused code
Change-Id: If821920ec8872e326b7d85437ad8d2620807799d
2019-04-19 08:36:31 -04:00
Wei Huang
d67e7fd47f
kubelet: updated logic of verifying a static critical pod
- check if a pod is static by its static pod info
- meanwhile, check if a pod is critical by its corresponding mirror pod info
2019-03-12 23:40:20 -07:00
danielqsj
9fd99a48f5 Change kubelet metrics to conform guideline 2019-02-18 14:01:58 +08:00
Roy Lenferink
b43c04452f Updated OWNERS files to include link to docs 2019-02-04 22:33:12 +01:00
Kubernetes Prow Robot
c0457488b6
Merge pull request #63901 from weipeng1213/branch-3
fix typo: writeable->writable
2019-02-01 07:44:26 -08:00
David Ashpole
8b440c6424 Fix PidPressure, make it evict by priority, and add fork-bomb node e2e test 2019-01-14 09:41:36 -08:00
Kubernetes Prow Robot
33a37702a6
Merge pull request #64280 from dashpole/eviction_pod_metrics
Use memory metrics from the pod cgroup for eviction ranking
2018-12-04 08:26:03 -08:00
Jordan Liggitt
2498ca7606 drop VerifyFeatureGatesUnchanged 2018-11-21 11:51:33 -05:00
Jordan Liggitt
4dca07ef7e Fixup incorrect use of DefaultFeatureGate.Set in tests 2018-11-21 11:51:33 -05:00
Jordan Liggitt
733dd9dfd7 Add tests to ensure feature gate changes don't escape kubelet/scheduler packages 2018-11-16 10:52:53 -05:00
Davanum Srinivas
954996e231
Move from glog to klog
- Move from the old github.com/golang/glog to k8s.io/klog
- klog as explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
  * github.com/kubernetes/repo-infra
  * k8s.io/gengo/
  * k8s.io/kube-openapi/
  * github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods

Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
2018-11-10 07:50:31 -05:00
k8s-ci-robot
30a06af453
Merge pull request #69671 from mooncak/fix_kubelet
Delete duplicated words in logs
2018-10-17 11:57:12 -07:00
tanshanshan
b7c7966b9f Move pkg/scheduler/algorithm/well_known_labels.go out 2018-10-13 09:10:00 +08:00
mooncake
1e6602d6d8 Fixup log
Signed-off-by: mooncake <xcoder@tenxcloud.com>
2018-10-11 19:14:36 +08:00
Christoph Blecker
97b2992dc1
Update gofmt for go1.11 2018-10-05 12:59:38 -07:00
Krzysztof Jastrzebski
138a3c7172 Add "only_cpu_and_memory" GET parameter to /stats/summary http handler in kubelet. If parameter is true then only cpu and memory will be present in response. The parameter will be used by Metric Server to avoid sending/decoding unneeded data. 2018-09-06 21:49:00 +02:00
Seth Jennings
f2a7654978 move feature gate checks inside IsCriticalPod 2018-07-11 16:10:05 -05:00
Jeff Grafton
23ceebac22 Run hack/update-bazel.sh 2018-06-22 16:22:57 -07:00
Jeff Grafton
a725660640 Update to gazelle 0.12.0 and run hack/update-bazel.sh 2018-06-22 16:22:18 -07:00
David Ashpole
b7deb6d9e0 fix eviction event formatting 2018-06-11 11:38:00 -07:00
David Ashpole
93b6d026d9 fix memcg fd leak 2018-06-11 11:37:50 -07:00
David Ashpole
2be67e7dde use memory metrics from the pod cgroup for eviction ranking 2018-05-24 10:59:53 -07:00
David Ashpole
fd1f19fc42 add metadata to kubelet eviction event annotations 2018-05-23 16:12:54 -07:00
Kubernetes Submit Queue
2accf11f1a
Merge pull request #57849 from dashpole/eviction_test_event
Automatic merge from submit-queue (batch tested with PRs 63865, 57849, 63932, 63930, 63936). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Eviction Node e2e test checks for eviction reason

**What this PR does / why we need it**:
Currently, the eviction test simply ensures that pods are marked `Failed`.  However, this could occur because of an OOM, rather than an eviction.
To ensure that pods are actually being evicted, check for the Reason in the pod status to ensure it is evicted.

**Release note**:
```release-note
NONE
```

cc @kubernetes/sig-node-pr-reviews
2018-05-17 00:28:19 -07:00
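The check described in #57849 above can be sketched in Go as follows. This is a minimal illustration, not the e2e test itself: the `podStatus` type, the `wasEvicted` helper, and the `"Evicted"` reason string are assumptions made for the example.

```go
package main

import "fmt"

// podStatus is a hypothetical, trimmed-down stand-in for a pod's status;
// only the two fields the check needs are modeled here.
type podStatus struct {
	Phase  string
	Reason string
}

// evictedReason is the reason string this sketch assumes the kubelet
// sets on evicted pods.
const evictedReason = "Evicted"

// wasEvicted reports whether a pod failed because of an eviction,
// rather than for some other cause such as an OOM kill.
func wasEvicted(s podStatus) bool {
	return s.Phase == "Failed" && s.Reason == evictedReason
}

func main() {
	fmt.Println(wasEvicted(podStatus{Phase: "Failed", Reason: "Evicted"}))
	fmt.Println(wasEvicted(podStatus{Phase: "Failed", Reason: "OOMKilled"}))
}
```

Checking the phase alone would accept both pods in `main`; adding the reason check separates the evicted pod from the one that failed for another cause.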
Kubernetes Submit Queue
c7bfc2a14e
Merge pull request #63220 from dashpole/fix_memcg_format
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Fix formatting for kubelet memcg notification threshold

/kind bug
**What this PR does / why we need it**:
This fixes the following errors (found in [this node_e2e serial test log](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet-serial/4118/artifacts/tmp-node-e2e-49baaf8a-cos-stable-63-10032-71-0/kubelet.log)):
`eviction_manager.go:256] eviction manager attempting to integrate with kernel memcg notification api`
`threshold_notifier_linux.go:70] eviction: setting notification threshold to 4828488Ki`
`eviction_manager.go:272] eviction manager: failed to create hard memory threshold notifier: invalid argument`

**Special notes for your reviewer**:
This needs to be cherrypicked back to 1.10.
This regression was added in https://github.com/kubernetes/kubernetes/pull/60531, because the `quantity` being used was changed from a DecimalSI to BinarySI, which changes how it is printed out in the String() method.  To make it more explicit that we want the value, just convert Value() to a string.

**Release note**:
```release-note
Fix memory cgroup notifications, and reduce associated log spam.
```
2018-05-16 15:25:06 -07:00
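The formatting problem described in #63220 above can be illustrated with a small Go sketch. It does not use the real `resource.Quantity` type: `binarySIString` is a hypothetical stand-in that prints a byte count with a binary suffix, and `thresholdArg` is a hypothetical helper that prints the plain integer value, which is what the fix in that PR opts for.

```go
package main

import (
	"fmt"
	"strconv"
)

// binarySIString is a hypothetical stand-in for a binary-SI formatter.
// It divides by 1024 while the value stays a whole number and appends
// the matching suffix, so it can produce strings such as "4828488Ki".
func binarySIString(bytes int64) string {
	suffixes := []string{"", "Ki", "Mi", "Gi", "Ti"}
	i := 0
	for bytes != 0 && bytes%1024 == 0 && i < len(suffixes)-1 {
		bytes /= 1024
		i++
	}
	return strconv.FormatInt(bytes, 10) + suffixes[i]
}

// thresholdArg formats the threshold as a plain decimal byte count,
// with no suffix, independent of how the quantity would print itself.
func thresholdArg(bytes int64) string {
	return strconv.FormatInt(bytes, 10)
}

func main() {
	const threshold int64 = 4828488 * 1024
	fmt.Println(binarySIString(threshold)) // suffixed form, as seen in the log
	fmt.Println(thresholdArg(threshold))   // plain integer form
}
```

Formatting the numeric value directly keeps the output stable even if the quantity's display format changes again later.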
weipeng1213
a09a11fcb6 fix typo: writeable->writable 2018-05-16 12:37:17 +08:00
Kubernetes Submit Queue
6934c4f599
Merge pull request #63521 from dashpole/allocatable_memcg
Automatic merge from submit-queue (batch tested with PRs 63314, 63884, 63799, 63521, 62242). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Add memcg notifications for allocatable cgroup

**What this PR does / why we need it**:
Use memory cgroup notifications to trigger the eviction manager when the allocatable eviction threshold is crossed.  This allows the eviction manager to respond more quickly when the allocatable cgroup's available memory becomes low.  Evictions are preferable to OOMs in the cgroup since the kubelet can enforce its priorities on which pod is killed.

**Which issue(s) this PR fixes**:
Fixes https://github.com/kubernetes/kubernetes/issues/57901

**Special notes for your reviewer**:
This adds the allocatable cgroup from the container manager to the eviction config.

**Release note**:
```release-note
NONE
```
/sig node
/priority important-soon
/kind feature

I would like this to be included in the 1.11 release.
2018-05-15 19:55:15 -07:00
Kubernetes Submit Queue
d42df4561a
Merge pull request #61976 from atlassian/ticker-with-stop
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Stop() for Ticker to enable leak-free code

**What this PR does / why we need it**:
I wanted to use the clock package but the `Ticker` without a `Stop()` method is a deal breaker for me.

**Release note**:
```release-note
NONE
```
/kind enhancement
/sig api-machinery
2018-05-09 19:06:56 -07:00
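The motivation given in #61976 above, a ticker abstraction that can be stopped, can be sketched in Go as follows. The `Ticker` interface, `realTicker`, `NewTicker`, and `countTicks` are hypothetical names chosen for this example and may not match the actual clock package; only `time.NewTicker` and `time.Ticker.Stop` are standard library calls.

```go
package main

import (
	"fmt"
	"time"
)

// Ticker is a hypothetical interface that exposes both the tick channel
// and a Stop method, so callers can release the underlying ticker.
type Ticker interface {
	C() <-chan time.Time
	Stop()
}

// realTicker wraps the standard library ticker.
type realTicker struct{ t *time.Ticker }

func (r *realTicker) C() <-chan time.Time { return r.t.C }
func (r *realTicker) Stop()               { r.t.Stop() }

// NewTicker returns a Ticker backed by time.NewTicker.
func NewTicker(d time.Duration) Ticker {
	return &realTicker{t: time.NewTicker(d)}
}

// countTicks waits for n ticks and always stops the ticker on return,
// which is the leak-free usage the interface is meant to allow.
func countTicks(t Ticker, n int) int {
	defer t.Stop()
	seen := 0
	for seen < n {
		<-t.C()
		seen++
	}
	return seen
}

func main() {
	fmt.Println(countTicks(NewTicker(10*time.Millisecond), 3))
}
```

Because the interface hides the concrete type, a test could substitute a fake implementation and still verify that `Stop` is called.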
Kubernetes Submit Queue
ba3176d94c
Merge pull request #58580 from k82cn/k8s_58505
Automatic merge from submit-queue (batch tested with PRs 58580, 63120). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Admit BestEffort if it tolerates memory pressure.

Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #58505 

**Release note**:
```release-note
None
```
2018-05-08 21:45:10 -07:00
David Ashpole
a5df208866 eviction test ensures failed pods are evicted 2018-05-08 16:08:35 -07:00
David Ashpole
2294f09e4e add memcg notifications for allocatable cgroup 2018-05-07 17:15:23 -07:00
David Ashpole
db99c20a9a cleanup eviction events 2018-05-04 11:02:25 -07:00
David Ashpole
b294173f5d fix formatting for memcg threshold 2018-04-19 14:48:41 -07:00
Mikhail Mazurskiy
1f393cdef9
Stop() for Ticker to enable leak-free code 2018-03-31 19:41:43 +11:00
Da K. Ma
b367177f3f Admit BestEffort if it tolerates memory pressure.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
2018-03-08 09:25:06 +08:00
David Ashpole
39d9fa60e8 refresh eviction interval periodically 2018-03-06 15:14:05 -08:00
David Ashpole
54cf14ffcc subtract inactive_file from usage when setting memcg threshold 2018-03-06 09:09:44 -08:00
David Ashpole
a55119820e fix running with no eviction thresholds 2018-02-20 13:49:14 -08:00
Kubernetes Submit Queue
96ec318718
Merge pull request #59842 from ixdy/update-rules_go-02-2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Update bazelbuild/rules_go, kubernetes/repo-infra, and gazelle dependencies

**What this PR does / why we need it**: updates our bazelbuild/rules_go dependency in order to bump everything to go1.9.4. I'm separating this effort into two separate PRs, since updating rules_go requires a large cleanup, removing an attribute from most build rules.

**Release note**:

```release-note
NONE
```
2018-02-19 22:23:05 -08:00
David Ashpole
960856f4e8 collect metrics on the /kubepods cgroup on-demand 2018-02-17 12:32:40 -08:00
Kubernetes Submit Queue
270ed995f4
Merge pull request #59841 from dashpole/metrics_after_reclaim
Automatic merge from submit-queue (batch tested with PRs 59683, 59964, 59841, 59936, 59686). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Reevaluate eviction thresholds after reclaim functions

**What this PR does / why we need it**:
When the node comes under `DiskPressure` due to inodes or disk space, the eviction manager runs garbage collection functions to clean up dead containers and unused images.
Currently, we use the strategy of trying to measure the disk space and inodes freed by garbage collection.  However, as #46789 and #56573 point out, there are gaps in the implementation that can cause extra evictions even when they are not required.  Furthermore, for nodes which frequently cycle through images, it results in a large number of evictions, as running out of inodes always causes an eviction.

This PR changes this strategy to call the garbage collection functions and ignore the results.  Then, it triggers another collection of node-level metrics, and sees if the node is still under DiskPressure.
This way, we can simply observe the decrease in disk or inode usage, rather than trying to measure how much is freed.

**Which issue(s) this PR fixes**:
Fixes #46789
Fixes #56573
Related PR #56575

**Special notes for your reviewer**:
This will look cleaner after #57802  removes arguments from [makeSignalObservations](https://github.com/kubernetes/kubernetes/pull/57802/files#diff-9e5246d8c78d50ce4ba440f98663f3e9R719).

**Release note**:
```release-note
NONE
```

/sig node
/kind bug
/priority important-soon
cc @kubernetes/sig-node-pr-reviews
2018-02-16 16:31:33 -08:00
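The strategy described in #59841 above (run the reclaim functions, ignore what they report, then re-measure) can be sketched in Go as follows. This is a simplified illustration under stated assumptions: `observeFunc`, `reclaimFunc`, and `reclaimRelievedPressure` are hypothetical names, and a single "available bytes" number stands in for the node-level metrics the eviction manager actually collects.

```go
package main

import "fmt"

// observeFunc returns the currently available amount of a resource
// (hypothetical; stands in for a fresh collection of node metrics).
type observeFunc func() int64

// reclaimFunc performs one garbage collection step (hypothetical).
// It deliberately does not report how much it freed.
type reclaimFunc func() error

// reclaimRelievedPressure runs every reclaim function, then takes a
// fresh observation and reports whether the available amount is back
// at or above the required minimum. Nothing is estimated from the
// reclaim functions themselves.
func reclaimRelievedPressure(observe observeFunc, reclaim []reclaimFunc, minAvailable int64) bool {
	for _, fn := range reclaim {
		if err := fn(); err != nil {
			// A failed step is noted but does not stop the remaining steps.
			fmt.Println("reclaim step failed:", err)
		}
	}
	return observe() >= minAvailable
}

func main() {
	available := int64(5)
	observe := func() int64 { return available }
	gc := func() error { available += 10; return nil }

	// If this reports false, the caller would go on to evict pods.
	fmt.Println(reclaimRelievedPressure(observe, []reclaimFunc{gc}, 10))
}
```

Re-observing after reclaim avoids the accounting gaps that come from trying to predict how much each garbage collection step frees.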