Commit Graph

1141 Commits

Author SHA1 Message Date
Jeff Grafton
efee0704c6 Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
Jonathan Basseri
732e785e0a Performance improvement for affinity term matching.
When a PodAffinityTerm uses TopologyKey=kubernetes.io/hostname, we can
avoid searching the entire cluster for a match by only listing pods on
the given node.
2017-12-21 16:01:22 -08:00
Kubernetes Submit Queue
d7e5bd194a
Merge pull request #57477 from misterikkit/noStrCat
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Avoid string concatenation when comparing pods.

**What this PR does / why we need it**:

Pod comparison in (*NodeInfo).Filter was using GetPodFullName before
comparing pod names. This is a concatenation of pod name and pod
namespace, and it is significantly faster to compare name & namespace
instead.

This is a set of 3 PRs targeting affinity predicate performance. (#57476, #57477, #57478) The key takeaway is approximately 2x speedup in the large affinity benchmark.

The unexpected increase in BenchmarkScheduling/1000Nodes/1000Pods seems to be an outlier, and did not recur on subsequent runs. The benchmarks have a moderate amount of variance to them, and I did not run them enough times to measure mean and standard deviation.

| test | b.N | master | #57476 | #57477 | #57478 | combined |
| ---- | --- | ------ | ------ | ---------- | ---------- | -------- |
| BenchmarkScheduling/100Nodes/0Pods                | 100 |  39629010 ns/op | 36898566 ns/op (-6.89%)   |  38461530 ns/op (-2.95%)  |  36214136 ns/op (-8.62%)  |  43090781 ns/op (+8.74%)  |
| BenchmarkScheduling/100Nodes/1000Pods             | 100 |  85489577 ns/op | 69538016 ns/op (-18.66%)  |  70104254 ns/op (-18.00%) |  75015585 ns/op (-12.25%) |  80986960 ns/op (-5.27%)  |
| BenchmarkScheduling/1000Nodes/0Pods               | 100 | 219356660 ns/op | 200149051 ns/op (-8.76%)  | 192867469 ns/op (-12.08%) | 196896770 ns/op (-10.24%) | 212563662 ns/op (-3.10%)  |
| BenchmarkScheduling/1000Nodes/1000Pods            | 100 | 380368238 ns/op | 381786369 ns/op (+0.37%)  | 387224973 ns/op (+1.80%)  | 417974358 ns/op (+9.89%)  | 411140230 ns/op (+8.09%)  |
| BenchmarkSchedulingAntiAffinity/500Nodes/250Pods  | 250 | 124399176 ns/op | 97568988 ns/op (-21.57%)  | 112027363 ns/op (-9.95%)  | 129134326 ns/op (+3.81%)  |  98607941 ns/op (-20.73%) |
| BenchmarkSchedulingAntiAffinity/500Nodes/5000Pods | 250 | 491677096 ns/op | 441562422 ns/op (-10.19%) | 278127757 ns/op (-43.43%) | 447355609 ns/op (-9.01%)  | 226310721 ns/op (-53.97%) |

Combined performance contains all three patches.
Percentages are relative to master.

Methodology:

I ran the tests on each branch with this command.
```
make test-integration WHAT="./test/integration/scheduler_perf" KUBE_TEST_ARGS="-run=xxxx -bench=."
```

The benchmarks have a fair amount of variance to them, and I did not run them enough times to measure mean and standard deviation.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

The three PRs in this set should collectively fix #54189.

**Special notes for your reviewer**:

**Release note**:

```release-note
Improve scheduler performance of MatchInterPodAffinity predicate.
```
2017-12-21 15:18:55 -08:00
Jonathan Basseri
3909dc1341 Avoid array growth in FilteredList.
The method (*schedulerCache).FilteredList builds an array of *v1.Pod
that contains every pod in the cluster except for those filtered out by
a predicate. Today, it starts with a nil slice and appends to it.

Based on current usage, FilteredList is expected to return every pod in
the cluster or omit some pods from a single node. This change reserves
array capacity equal to the total number of pods in the cluster.
2017-12-21 10:50:04 -08:00
Jonathan Basseri
7b3638ea77 Avoid string concatenation when comparing pods.
Pod comparison in (*NodeInfo).Filter was using GetPodFullName before
comparing pod names. This is a concatenation of pod name and pod
namespace, and it is significantly faster to compare name & namespace
instead.
2017-12-21 09:31:53 -08:00
Kubernetes Submit Queue
754bb1350f
Merge pull request #55442 from anfernee/priority_resource
Automatic merge from submit-queue (batch tested with PRs 57257, 55442). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Merge 3 resource allocation priority functions

**What this PR does / why we need it**: those 3 priority functions are closed related, and share a lot of the same logic, put them together.

**Release note**:
```release-note
None
```
2017-12-20 23:56:49 -08:00
Yongkun Anfernee Gui
c65225ee19 Merge 3 resource allocation priority functions 2017-12-20 17:21:22 -08:00
Yassine TIJANI
e62952d02b using consts to refer to predicate names 2017-12-20 13:21:20 +00:00
Yassine TIJANI
ecba504974 implementing predicates ordering 2017-12-18 17:44:24 +00:00
Kubernetes Submit Queue
665e8b2d65
Merge pull request #56375 from CaoShuFeng/glogV10
Automatic merge from submit-queue (batch tested with PRs 56375, 56872, 57053, 57165, 57218). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

remove extra level check of glog

**Release note**:
```release-note
NONE
```
2017-12-17 05:33:38 -08:00
Kubernetes Submit Queue
203078538a
Merge pull request #56792 from denverdino/fix-typo-in-algorithmprovider-defaults
Automatic merge from submit-queue (batch tested with PRs 56250, 56809, 56812, 56792, 56724). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix typo

Signed-off-by: Li Yi <denverdino@gmail.com>

**What this PR does / why we need it**:

Fix the typo in /plugin/pkg/scheduler/algorithmprovider/defaults.go
2017-12-16 07:46:46 -08:00
Kubernetes Submit Queue
a99fdfc680
Merge pull request #56480 from CaoShuFeng/schedule_queue
Automatic merge from submit-queue (batch tested with PRs 56480, 56675, 56624, 56648, 56658). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

fix scheduling queue unit test

This change makes sure the Pop() test finish completely.

**Release note**:
```release-note
NONE
```
2017-12-16 03:24:40 -08:00
Kubernetes Submit Queue
f5fa99cc82
Merge pull request #56549 from CaoShuFeng/thread_safe
Automatic merge from submit-queue (batch tested with PRs 56579, 55236, 56512, 56549, 56538). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Heap is not thread safe in scheduling queue

/cc @bsalamat 

**Release note**:
```release-note
NONE
```
2017-12-15 21:19:42 -08:00
Kubernetes Submit Queue
40ad5d02f8
Merge pull request #56322 from guangxuli/priority_map_performance
Automatic merge from submit-queue (batch tested with PRs 56413, 56322, 56490, 56460, 56487). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Put process of getting pod controller reference into metadata

**What this PR does / why we need it**:
We should extract our common process/data into metadata just as other map priority functions do, so we could avoid getting same required data repeatedly in every node map process.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
None

**Special notes for your reviewer**:

**Release note**:

```release-note
None
```
2017-12-15 16:43:50 -08:00
Kubernetes Submit Queue
68c857e207
Merge pull request #55957 from jsafrane/protection-predicate
Automatic merge from submit-queue (batch tested with PRs 57211, 56150, 56368, 56271, 55957). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Skip pods that refer to PVCs that are being deleted

**What this PR does / why we need it**:

New check was added to `Schedule()` to make sure that a scheduled pod refers to existing PVCs that are not being deleted.

In 1.9 we plan to add a new feature that uses finalizers on PVC to protect PVCs that are used by a running pod from being deleted. This finalizer will be removed when all pods that use a PVC are finished or deleted. See https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/postpone-pvc-deletion-if-used-in-a-pod.md for details.

I needed to pass `pvcLister` to `GenericScheduler`.

UX:

```
$ kubectl describe pod
...
  Type     Reason            Age              From               Message
  ----     ------            ----             ----               -------
  Warning  FailedScheduling  5s (x4 over 8s)  default-scheduler  persistentvolumeclaim "myclaim" is being deleted
  Warning  FailedScheduling  1s (x2 over 1s)  default-scheduler  persistentvolumeclaim "myclaim" not found

```


**Release note**:

```release-note
Scheduler skips pods that use a PVC that either does not exist or is being deleted.
```

/sig scheduling
/kind feature
2017-12-15 14:00:49 -08:00
Kubernetes Submit Queue
588c1e970a
Merge pull request #56271 from tanshanshan/fix-little-scheduler
Automatic merge from submit-queue (batch tested with PRs 57211, 56150, 56368, 56271, 55957). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Put variable declared in the front.

**What this PR does / why we need it**:

put variable declared in the front.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note

```
2017-12-15 14:00:47 -08:00
Kubernetes Submit Queue
e2e5f2339b
Merge pull request #55853 from guangxuli/fix_scheduler_test
Automatic merge from submit-queue (batch tested with PRs 56308, 54304, 56364, 56388, 55853). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

httptest server should be close since Close issue has been fixed

**What this PR does / why we need it**:
per https://github.com/kubernetes/kubernetes/issues/19254, the issue seem to be fix for a long time and `server.Close` is no longer a issue in current related golang version, so it's time to uncomment the server.Close(). 

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
None
**Special notes for your reviewer**:

**Release note**:

```release-note
None
```
2017-12-15 02:04:45 -08:00
Kubernetes Submit Queue
59bf6fed73
Merge pull request #56388 from CaoShuFeng/failureDomain
Automatic merge from submit-queue (batch tested with PRs 56308, 54304, 56364, 56388, 55853). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

clean up failure domain from InterPodAffinityPriority

**Release note**:
```release-note
NONE
```
2017-12-15 02:04:42 -08:00
Kubernetes Submit Queue
5e478f072c
Merge pull request #56184 from CaoShuFeng/statefulset
Automatic merge from submit-queue (batch tested with PRs 54410, 56184, 56199, 56191, 56231). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

remove useless const

Trivial fix.

**Release note**:

```release-note
NONE
```
2017-12-14 05:33:11 -08:00
Kubernetes Submit Queue
7335c41ebe
Merge pull request #56622 from wackxu/nodemiss
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

delete a node from its cache if it gets node not found error

**What this PR does / why we need it**:

delete a node from its cache if it gets node not found error

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes # https://github.com/kubernetes/kubernetes/issues/56261

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-12-12 11:22:12 -08:00
Kubernetes Submit Queue
305d644363
Merge pull request #56577 from resouer/fix-eclass-pvc
Automatic merge from submit-queue (batch tested with PRs 56688, 56577). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add pvc as part of equivalence hash

**What this PR does / why we need it**:

Should add PVC as part of equivalence hash so that `StatefulSe`t and `Operator` will always run the volume predicate, while the `ReplicaSet` can still  re-use cached ones.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #56265

**Special notes for your reviewer**:

**Release note**:

```release-note
Add pvc as part of equivalence hash
```
2017-12-05 14:31:09 -08:00
Li Yi
44877d87cb Fix typo
Change-Id: Ie8a4e9cf510fe2f7e7445af03476a0e7759a0360
Signed-off-by: Li Yi <denverdino@gmail.com>
2017-12-04 21:16:31 +08:00
Kubernetes Submit Queue
2b98a976fb
Merge pull request #53647 from wenlxie/githubupstream.master.fixinterpodantiaffinity
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

fix inter-pod anti-affinity issue

This is used to fix:
https://github.com/kubernetes/kubernetes/issues/50813
2017-12-03 07:13:08 -08:00
Harry Zhang
b3bb74e3a3 Update generated bazel 2017-12-02 22:24:17 +08:00
Harry Zhang
e4055c0df2 Add pvc as part of equivalence hash
Use factory to generat get equivalence pod func
2017-12-02 22:24:17 +08:00
Harry Zhang
af243f4824 Fix PV counter predicate in eclass 2017-12-02 22:24:17 +08:00
wackxu
aac60b6cbb delete a node from its cache if it gets node not found error 2017-12-02 09:34:25 +08:00
wenlxie
82e02cc986 fix inter-pod anti-affinity issue 2017-12-01 19:32:21 +08:00
Cao Shufeng
184eb83162 remove extra level check of glog 2017-11-30 15:58:18 +08:00
Cao Shufeng
3ef8ab4d70 Heap is not thread safe in scheduling queue 2017-11-30 14:04:28 +08:00
Michelle Au
c26debecef Return no volume match if prebound PV node affinity doesn't match node 2017-11-29 17:29:58 -08:00
Cao Shufeng
33f6625a84 fix scheduling queue unit test
This change makes sure the Pop() test finish completely.
2017-11-28 17:40:35 +08:00
Avesh Agarwal
b571001999 Implement resource limit priority function. This function checks if the input pod's
resource limits are satisfied by the input node's allocatable resources or not.
If yes, the node is assigned a score of 1, otherwise the node's score is not changed.
2017-11-27 12:53:47 -05:00
Cao Shufeng
888580e032 clean up failure domain from InterPodAffinityPriority 2017-11-27 13:13:12 +08:00
Gavin
58ed69a9c8 put pod controllerref to metadata 2017-11-24 15:04:19 +08:00
Jan Safranek
0a96a75cea Remove PVCLister and use informer directly. 2017-11-23 10:04:42 +01:00
Jan Safranek
19caa9c50d Skip pods that refer to PVCs that are being deleted
Scheduler should ignore pods that refer to PVCs that either do not exist or
are being deleted.
2017-11-23 10:01:23 +01:00
Kubernetes Submit Queue
82c88982c0
Merge pull request #56178 from bsalamat/pdb
Automatic merge from submit-queue (batch tested with PRs 55952, 49112, 55450, 56178, 56151). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add PodDisruptionBudget support in pod preemption

**What this PR does / why we need it**:
This PR adds the logic to make scheduler preemption aware of PodDisruptionBudget. Preemption tries to avoid preempting pods whose PDBs are violated by preemption. If preemption does not find any other pods to preempt, it will preempt pods despite violating their PDBs.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #53913

**Special notes for your reviewer**:

**Release note**:

```release-note
Add PodDisruptionBudget support during pod preemption
```

ref/ #47604

/sig scheduling
2017-11-22 21:48:48 -08:00
tanshanshan
9727cd0636 declare in front 2017-11-23 11:50:04 +08:00
Kubernetes Submit Queue
4904037645
Merge pull request #55569 from x1957/fixtypo
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

fixtypo

**What this PR does / why we need it**:
fixtypo
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
None
```
2017-11-22 17:21:11 -08:00
Kubernetes Submit Queue
6a889ec37f
Merge pull request #55039 from msau42/local-binding-4
Automatic merge from submit-queue (batch tested with PRs 51321, 55969, 55039, 56183, 55976). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Topology aware volume scheduler and PV controller changes

**What this PR does / why we need it**:
Scheduler and PV controller changes to support volume topology aware scheduling, as specified in kubernetes/community#1168

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #54435

**Special notes for your reviewer**:
* I've split the PR into logical commits to make it easier to review
* The remaining TODOs I plan to address next release unless you think it needs to be done now

**Release note**:
```release-note
Adds alpha support for volume scheduling, which allows the scheduler to make PersistentVolume binding decisions while respecting the Pod's scheduling requirements.  Dynamic provisioning is not supported with this feature yet.

Action required for existing users of the LocalPersistentVolumes alpha feature:
* The VolumeScheduling feature gate also has to be enabled on kube-scheduler and kube-controller-manager.
* The NoVolumeNodeConflict predicate has been removed.  For non-default schedulers, update your scheduler policy.
* The CheckVolumeBinding predicate has to be enabled in non-default schedulers.
```

@kubernetes/sig-storage-pr-reviews @kubernetes/sig-scheduling-pr-reviews
2017-11-22 11:59:55 -08:00
Bobby (Babak) Salamat
a0ef9cd09a Autogenerated files 2017-11-22 09:46:26 -08:00
Bobby (Babak) Salamat
3d4ae31d91 Add PDB support during pod preemption 2017-11-22 09:46:26 -08:00
Kubernetes Submit Queue
2a18a2aadf
Merge pull request #55103 from ConnorDoyle/remove-oir
Automatic merge from submit-queue (batch tested with PRs 55103, 56036, 56186). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Removed opaque integer resources (deprecated in v1.8)

**What this PR does / why we need it**:

* Remove opaque integer resources (OIR) support from the code base. This feature was deprecated in v1.8 and replaced by Extended Resources (ER).

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #55102

**Release note**:

```release-note
Remove opaque integer resources (OIR) support (deprecated in v1.8.)
```
2017-11-22 00:27:27 -08:00
Michelle Au
a84e5b9613 update build files 2017-11-21 23:19:44 -08:00
Michelle Au
5871b501ac Add assume/bind volume functions to scheduler 2017-11-21 23:19:44 -08:00
Michelle Au
094841c62e Add predicate to find volume matches 2017-11-21 23:19:44 -08:00
Michelle Au
01a8772111 Scheduler volume cache plumbing and predicate invalidation 2017-11-21 23:19:43 -08:00
Cao Shufeng
f3c4ef835b remove useless const 2017-11-22 11:41:31 +08:00
Kubernetes Submit Queue
5d3aae0192
Merge pull request #56083 from yguo0905/sched-log
Automatic merge from submit-queue (batch tested with PRs 56128, 56004, 56083, 55833, 56042). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Suppress the warning when a pod in binding cannot be expired

**What this PR does / why we need it**:

I have a scheduler extender, which implements the `Bind` call and takes several minutes to respond to that call. The scheduler log was full of the following error.

```
W1120 10:23:09.691188   99720 cache.go:442] Couldn't expire cache for pod default/xxx. Binding is still in progress.
```

The TTL for a pod to be expired in the scheduler cache is 30 seconds. But it's also possible that the binding (which is done asynchronously) can take longer than 30 seconds.
2cbb07a439/plugin/pkg/scheduler/factory/factory.go (L143)

The go routine that checks whether a pod has been expired is triggered every second.
2cbb07a439/plugin/pkg/scheduler/schedulercache/cache.go (L33)

So, it will print the the following warning every seconds until the pod gets expired.
2cbb07a439/plugin/pkg/scheduler/schedulercache/cache.go (L442-L443)

I think it's a valid for the binding to take more than one second, so we should downgrade this to an info to avoid polluting the scheduler log.

**Release note**:
```release-note
None
```

/sig scheduling
/assign @bsalamat 
/cc @vishh
2017-11-21 17:04:56 -08:00