Commit Graph

140 Commits

Author SHA1 Message Date
Davanum Srinivas
954996e231
Move from glog to klog
- Move from the old github.com/golang/glog to k8s.io/klog
- klog as explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
  * github.com/kubernetes/repo-infra
  * k8s.io/gengo/
  * k8s.io/kube-openapi/
  * github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods

Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
2018-11-10 07:50:31 -05:00
Mike Danese
62c3ec969d Fix a race in the scheduler.
Loop over priorityConfigs seperately. The node loop can only safely
modify result[i][index]. Before this change it sometimes modified
result[i] concurrently with other loops.

Fixes: 7164967662

==================== Test output for //pkg/scheduler/core:go_default_test:
==================
WARNING: DATA RACE
Read at 0x00c0005e8ed0 by goroutine 22:
  k8s.io/kubernetes/pkg/scheduler/core.PrioritizeNodes.func2()
      pkg/scheduler/core/generic_scheduler.go:667 +0x2ea
  k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.ParallelizeUntil.func1()
      staging/src/k8s.io/client-go/util/workqueue/parallelizer.go:65 +0x9e

Previous write at 0x00c0005e8ed0 by goroutine 21:
  k8s.io/kubernetes/pkg/scheduler/core.PrioritizeNodes.func2()
      pkg/scheduler/core/generic_scheduler.go:668 +0x450
  k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.ParallelizeUntil.func1()
      staging/src/k8s.io/client-go/util/workqueue/parallelizer.go:65 +0x9e

Goroutine 22 (running) created at:
  k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.ParallelizeUntil()
      staging/src/k8s.io/client-go/util/workqueue/parallelizer.go:57 +0x1a3
  k8s.io/kubernetes/pkg/scheduler/core.PrioritizeNodes()
      pkg/scheduler/core/generic_scheduler.go:682 +0x592
  k8s.io/kubernetes/pkg/scheduler/core.(*genericScheduler).Schedule()
      pkg/scheduler/core/generic_scheduler.go:186 +0x77d
  k8s.io/kubernetes/pkg/scheduler/core.TestGenericScheduler.func1()
      pkg/scheduler/core/generic_scheduler_test.go:464 +0x91f
  testing.tRunner()
      GOROOT/src/testing/testing.go:827 +0x162

Goroutine 21 (running) created at:
  k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.ParallelizeUntil()
      staging/src/k8s.io/client-go/util/workqueue/parallelizer.go:57 +0x1a3
  k8s.io/kubernetes/pkg/scheduler/core.PrioritizeNodes()
      pkg/scheduler/core/generic_scheduler.go:682 +0x592
  k8s.io/kubernetes/pkg/scheduler/core.(*genericScheduler).Schedule()
      pkg/scheduler/core/generic_scheduler.go:186 +0x77d
  k8s.io/kubernetes/pkg/scheduler/core.TestGenericScheduler.func1()
      pkg/scheduler/core/generic_scheduler_test.go:464 +0x91f
  testing.tRunner()
      GOROOT/src/testing/testing.go:827 +0x162
==================
--- FAIL: TestGenericScheduler (0.01s)
    --- FAIL: TestGenericScheduler/test_6 (0.00s)
        testing.go:771: race detected during execution of test
    testing.go:771: race detected during execution of test
FAIL
2018-11-09 15:21:22 -08:00
Jun Gong
9fc369dd0d Add debug info: scheduler extenders's score and its name for each pod 2018-11-08 13:02:57 +08:00
k8s-ci-robot
7984a2bf60
Merge pull request #70564 from KevinWang15/master
Fix typos
2018-11-05 19:04:45 -08:00
k8s-ci-robot
c0daab0e03
Merge pull request #70274 from zhangmingld/combinesimilercode
combine similar code where calucate schedule priority
2018-11-05 08:14:05 -08:00
Ke Wang
946c701b05 Fix Typo: mataData -> metaData; masquared -> masquerade 2018-11-05 21:19:25 +08:00
zhangmingld
7164967662 combine similar code where calucate schedule priority 2018-10-31 08:59:53 +08:00
zhangmingld
429e67a12f duplicated glog.V(10) when had a if glog.V(10) 2018-10-29 11:30:16 +08:00
k8s-ci-robot
c00f19bd15
Merge pull request #68403 from wgliang/master.deprecate-Parallelize
Replace Parallelize with function ParallelizeUntil and formally depre…
2018-10-06 09:40:07 -07:00
Guoliang Wang
187e2e01c9 Move scheduler cache interface and implementation to pkg/scheduler/internal/cache 2018-10-06 20:48:59 +08:00
Christoph Blecker
97b2992dc1
Update gofmt for go1.11 2018-10-05 12:59:38 -07:00
Guoliang Wang
c2622dd9d8 Replace Parallelize with function ParallelizeUntil and formally deprecate the Parallelize 2018-10-05 17:56:56 +08:00
Wei Huang
9da576f03c
move SchedulingQueue to pkg/scheduler/internal/queue 2018-09-28 11:51:02 -07:00
Wei Huang
2e7461c087
auto-generated files 2018-09-28 11:51:01 -07:00
k8s-ci-robot
db1d1c8674
Merge pull request #68700 from Huang-Wei/schedulingQ-graceful-shutdown
shutdown schedulingQueue gracefully
2018-09-28 00:46:14 -07:00
Wei Huang
be661fddb4
shutdown schedulingQueue gracefully
- add Close() to interface SchedulingQueue
- implement Close() for FIFO and PriorityQueue
- add unit test
2018-09-27 14:32:58 -07:00
k8s-ci-robot
a6bc5aa49e
Merge pull request #68563 from DylanBLE/dev
fix scheduler crash when Prioritize Map function failed
2018-09-26 22:59:04 -07:00
Bobby (Babak) Salamat
f340f8baf8 Remove PDB and its event handlers from the scheduler cache 2018-09-26 14:22:21 -07:00
hongjian.sun
f33c2c11f2 fix scheduler crash when Prioritize Map function failed 2018-09-26 20:16:05 +08:00
k8s-ci-robot
28d86ac47d
Merge pull request #67308 from cofyc/fix67260
Use monotonically increasing generation to prevent equivalence cache race
2018-09-25 00:18:00 -07:00
Yecheng Fu
2f46bc8a18 Use seqeuence number to represent generation of equivalence cache.
- snapshot equivalence cache generation numbers before snapshotting the
scheduler cache
- skip update when generation does not match live generation
- keep the node and increment its generation to invalidate it instead of
deletion
- use predicates order ID as key to improve performance
2018-09-22 12:08:21 +08:00
Yecheng Fu
a2cc1b1a20 Revert "Use sync.map to scale ecache better"
This reverts commit 17d0190706.
2018-09-22 11:33:06 +08:00
Kubernetes Submit Queue
ca43f007a3
Merge pull request #67731 from gnufied/fix-csi-attach-limit
Automatic merge from submit-queue (batch tested with PRs 68161, 68023, 67909, 67955, 67731). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Fix csi attach limit

Add support for volume limits for CSI.

xref: https://github.com/kubernetes/community/pull/2051

```release-note
Add support for volume attach limits for CSI volumes
```
2018-09-05 14:51:55 -07:00
Kubernetes Submit Queue
a0b457d0e5
Merge pull request #67555 from wgliang/opt/improve-performance
Automatic merge from submit-queue (batch tested with PRs 67555, 68196). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Not split nodes when searching for nodes but doing it all at once

**What this PR does / why we need it**:
Not split nodes when searching for nodes but doing it all at once.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:
@bsalamat 

This is a follow up PR of #66733.

https://github.com/kubernetes/kubernetes/pull/66733#discussion_r205932531

**Release note**:

```release-note
Not split nodes when searching for nodes but doing it all at once.
```
2018-09-04 11:41:34 -07:00
Guoliang Wang
6c63dcfffe Not split nodes when searching for nodes but doing it all at once 2018-09-04 14:07:24 +08:00
Kubernetes Submit Queue
66fa85c837
Merge pull request #67760 from houjun41544/20180823
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

 Complement unit test case TestNodesWherePreemptionMightHelp for scheduler/core

**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note

```
2018-08-26 19:39:06 -07:00
houjun
08e5f4573a Complement unit test case TestNodesWherePreemptionMightHelp for scheduler/core 2018-08-23 18:54:23 +08:00
Hemant Kumar
8e4b33d1a8 Move volume limit feature to beta 2018-08-22 19:36:01 -04:00
Hemant Kumar
4b17a48def Implement support for updating volume limits
Create a new predicate to count CSI volumes
2018-08-22 19:36:00 -04:00
Kubernetes Submit Queue
4cca6a89a0
Merge pull request #66862 from resouer/sync-map
Automatic merge from submit-queue (batch tested with PRs 66862, 67618). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Use sync.map to scale equiv class cache better

**What this PR does / why we need it**:

Change the current lock in first level ecache into  `sync.Map`, which is known for scaling better than `sync. Mutex ` on machines with >8 CPUs

ref: https://golang.org/pkg/sync/#Map
 
And the code is much cleaner in this way.

5k Nodes, 10k Pods benchmark with ecache enabled in 64 cores VM:

```bash
// before
BenchmarkScheduling/5000Nodes/0Pods-64             10000          17550089 ns/op

// after
BenchmarkScheduling/5000Nodes/0Pods-64             10000          16975098 ns/op
```
Comparing to current implementation, the improvement after this change is noticeable, and the test is stable in 8, 16, 64 cores VM.

**Special notes for your reviewer**:

**Release note**:

```release-note
Use sync.map to scale ecache better
```
2018-08-21 00:24:01 -07:00
Bobby (Babak) Salamat
abb70aee98 Add a scheduler config argument to set the percentage of nodes to score 2018-08-17 11:18:51 -07:00
Jonathan Basseri
b874d2789b Add metrics to equivalence cache.
This adds counters to equiv. cache reads & writes. Reads are labeled by
hit/miss, while writes are labeled to indicate whether the write was
discarded.

This will give us visibility into,
- hit rate of cache reads
- ratio of reads to writes
- rate of discarded writes
2018-08-15 15:51:13 -07:00
Kubernetes Submit Queue
d7634dcf23
Merge pull request #66856 from charrywanganthony/scheduler_space
Automatic merge from submit-queue (batch tested with PRs 66491, 66587, 66856, 66657, 66923). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

add space for output

**Release note**:
```release-note
NONE
```
2018-08-14 17:55:11 -07:00
Kubernetes Submit Queue
6274590518
Merge pull request #66656 from wackxu/fixappversion
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

 use apps/v1 version for scheduler

/kind cleanup

**Release note**:

```release-note
NONE
```
2018-08-11 23:25:33 -07:00
Harry Zhang
17d0190706 Use sync.map to scale ecache better 2018-08-07 14:06:09 +08:00
Chao Wang
895b6d441d add space for output 2018-08-01 18:08:31 +08:00
foxyriver
3b4f250c4a fix error log 2018-07-26 19:48:48 +08:00
wackxu
ab35fa0414 update bazel 2018-07-26 17:37:29 +08:00
xushiwei 00425595
fed8572745 use apps/v1 version for scheduler 2018-07-26 17:37:29 +08:00
Kubernetes Submit Queue
4dbcf32b3c
Merge pull request #66471 from islinwb/improve_TestZeroRequest
Automatic merge from submit-queue (batch tested with PRs 66291, 66471, 66499). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Improve unit test TestZeroRequest

**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #66468

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-07-24 13:59:58 -07:00
Kubernetes Submit Queue
2119d349b0
Merge pull request #66291 from resouer/fix-extender
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Extender preemption should respect IsInterested()

**What this PR does / why we need it**:

Extender preemption should respect IsInterested()

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #66289 

**Special notes for your reviewer**:

The bug is reported and the first commit is co-authored by: @chenchun

**Release note**:

```release-note
Extender preemption should respect IsInterested()
```
2018-07-24 13:48:38 -07:00
Weibin Lin
972e78748a add pod UID 2018-07-23 10:44:31 +08:00
Harry Zhang
d644162a29 Extender preemption should respect IsInterested()
Co-authored-by: Harry Zhang <resouer@gmail.com>
Co-authored-by: Chun Chen <ramichen@tencent.com>
2018-07-23 10:13:38 +08:00
Weibin Lin
5449d153bb Improve unit test TestZeroRequest 2018-07-23 09:15:19 +08:00
John Calabrese
ad234e58be use subtest for table units
remove duplicate testname from error msg

remove subtest for test setup loop

do not break on test failure

  https://github.com/kubernetes/kubernetes/pull/63665#discussion_r203571355

remove duplicate test.name in output

  https://github.com/kubernetes/kubernetes/pull/63665#discussion_r203574001
  https://github.com/kubernetes/kubernetes/pull/63665#discussion_r203574012
2018-07-20 16:02:50 -04:00
Harry Zhang
e5a7a4caf7 Fist level ecache for nodeMap
Use new cache map in scheduler

Add a integration test

Move init before schedudling

Add lock for first level cache
2018-07-18 15:11:59 +08:00
tanshanshan
06fb64cdf8 fix glogformat 2018-07-14 10:22:12 +08:00
Kubernetes Submit Queue
f0311d8232
Merge pull request #65396 from bsalamat/sched_no_sort
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Improve scheduler's performance by eliminating sorting of nodes by their score

**What this PR does / why we need it**:
Profiling scheduler, I noticed that scheduler spends a significant amount of time in sorting the nodes after we score them to find nodes with the highest score. Finding nodes with the highest score does not need sorting the array. This PR replaces the sort with a linear scan.

Eliminating the sort results in over 10% improvement in throughput of the scheduler.

Before (3 runs for 5000 nodes, scheduling 1000 pods in a cluster running 2000 pods):
BenchmarkScheduling/5000Nodes/2000Pods-12         	    1000	  20682552 ns/op
BenchmarkScheduling/5000Nodes/2000Pods-12         	    1000	  20464729 ns/op
BenchmarkScheduling/5000Nodes/2000Pods-12         	    1000	  21188906 ns/op

After:
BenchmarkScheduling/5000Nodes/2000Pods-12         	    1000	  18485866 ns/op
BenchmarkScheduling/5000Nodes/2000Pods-12         	    1000	  18457749 ns/op
BenchmarkScheduling/5000Nodes/2000Pods-12         	    1000	  18418200 ns/op

**Release note**:

```release-note
Improve scheduler's performance by eliminating sorting of nodes by their score.
```
2018-06-23 20:12:01 -07:00
Bobby (Babak) Salamat
ffc8cc2f50 Improve scheduler's performance by eliminating sorting when finding the host with the highest score 2018-06-23 11:24:43 -07:00
Kubernetes Submit Queue
582b88c879
Merge pull request #64995 from bsalamat/preempt_opt
Automatic merge from submit-queue (batch tested with PRs 65388, 64995). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add more conditions to the list of predicate failures that won't be resolved by preemption

**What this PR does / why we need it**:
Adds more conditions to the list of predicate failures that won't be resolved by preemption. This change can potentially improve performance of preemption by avoiding the nodes that won't be able to schedule the pending pod no matter how many other pods are removed from them.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
Add more conditions to the list of predicate failures that won't be resolved by preemption.
```

/sig scheduling
2018-06-23 05:52:07 -07:00