With the alpha scheduling queue we move pods from the unschedulable queue to
the active queue on certain events without any backoff. As a result,
high-priority pods in the unschedulable queue can cause starvation issues,
since they are retried immediately and repeatedly.
Implement a backoff mechanism for pods being moved back to the active queue.
Closes #56721
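A minimal sketch of the idea, assuming a hypothetical per-pod tracker (the type name, the attempt map, and the 1s/10s bounds are illustrative, not the scheduler's actual implementation):

```go
package main

import (
	"fmt"
	"time"
)

// podBackoff tracks an exponential per-pod delay before a pod may be moved
// from the unschedulable queue back to the active queue.
type podBackoff struct {
	initial  time.Duration
	max      time.Duration
	attempts map[string]int // pod key -> failed scheduling attempts
}

func newPodBackoff(initial, max time.Duration) *podBackoff {
	return &podBackoff{initial: initial, max: max, attempts: map[string]int{}}
}

// next records another failed attempt and returns how long the pod must wait
// before it is eligible for the active queue again.
func (b *podBackoff) next(podKey string) time.Duration {
	b.attempts[podKey]++
	d := b.initial
	for i := 1; i < b.attempts[podKey]; i++ {
		d *= 2
		if d >= b.max {
			return b.max
		}
	}
	return d
}

// clear resets the backoff once the pod schedules successfully.
func (b *podBackoff) clear(podKey string) { delete(b.attempts, podKey) }

func main() {
	b := newPodBackoff(time.Second, 10*time.Second)
	for i := 0; i < 5; i++ {
		fmt.Println(b.next("default/high-prio-pod")) // 1s, 2s, 4s, 8s, 10s
	}
}
```

With a delay like this consulted before moving a pod back to active, an event burst can no longer retry the same high-priority pod in a tight loop.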
- don't update the nominatedMap cache when Pop()-ing an element from activeQ
- instead, delete the nominated info from the cache when the pod is "assumed"
- adjust unit tests for the new behavior
- expose SchedulingQueue in factory.Config
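A rough sketch of the intended lifecycle; `nominatedPodMap` and `deleteOnAssume` below are hypothetical names used only to illustrate the change:

```go
package main

import (
	"fmt"
	"sync"
)

// nominatedPodMap is an illustrative stand-in for the cache of pods that have
// been nominated to run on a node.
type nominatedPodMap struct {
	mu    sync.Mutex
	byPod map[string]string // pod key -> nominated node name
}

func (m *nominatedPodMap) add(podKey, nodeName string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.byPod[podKey] = nodeName
}

// Pop()-ing the pod from activeQ no longer touches this cache; the entry is
// removed only once the pod is "assumed" (bound in the scheduler cache).
func (m *nominatedPodMap) deleteOnAssume(podKey string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.byPod, podKey)
}

func main() {
	m := &nominatedPodMap{byPod: map[string]string{}}
	m.add("ns/preemptor", "node-1")
	// ... the pod is popped from activeQ and goes through a scheduling cycle;
	// the nominated entry stays visible to other pods' preemption logic ...
	m.deleteOnAssume("ns/preemptor")
	fmt.Println(m.byPod) // map[]
}
```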
- Move from the old github.com/golang/glog to k8s.io/klog
- klog has an explicit InitFlags(), so we add the flags where necessary
- update the other vendored repositories that made the same glog-to-klog
  change:
* github.com/kubernetes/repo-infra
* k8s.io/gengo/
* k8s.io/kube-openapi/
* github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by calling InitFlags explicitly in their init() functions
Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
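For reference, the explicit registration that klog requires looks roughly like this (a minimal sketch; in tests the same klog.InitFlags call goes into an init() function):

```go
package main

import (
	"flag"

	"k8s.io/klog"
)

func main() {
	// Unlike glog, klog does not register its flags automatically; callers
	// must opt in before parsing flags.
	klog.InitFlags(nil) // nil registers on the global flag.CommandLine set
	flag.Parse()
	defer klog.Flush()

	klog.Info("flags initialized; logging via klog")
}
```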
Loop over priorityConfigs separately from the parallel node loop. The node
loop can only safely modify result[i][index]; before this change it sometimes
modified result[i] concurrently with other goroutines. A sketch of the
restructured loops follows the race report below.
Fixes: 7164967662
==================== Test output for //pkg/scheduler/core:go_default_test:
==================
WARNING: DATA RACE
Read at 0x00c0005e8ed0 by goroutine 22:
k8s.io/kubernetes/pkg/scheduler/core.PrioritizeNodes.func2()
pkg/scheduler/core/generic_scheduler.go:667 +0x2ea
k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.ParallelizeUntil.func1()
staging/src/k8s.io/client-go/util/workqueue/parallelizer.go:65 +0x9e
Previous write at 0x00c0005e8ed0 by goroutine 21:
k8s.io/kubernetes/pkg/scheduler/core.PrioritizeNodes.func2()
pkg/scheduler/core/generic_scheduler.go:668 +0x450
k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.ParallelizeUntil.func1()
staging/src/k8s.io/client-go/util/workqueue/parallelizer.go:65 +0x9e
Goroutine 22 (running) created at:
k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.ParallelizeUntil()
staging/src/k8s.io/client-go/util/workqueue/parallelizer.go:57 +0x1a3
k8s.io/kubernetes/pkg/scheduler/core.PrioritizeNodes()
pkg/scheduler/core/generic_scheduler.go:682 +0x592
k8s.io/kubernetes/pkg/scheduler/core.(*genericScheduler).Schedule()
pkg/scheduler/core/generic_scheduler.go:186 +0x77d
k8s.io/kubernetes/pkg/scheduler/core.TestGenericScheduler.func1()
pkg/scheduler/core/generic_scheduler_test.go:464 +0x91f
testing.tRunner()
GOROOT/src/testing/testing.go:827 +0x162
Goroutine 21 (running) created at:
k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.ParallelizeUntil()
staging/src/k8s.io/client-go/util/workqueue/parallelizer.go:57 +0x1a3
k8s.io/kubernetes/pkg/scheduler/core.PrioritizeNodes()
pkg/scheduler/core/generic_scheduler.go:682 +0x592
k8s.io/kubernetes/pkg/scheduler/core.(*genericScheduler).Schedule()
pkg/scheduler/core/generic_scheduler.go:186 +0x77d
k8s.io/kubernetes/pkg/scheduler/core.TestGenericScheduler.func1()
pkg/scheduler/core/generic_scheduler_test.go:464 +0x91f
testing.tRunner()
GOROOT/src/testing/testing.go:827 +0x162
==================
--- FAIL: TestGenericScheduler (0.01s)
--- FAIL: TestGenericScheduler/test_6 (0.00s)
testing.go:771: race detected during execution of test
testing.go:771: race detected during execution of test
FAIL
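As referenced above, a minimal sketch of the restructured loops; the `score` type and the priority names are illustrative stand-ins, not the real scheduler types:

```go
package main

import (
	"context"
	"fmt"

	"k8s.io/client-go/util/workqueue"
)

// score stands in for a per-node priority score entry.
type score struct {
	node  string
	value int
}

func main() {
	nodes := []string{"node-1", "node-2", "node-3"}
	priorityNames := []string{"LeastRequested", "BalancedAllocation"}

	// 1) Allocate result[i] in a plain, sequential loop over the priority
	//    configs, so no worker goroutine ever writes an outer slice element.
	result := make([][]score, len(priorityNames))
	for i := range priorityNames {
		result[i] = make([]score, len(nodes))
	}

	// 2) The parallel node loop then only writes result[i][index]; each
	//    (i, index) pair is written by exactly one goroutine.
	workqueue.ParallelizeUntil(context.Background(), 4, len(nodes), func(index int) {
		for i := range priorityNames {
			result[i][index] = score{node: nodes[index], value: 1}
		}
	})

	fmt.Println(result)
}
```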
- snapshot equivalence cache generation numbers before snapshotting the
  scheduler cache
- skip the update when the snapshotted generation does not match the live
  generation
- invalidate a node by keeping its entry and incrementing its generation
  instead of deleting it
- use the predicate's order ID as the cache key to improve performance
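A hedged sketch of the generation check; the type and method names below are assumptions used to illustrate the flow, not the real equivalence cache API:

```go
package main

import (
	"fmt"
	"sync"
)

// nodeCache models the per-node second-level cache.
type nodeCache struct {
	generation uint64
	results    map[int]bool // predicate order ID -> cached fit result
}

type eCache struct {
	mu    sync.Mutex
	nodes map[string]*nodeCache
}

// snapshotGeneration is read *before* snapshotting the scheduler cache, so a
// result computed against stale data can be detected later.
func (c *eCache) snapshotGeneration(node string) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.nodes[node].generation
}

// update stores a predicate result only if the node's live generation still
// matches the generation observed at snapshot time.
func (c *eCache) update(node string, predicateID int, fits bool, snapshotGen uint64) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	n := c.nodes[node]
	if n.generation != snapshotGen {
		return false // stale write, discard
	}
	n.results[predicateID] = fits
	return true
}

// invalidate keeps the node entry and bumps its generation instead of
// deleting it, which is cheaper and avoids churn on the first-level map.
func (c *eCache) invalidate(node string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	n := c.nodes[node]
	n.generation++
	n.results = map[int]bool{}
}

func main() {
	c := &eCache{nodes: map[string]*nodeCache{"node-1": {results: map[int]bool{}}}}
	gen := c.snapshotGeneration("node-1")
	c.invalidate("node-1")                        // node changed after the snapshot
	fmt.Println(c.update("node-1", 0, true, gen)) // false: the write is discarded
}
```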
Automatic merge from submit-queue (batch tested with PRs 67555, 68196). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Do not split the node list into chunks when searching for feasible nodes; search all of them in one pass
**What this PR does / why we need it**:
Do not split the node list into chunks when searching for feasible nodes; search all of them in one pass.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
@bsalamat
This is a follow-up PR to #66733.
https://github.com/kubernetes/kubernetes/pull/66733#discussion_r205932531
**Release note**:
```release-note
Do not split the node list into chunks when searching for feasible nodes; search all of them in one pass.
```
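A hedged sketch of a single parallel pass with an early stop; `fits`, `feasible`, and `numNodesToFind` are illustrative names, not the scheduler's actual identifiers:

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	nodes := make([]string, 1000)
	for i := range nodes {
		nodes[i] = fmt.Sprintf("node-%d", i)
	}
	fits := func(node string) bool { return true } // placeholder predicate

	numNodesToFind := int32(300) // e.g. derived from percentageOfNodesToScore
	feasible := make([]string, numNodesToFind)
	var found int32

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// One pass over all nodes in parallel instead of filtering the node list
	// chunk by chunk; workers stop once enough feasible nodes are collected.
	checkNode := func(i int) {
		if !fits(nodes[i]) {
			return
		}
		n := atomic.AddInt32(&found, 1)
		if n > numNodesToFind {
			cancel() // enough nodes found, let remaining workers exit early
			atomic.AddInt32(&found, -1)
			return
		}
		feasible[n-1] = nodes[i]
	}
	workqueue.ParallelizeUntil(ctx, 16, len(nodes), checkNode)

	fmt.Println(atomic.LoadInt32(&found), "feasible nodes collected")
}
```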
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Complement the unit test TestNodesWherePreemptionMightHelp for scheduler/core with additional cases
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 66862, 67618). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Use sync.Map to scale the equiv class cache better
**What this PR does / why we need it**:
Change the lock guarding the first-level ecache to a `sync.Map`, which is known to scale better than a `sync.Mutex` on machines with more than 8 CPUs.
ref: https://golang.org/pkg/sync/#Map
The code is also much cleaner this way.
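A minimal sketch of the pattern; the value type here is just a string, whereas the real first-level cache maps node names to per-node second-level caches:

```go
package main

import (
	"fmt"
	"sync"
)

// equivCache illustrates the first-level cache keyed by node name.
type equivCache struct {
	nodes sync.Map // node name -> per-node cache (a string stands in here)
}

func (c *equivCache) load(node string) (string, bool) {
	v, ok := c.nodes.Load(node)
	if !ok {
		return "", false
	}
	return v.(string), true
}

// store replaces the per-node entry. With sync.Map there is no single mutex
// guarding the whole first-level map, so lookups for different nodes do not
// contend with each other.
func (c *equivCache) store(node, result string) {
	c.nodes.Store(node, result)
}

func main() {
	c := &equivCache{}
	c.store("node-1", "GeneralPredicates: fits")
	if v, ok := c.load("node-1"); ok {
		fmt.Println(v)
	}
}
```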
5k Nodes, 10k Pods benchmark with the ecache enabled on a 64-core VM:
```bash
// before
BenchmarkScheduling/5000Nodes/0Pods-64 10000 17550089 ns/op
// after
BenchmarkScheduling/5000Nodes/0Pods-64 10000 16975098 ns/op
```
Compared to the current implementation, the improvement after this change is noticeable, and the benchmark is stable on 8-, 16-, and 64-core VMs.
**Special notes for your reviewer**:
**Release note**:
```release-note
Use sync.Map to scale the ecache better
```
This adds counters for equivalence cache reads and writes. Reads are labeled
by hit/miss, while writes are labeled to indicate whether the write was
discarded.
This gives us visibility into:
- hit rate of cache reads
- ratio of reads to writes
- rate of discarded writes
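A hedged sketch of such counters using Prometheus client_golang; the metric and label names below are assumptions, not the names the scheduler actually registers:

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
)

var (
	cacheLookups = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "scheduler_equiv_cache_lookups_total",
			Help: "Equivalence cache lookups, labeled by hit or miss.",
		},
		[]string{"result"}, // "hit" or "miss"
	)
	cacheWrites = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "scheduler_equiv_cache_writes_total",
			Help: "Equivalence cache writes, labeled by whether they were discarded.",
		},
		[]string{"result"}, // "written" or "discarded_stale"
	)
)

func init() {
	prometheus.MustRegister(cacheLookups, cacheWrites)
}

func recordLookup(hit bool) {
	if hit {
		cacheLookups.WithLabelValues("hit").Inc()
		return
	}
	cacheLookups.WithLabelValues("miss").Inc()
}

func recordWrite(discarded bool) {
	if discarded {
		cacheWrites.WithLabelValues("discarded_stale").Inc()
		return
	}
	cacheWrites.WithLabelValues("written").Inc()
}

func main() {
	recordLookup(false) // first lookup misses
	recordWrite(false)  // the computed result is written
	recordLookup(true)  // subsequent lookups hit
}
```

Dividing hits by total lookups gives the hit rate, and the ratio of discarded to total writes shows how often results are computed but thrown away.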
Automatic merge from submit-queue (batch tested with PRs 66491, 66587, 66856, 66657, 66923). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
add space for output
**Release note**:
```release-note
NONE
```