- snapshot equivalence cache generation numbers before snapshotting the
scheduler cache
- skip update when generation does not match live generation
- keep the node and increment its generation to invalidate it instead of
deletion
- use predicates order ID as key to improve performance
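
A minimal sketch of the generation scheme described in the bullets above, assuming a simplified cache. The names `Cache`, `nodeCache`, `Snapshot`, `Invalidate`, `Update`, and `Lookup` are hypothetical and are not taken from the scheduler source; the sketch only illustrates how a stale write can be skipped by comparing generation numbers.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeCache is a hypothetical per-node entry guarded by a generation counter.
type nodeCache struct {
	mu         sync.Mutex
	generation uint64
	results    map[int]bool // keyed by predicate order ID (an int), not by name
}

// Cache is a hypothetical stand-in for the equivalence cache.
type Cache struct {
	mu    sync.Mutex
	nodes map[string]*nodeCache
}

func NewCache() *Cache { return &Cache{nodes: map[string]*nodeCache{}} }

// node returns the entry for name, creating it if needed.
func (c *Cache) node(name string) *nodeCache {
	c.mu.Lock()
	defer c.mu.Unlock()
	n, ok := c.nodes[name]
	if !ok {
		n = &nodeCache{results: map[int]bool{}}
		c.nodes[name] = n
	}
	return n
}

// Snapshot records every node's current generation. The caller takes this
// snapshot before snapshotting the scheduler cache.
func (c *Cache) Snapshot() map[string]uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	snap := make(map[string]uint64, len(c.nodes))
	for name, n := range c.nodes {
		n.mu.Lock()
		snap[name] = n.generation
		n.mu.Unlock()
	}
	return snap
}

// Invalidate keeps the node entry, bumps its generation, and drops its results.
func (c *Cache) Invalidate(name string) {
	n := c.node(name)
	n.mu.Lock()
	defer n.mu.Unlock()
	n.generation++
	n.results = map[int]bool{}
}

// Update stores a result only if the snapshot generation still matches the
// live generation. It reports whether the write was applied.
func (c *Cache) Update(name string, predicateID int, fit bool, snapGen uint64) bool {
	n := c.node(name)
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.generation != snapGen {
		return false // the node was invalidated after the snapshot; skip
	}
	n.results[predicateID] = fit
	return true
}

// Lookup returns the cached result for a predicate, if any.
func (c *Cache) Lookup(name string, predicateID int) (fit, ok bool) {
	n := c.node(name)
	n.mu.Lock()
	defer n.mu.Unlock()
	fit, ok = n.results[predicateID]
	return fit, ok
}

func main() {
	c := NewCache()
	c.node("node-a")
	snap := c.Snapshot()
	c.Invalidate("node-a") // happens while a scheduling cycle is in flight
	fmt.Println("stale write applied:", c.Update("node-a", 0, true, snap["node-a"]))
	fresh := c.Snapshot()
	fmt.Println("fresh write applied:", c.Update("node-a", 0, true, fresh["node-a"]))
}
```

Keeping the entry and bumping its generation means a writer holding an old snapshot can always detect that it is stale, which would not be possible if the entry were deleted and later recreated at generation zero.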
Automatic merge from submit-queue (batch tested with PRs 67555, 68196). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Do not split the nodes when searching for nodes; process them all at once
**What this PR does / why we need it**:
Do not split the nodes when searching for nodes; process them all at once.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
@bsalamat
This is a follow up PR of #66733.
https://github.com/kubernetes/kubernetes/pull/66733#discussion_r205932531
**Release note**:
```release-note
Do not split the nodes when searching for nodes; process them all at once.
```
Automatic merge from submit-queue.
Complement unit test case TestNodesWherePreemptionMightHelp for scheduler/core
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 66862, 67618).
Use sync.Map to scale the equivalence class cache better
**What this PR does / why we need it**:
Replace the current lock in the first-level ecache with `sync.Map`, which is known to scale better than `sync.Mutex` on machines with more than 8 CPUs.
ref: https://golang.org/pkg/sync/#Map
The code is also much cleaner this way.
Benchmark with 5k nodes and 10k pods, ecache enabled, on a 64-core VM:
```text
// before
BenchmarkScheduling/5000Nodes/0Pods-64 10000 17550089 ns/op
// after
BenchmarkScheduling/5000Nodes/0Pods-64 10000 16975098 ns/op
```
Compared to the current implementation, the improvement after this change is noticeable (roughly 3% fewer ns/op in the run above), and the test is stable on 8-, 16-, and 64-core VMs.
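
For illustration, a minimal sketch of a two-level cache whose first level is a `sync.Map`. The type and method names (`Cache`, `nodeEntry`, `Store`, `Lookup`, `Len`, `populate`) and the predicate key are assumptions made for this example and are not the actual ecache code.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeEntry is a hypothetical second-level cache for a single node.
type nodeEntry struct {
	mu      sync.RWMutex
	results map[string]bool // predicate key -> fit
}

// Cache keeps the first level in a sync.Map instead of a mutex-guarded map.
type Cache struct {
	nodes sync.Map // node name -> *nodeEntry
}

// entry returns the node's entry, creating it atomically if it is missing.
func (c *Cache) entry(node string) *nodeEntry {
	if v, ok := c.nodes.Load(node); ok {
		return v.(*nodeEntry)
	}
	v, _ := c.nodes.LoadOrStore(node, &nodeEntry{results: map[string]bool{}})
	return v.(*nodeEntry)
}

// Store records a predicate result for a node.
func (c *Cache) Store(node, predicate string, fit bool) {
	e := c.entry(node)
	e.mu.Lock()
	defer e.mu.Unlock()
	e.results[predicate] = fit
}

// Lookup returns the cached result; ok is false on a miss.
func (c *Cache) Lookup(node, predicate string) (fit, ok bool) {
	v, found := c.nodes.Load(node)
	if !found {
		return false, false
	}
	e := v.(*nodeEntry)
	e.mu.RLock()
	defer e.mu.RUnlock()
	fit, ok = e.results[predicate]
	return fit, ok
}

// Len counts first-level entries.
func (c *Cache) Len() int {
	n := 0
	c.nodes.Range(func(_, _ interface{}) bool {
		n++
		return true
	})
	return n
}

// populate writes one result per node from concurrent goroutines.
func populate(c *Cache, nodes int) {
	var wg sync.WaitGroup
	for i := 0; i < nodes; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c.Store(fmt.Sprintf("node-%d", i), "PodFitsResources", i%2 == 0)
		}(i)
	}
	wg.Wait()
}

func main() {
	c := &Cache{}
	populate(c, 100)
	fmt.Println("nodes cached:", c.Len())
}
```

Goroutines that touch different nodes never contend on a shared first-level lock here; only writers to the same node serialize on that node's own mutex.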
**Special notes for your reviewer**:
**Release note**:
```release-note
Use sync.Map to scale ecache better
```
This adds counters to equiv. cache reads & writes. Reads are labeled by
hit/miss, while writes are labeled to indicate whether the write was
discarded.
This will give us visibility into:
- hit rate of cache reads
- ratio of reads to writes
- rate of discarded writes
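
As a rough illustration of how the three ratios above fall out of such counters, here is a sketch that uses plain atomic counters rather than the real metrics library. `CacheStats` and its methods are hypothetical names introduced only for this example.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// CacheStats is a hypothetical set of counters: reads split by hit/miss,
// writes split by stored/discarded.
type CacheStats struct {
	readHits        int64
	readMisses      int64
	writesStored    int64
	writesDiscarded int64
}

// ObserveRead counts one cache read, labeled by hit or miss.
func (s *CacheStats) ObserveRead(hit bool) {
	if hit {
		atomic.AddInt64(&s.readHits, 1)
	} else {
		atomic.AddInt64(&s.readMisses, 1)
	}
}

// ObserveWrite counts one cache write, labeled by whether it was discarded.
func (s *CacheStats) ObserveWrite(discarded bool) {
	if discarded {
		atomic.AddInt64(&s.writesDiscarded, 1)
	} else {
		atomic.AddInt64(&s.writesStored, 1)
	}
}

// ratio returns num/den, or 0 when den is 0.
func ratio(num, den int64) float64 {
	if den == 0 {
		return 0
	}
	return float64(num) / float64(den)
}

func (s *CacheStats) reads() int64 {
	return atomic.LoadInt64(&s.readHits) + atomic.LoadInt64(&s.readMisses)
}

func (s *CacheStats) writes() int64 {
	return atomic.LoadInt64(&s.writesStored) + atomic.LoadInt64(&s.writesDiscarded)
}

// HitRate is hits divided by all reads.
func (s *CacheStats) HitRate() float64 {
	return ratio(atomic.LoadInt64(&s.readHits), s.reads())
}

// ReadsPerWrite is all reads divided by all writes.
func (s *CacheStats) ReadsPerWrite() float64 { return ratio(s.reads(), s.writes()) }

// DiscardRate is discarded writes divided by all writes.
func (s *CacheStats) DiscardRate() float64 {
	return ratio(atomic.LoadInt64(&s.writesDiscarded), s.writes())
}

func main() {
	var s CacheStats
	for _, hit := range []bool{true, true, true, false} {
		s.ObserveRead(hit)
	}
	s.ObserveWrite(false)
	s.ObserveWrite(true)
	fmt.Printf("hit rate %.2f, reads per write %.2f, discard rate %.2f\n",
		s.HitRate(), s.ReadsPerWrite(), s.DiscardRate())
}
```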
Automatic merge from submit-queue (batch tested with PRs 66491, 66587, 66856, 66657, 66923).
add space for output
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66291, 66471, 66499).
Improve unit test TestZeroRequest
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #66468
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue.
Extender preemption should respect IsInterested()
**What this PR does / why we need it**:
Extender preemption should respect IsInterested()
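
A simplified sketch of the intended behavior, assuming hypothetical `Pod`, `Extender`, and `vetoExtender` types that only loosely mirror the scheduler's extender interface: an extender is consulted during preemption only if it supports preemption and is interested in the pod.

```go
package main

import "fmt"

// Pod is a hypothetical, minimal stand-in for a pod object.
type Pod struct{ Name string }

// Extender is a simplified, hypothetical version of a scheduler extender.
type Extender interface {
	SupportsPreemption() bool
	IsInterested(pod Pod) bool
	ProcessPreemption(pod Pod, candidates []string) []string
}

// vetoExtender removes one node from the candidates, but only cares about
// a single pod name.
type vetoExtender struct {
	interestedIn string
	vetoNode     string
}

func (v vetoExtender) SupportsPreemption() bool  { return true }
func (v vetoExtender) IsInterested(pod Pod) bool { return pod.Name == v.interestedIn }
func (v vetoExtender) ProcessPreemption(pod Pod, candidates []string) []string {
	var kept []string
	for _, n := range candidates {
		if n != v.vetoNode {
			kept = append(kept, n)
		}
	}
	return kept
}

// processPreemptionWithExtenders skips extenders that do not support
// preemption or are not interested in the pod.
func processPreemptionWithExtenders(pod Pod, candidates []string, extenders []Extender) []string {
	for _, e := range extenders {
		if !e.SupportsPreemption() || !e.IsInterested(pod) {
			continue
		}
		candidates = e.ProcessPreemption(pod, candidates)
		if len(candidates) == 0 {
			break // nothing left for later extenders to filter
		}
	}
	return candidates
}

func main() {
	exts := []Extender{vetoExtender{interestedIn: "managed", vetoNode: "n2"}}
	nodes := []string{"n1", "n2"}
	fmt.Println(processPreemptionWithExtenders(Pod{Name: "other"}, nodes, exts))
	fmt.Println(processPreemptionWithExtenders(Pod{Name: "managed"}, nodes, exts))
}
```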
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #66289
**Special notes for your reviewer**:
The bug is reported and the first commit is co-authored by: @chenchun
**Release note**:
```release-note
Extender preemption should respect IsInterested()
```
Automatic merge from submit-queue.
Improve scheduler's performance by eliminating sorting of nodes by their score
**What this PR does / why we need it**:
While profiling the scheduler, I noticed that it spends a significant amount of time sorting the nodes after scoring them in order to find the nodes with the highest score. Finding the highest-scoring nodes does not require sorting the array, so this PR replaces the sort with a linear scan.
Eliminating the sort improves scheduler throughput by over 10%.
Before (3 runs for 5000 nodes, scheduling 1000 pods in a cluster running 2000 pods):
BenchmarkScheduling/5000Nodes/2000Pods-12 1000 20682552 ns/op
BenchmarkScheduling/5000Nodes/2000Pods-12 1000 20464729 ns/op
BenchmarkScheduling/5000Nodes/2000Pods-12 1000 21188906 ns/op
After:
BenchmarkScheduling/5000Nodes/2000Pods-12 1000 18485866 ns/op
BenchmarkScheduling/5000Nodes/2000Pods-12 1000 18457749 ns/op
BenchmarkScheduling/5000Nodes/2000Pods-12 1000 18418200 ns/op
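
The idea of replacing the sort with a linear scan can be sketched as follows. `scoredNode` and `topScoringHosts` are hypothetical names used only for this illustration; the real code may differ in how it handles ties.

```go
package main

import "fmt"

// scoredNode is a hypothetical pairing of a host name with its score.
type scoredNode struct {
	Host  string
	Score int
}

// topScoringHosts returns every host that shares the maximum score, using a
// single O(n) pass instead of an O(n log n) sort.
func topScoringHosts(list []scoredNode) []string {
	if len(list) == 0 {
		return nil
	}
	max := list[0].Score
	top := []string{list[0].Host}
	for _, n := range list[1:] {
		switch {
		case n.Score > max:
			// New maximum: discard the hosts collected so far.
			max = n.Score
			top = append(top[:0], n.Host)
		case n.Score == max:
			top = append(top, n.Host)
		}
	}
	return top
}

func main() {
	nodes := []scoredNode{{"a", 3}, {"b", 7}, {"c", 7}, {"d", 1}}
	fmt.Println(topScoringHosts(nodes))
}
```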
**Release note**:
```release-note
Improve scheduler's performance by eliminating sorting of nodes by their score.
```
Automatic merge from submit-queue (batch tested with PRs 65388, 64995).
Add more conditions to the list of predicate failures that won't be resolved by preemption
**What this PR does / why we need it**:
Adds more conditions to the list of predicate failures that won't be resolved by preemption. This change can potentially improve performance of preemption by avoiding the nodes that won't be able to schedule the pending pod no matter how many other pods are removed from them.
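
To illustrate the filtering step, here is a sketch under simplifying assumptions: failure reasons are plain strings, and both the reason names and the `unresolvableReasons` set are hypothetical examples rather than the scheduler's actual list.

```go
package main

import (
	"fmt"
	"sort"
)

// unresolvableReasons is a hypothetical set of predicate failures that
// removing pods from a node cannot fix.
var unresolvableReasons = map[string]bool{
	"NodeSelectorNotMatch":      true,
	"NodeUnschedulable":         true,
	"TaintsTolerationsNotMatch": true,
}

// nodesWherePreemptionMightHelp keeps only the nodes whose failures are all
// potentially resolvable by evicting pods. Results are sorted for stable output.
func nodesWherePreemptionMightHelp(failures map[string][]string) []string {
	var candidates []string
	for node, reasons := range failures {
		mightHelp := true
		for _, r := range reasons {
			if unresolvableReasons[r] {
				mightHelp = false
				break
			}
		}
		if mightHelp {
			candidates = append(candidates, node)
		}
	}
	sort.Strings(candidates)
	return candidates
}

func main() {
	failures := map[string][]string{
		"node-a": {"InsufficientCPU"},
		"node-b": {"InsufficientCPU", "NodeUnschedulable"},
		"node-c": {"TaintsTolerationsNotMatch"},
	}
	fmt.Println(nodesWherePreemptionMightHelp(failures))
}
```

Every node dropped here is one the preemption logic no longer needs to simulate evictions on, which is where the potential performance gain comes from.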
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Add more conditions to the list of predicate failures that won't be resolved by preemption.
```
/sig scheduling
This moves the equivalence cache implementation out of the 'core'
package and into k8s.io/kubernetes/pkg/scheduler/core/equivalence.
Separating the equiv. cache from the genericScheduler implementation
makes their interaction points easier to follow, and prevents us from
accidentally accessing unexported fields.
Automatic merge from submit-queue (batch tested with PRs 64142, 64426, 62910, 63942, 64548).
scheduler: further cleanup of equivalence cache
**What this PR does / why we need it**:
This improves comments and simplifies some names/logic in equivalence_cache.go, as well as changing the order of some items in the file.
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/kind cleanup
Automatic merge from submit-queue (batch tested with PRs 64252, 64307, 64163, 64378, 64179).
Remove unused parameter (pod) in `pkg/scheduler/core/generic_scheduler`
**What this PR does / why we need it**:
Remove unused parameter (pod) in `pkg/scheduler/core/generic_scheduler`
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
This makes the lookup behave like a normal map lookup, so it is easier
for readers to follow the logic. It also inverts the "invalid" bool to
an "ok" bool because `!invalid` is a double negative.
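
A small sketch of the lookup shape described here, assuming hypothetical `predicateResult` and `nodeCache` types and an illustrative predicate key:

```go
package main

import "fmt"

// predicateResult is a hypothetical cached outcome of one predicate.
type predicateResult struct {
	Fit     bool
	Reasons []string
}

// nodeCache is a hypothetical per-node map of predicate key to result.
type nodeCache map[string]predicateResult

// lookup mirrors the comma-ok form of a map read: ok is true only when a
// cached result exists, so callers write `if ok` instead of `if !invalid`.
func (n nodeCache) lookup(predicate string) (result predicateResult, ok bool) {
	result, ok = n[predicate]
	return result, ok
}

func main() {
	cache := nodeCache{"PodFitsResources": {Fit: true}}
	if res, ok := cache.lookup("PodFitsResources"); ok {
		fmt.Println("cached, fit =", res.Fit)
	}
	if _, ok := cache.lookup("PodFitsHost"); !ok {
		fmt.Println("not cached; run the predicate")
	}
}
```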
Automatic merge from submit-queue (batch tested with PRs 63434, 64172, 63975, 64180, 63755).
Optimize the lock in RunPredicate
**What this PR does / why we need it**:
Improve the performance of the scheduler:
- Change the lock taken in RunPredicate from a write lock (`Lock`) to a read lock (`RLock`)
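
For readers unfamiliar with the distinction, a minimal sketch using `sync.RWMutex`: readers take `RLock` and can proceed concurrently, while writers still take the exclusive `Lock`. The `resultCache` type and the `concurrentHits` helper are hypothetical and are not the scheduler's code.

```go
package main

import (
	"fmt"
	"sync"
)

// resultCache is a hypothetical read-mostly cache of predicate results.
type resultCache struct {
	mu      sync.RWMutex
	results map[string]bool
}

func newResultCache() *resultCache {
	return &resultCache{results: map[string]bool{}}
}

// Get takes only the read lock, so many goroutines can read at once.
func (c *resultCache) Get(key string) (fit, ok bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	fit, ok = c.results[key]
	return fit, ok
}

// Set takes the exclusive lock because it mutates the map.
func (c *resultCache) Set(key string, fit bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.results[key] = fit
}

// concurrentHits starts the given number of reader goroutines and returns
// how many of them found the key.
func concurrentHits(c *resultCache, key string, readers int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	hits := 0
	for i := 0; i < readers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, ok := c.Get(key); ok {
				mu.Lock()
				hits++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return hits
}

func main() {
	c := newResultCache()
	c.Set("PodFitsResources", true)
	fmt.Println("hits:", concurrentHits(c, "PodFitsResources", 32))
}
```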
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Partially addresses #63784
**Special notes for your reviewer**:
_Run benchmark test by scheduler_perf_:
`Before` BenchmarkScheduling/1000Nodes/0Pods-32 1000 11689758 ns/op
`After` BenchmarkScheduling/1000Nodes/0Pods-32 1000 5951510 ns/op
_Run integration (density) test by scheduler_perf_:
Schedule 3000 Pods On 3000 Nodes
`Before` rate 19 per second on average
`After` rate 58 per second on average
_CPU profile results_:
`Before` [click](https://cdn.rawgit.com/godliness/files/master/63784_before.svg)
`After` [click](https://cdn.rawgit.com/godliness/files/master/63784_after.svg)
**Release note**:
```release-note
NONE
```
/sig scheduling
/cc @misterikkit
/cc @bsalamat
/cc @ravisantoshgudimetla
/cc @resouer