- move extender related API from pkg/scheduler/api to pkg/scheduler/apis/extender/v1
- alias extenderv1 to pkg/scheduler/apis/extender/v1
- use NodeScore and NodeScoreList in non-extender logic
This PR only implements the mapping but does not use it yet. A follow-up PR will use this mapping to produce a framework configuration that redirects mapped predicates/priorities to be executed as plugins.
This is needed to allow efficient preemption simulations: during preemption, we remove/add pods from each node before running the filter plugins again to evaluate whether removing/adding specific pods will allow the incoming pod to be scheduled on the node. Instead of calling prefilter again, we should allow the plugin to incrementally update its pre-computed state.
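A minimal sketch of the incremental-update idea, assuming a hypothetical plugin whose PreFilter step counts pods with a given label on each node. During a preemption simulation, a cloned state is adjusted with `addPod`/`removePod` rather than recomputed. All type and method names here are illustrative and are not the framework's API.

```go
package main

import "fmt"

// preFilterState is a hypothetical pre-computed state: how many pods with a
// given label value already run on each node.
type preFilterState struct {
	label        string
	matchingPods map[string]int // node name -> number of matching pods
}

// preFilter does the full (expensive) computation once per scheduling cycle.
func preFilter(podsByNode map[string][]string, label string) *preFilterState {
	s := &preFilterState{label: label, matchingPods: map[string]int{}}
	for node, labels := range podsByNode {
		for _, l := range labels {
			if l == label {
				s.matchingPods[node]++
			}
		}
	}
	return s
}

// clone gives each preemption simulation its own copy to mutate.
func (s *preFilterState) clone() *preFilterState {
	c := &preFilterState{label: s.label, matchingPods: make(map[string]int, len(s.matchingPods))}
	for node, n := range s.matchingPods {
		c.matchingPods[node] = n
	}
	return c
}

// addPod and removePod incrementally adjust the state when preemption
// simulates adding or evicting a pod, instead of calling preFilter again.
func (s *preFilterState) addPod(node, podLabel string) {
	if podLabel == s.label {
		s.matchingPods[node]++
	}
}

func (s *preFilterState) removePod(node, podLabel string) {
	if podLabel == s.label && s.matchingPods[node] > 0 {
		s.matchingPods[node]--
	}
}

// filter admits a node only while it has fewer than maxMatching matching pods.
func (s *preFilterState) filter(node string, maxMatching int) bool {
	return s.matchingPods[node] < maxMatching
}

func main() {
	state := preFilter(map[string][]string{"node-a": {"web", "web"}, "node-b": {"db"}}, "web")
	fmt.Println(state.filter("node-a", 2)) // false

	// Simulate evicting one "web" pod from node-a without rerunning preFilter.
	sim := state.clone()
	sim.removePod("node-a", "web")
	fmt.Println(sim.filter("node-a", 2)) // true
}
```

Cloning before mutating keeps the original state intact, so each candidate node can be simulated independently.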
This PR modifies the scheduler's configurator to allow instantiating the framework at a later stage in the configuration, specifically at the point where we know exactly which predicates/priorities need to be configured.
This is necessary to allow converting predicates/priorities configuration into a plugin configuration to facilitate framework migration.
The status can be used by (Pre)Filter plugins to indicate that
preemption wouldn't change the decision of the filter.
Signed-off-by: Aldo Culquicondor <acondor@google.com>
- Rename 'topologyPairsPodSpreadMap' to 'podSpreadCache'
- New struct `criticalPaths criticalPaths`
- Add unified method `*criticalPaths.update()` for:
- regular update
- addPod in preemption case
- removePod in preemption case
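A sketch of how a single `update()` can serve a regular update as well as the addPod/removePod preemption adjustments. It assumes `criticalPaths` tracks the two topology values with the fewest matching pods, with `paths[0]` always holding the minimum; treat the field names as illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// criticalPaths tracks the two topology values with the fewest matching pods.
// paths[0] always holds the minimum.
type criticalPaths [2]struct {
	topologyValue string
	matchNum      int32
}

func newCriticalPaths() *criticalPaths {
	return &criticalPaths{{matchNum: math.MaxInt32}, {matchNum: math.MaxInt32}}
}

// update is the single entry point for a regular update and for the
// addPod/removePod adjustments made during preemption.
func (p *criticalPaths) update(tpVal string, num int32) {
	i := -1
	if tpVal == p[0].topologyValue {
		i = 0
	} else if tpVal == p[1].topologyValue {
		i = 1
	}
	if i >= 0 {
		// Known value: overwrite its count, then keep paths[0] as the minimum.
		p[i].matchNum = num
		if p[0].matchNum > p[1].matchNum {
			p[0], p[1] = p[1], p[0]
		}
		return
	}
	// Unknown value: insert it only if it beats one of the tracked entries.
	if num < p[0].matchNum {
		p[1] = p[0]
		p[0].topologyValue, p[0].matchNum = tpVal, num
	} else if num < p[1].matchNum {
		p[1].topologyValue, p[1].matchNum = tpVal, num
	}
}

func main() {
	paths := newCriticalPaths()
	paths.update("zone1", 3)
	paths.update("zone2", 1)
	paths.update("zone3", 2)
	fmt.Println(paths[0].topologyValue, paths[0].matchNum) // zone2 1

	// addPod in preemption: zone2 now has 3 matching pods.
	paths.update("zone2", 3)
	fmt.Println(paths[0].topologyValue, paths[0].matchNum) // zone3 2
}
```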
Filter() is called concurrently, so the members of its (fake) implementation
cannot be written without a lock.
The issue can be triggered by:
go test k8s.io/kubernetes/pkg/scheduler/core --race --count=50
The Configurator has been used as a holder for listers that tests need,
which is not its purpose. By making the tests obtain listers from more
appropriate places, such as informers, there is no need for various
accessors to the Configurator.
Also, FakeConfigurator is not being used anymore, so there's no need for
an interface instead of a plain pointer.
Signed-off-by: Aldo Culquicondor <acondor@google.com>
Make the cache implement NodeLister and expose it to the priority
functions. This way, the priority functions make use of a single cache,
the scheduler's, instead of mixing it with the lister's caches.
Signed-off-by: Aldo Culquicondor <acondor@google.com>
Calling recordSchedulingFailure puts the pod back into the scheduling queue in another
goroutine, so the pod may get a chance to be bound again before the unreserve plugin
cleans up its state.
update bazel build
fix get plugin config method
initialize only needed plugins
fix unit test
fix import duplicate package
update bazel
add docstrings
add weight field to plugin
add plugin to v1alpha1
add plugins at appropriate extension points
remove todo statement
fix import package file path
set plugin json schema
add plugin unit test to option
initial plugin in test integration
initialize only needed plugins
update bazel
rename func
change plugins needed logic
remove v1 alias
change the comment
fix alias shorter
remove blank line
change docstrings
fix map bool to struct
add some docstrings
add unreserve plugin
fix docstrings
move variable inside the for loop
make if else statement cleaner
remove plugin config from reserve plugin unit test
add plugin config and reduce unnecessary options for unit test
update bazel
fix race condition
fix permit plugin integration
change plugins to be pointer
change weight to int32
fix package alias
initial queue sort plugin
rename unreserve plugin
redesign plugin struct
update docstrings
check queue sort plugin amount
fix error message
fix condition
change plugin struct
add disabled plugin for unit test
fix docstrings
handle nil plugin set
In some cases, an Update event with no "NominatedNode" present is received right
after a node ("NominatedNode") is reserved for this pod in memory.
If we go ahead and update (delete and add) the pod, we actually un-reserve the node,
since the newPod doesn't carry status.nominatedNodeName.
During this window, other lower-priority pods have a chance to take the space that
was reserved for the nominated pod.
This is some light cleanup of logs in predicates.go. In particular, some
log lines have details clarified that will make debugging easier.
I have not touched any VLOG messages, since those usually have plenty of
detail.
The function AddUnschedulableIfNotPresent is responsible for
initializing or updating backoff timers for pods that could not be
scheduled. The helper function backoffPod does that work, but was not
being called in all cases.
This moves that call to be (mostly) unconditional, while cleaning up
comments and error handling.
Fix variables colliding with imported package names in plugins.go
- Fix variable 'preds' colliding with an imported package name
- Fix warnings in variable initialization and error strings that should not be capitalized
- add an incremental scheduling cycle
- instead of setting a flag on a move request, we cache the current scheduling
cycle in moveRequestCycle
- when unschedulable pods are added back, compare their cycle with
moveRequestCycle to decide whether they should be added to the active queue
or not
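A heavily simplified model of the cycle bookkeeping described above, assuming a queue that only tracks pod names (no priorities or backoff). The struct and method names mirror the commit message but are illustrative, not the real queue. A move request records the current cycle; a pod that fails in a cycle at or before that recorded value goes straight back to the active queue, because the move happened while it was in flight.

```go
package main

import "fmt"

// schedulingQueue is a hypothetical, minimal model of the queue's cycle logic.
type schedulingQueue struct {
	activeQ          []string
	unschedulableQ   []string
	schedulingCycle  int64 // incremented on every Pop
	moveRequestCycle int64 // cycle at which the last move request happened
}

// Pop returns the next pod together with the scheduling cycle it belongs to.
func (q *schedulingQueue) Pop() (string, int64) {
	pod := q.activeQ[0]
	q.activeQ = q.activeQ[1:]
	q.schedulingCycle++
	return pod, q.schedulingCycle
}

// MoveAllToActiveQueue records the current cycle instead of setting a flag.
func (q *schedulingQueue) MoveAllToActiveQueue() {
	q.activeQ = append(q.activeQ, q.unschedulableQ...)
	q.unschedulableQ = nil
	q.moveRequestCycle = q.schedulingCycle
}

// AddUnschedulableIfNotPresent sends the pod back to activeQ if a move
// request arrived while the pod was being scheduled.
func (q *schedulingQueue) AddUnschedulableIfNotPresent(pod string, podCycle int64) {
	if q.moveRequestCycle >= podCycle {
		q.activeQ = append(q.activeQ, pod)
	} else {
		q.unschedulableQ = append(q.unschedulableQ, pod)
	}
}

func main() {
	q := &schedulingQueue{activeQ: []string{"pod-a", "pod-b"}}
	a, cycleA := q.Pop()
	q.MoveAllToActiveQueue() // e.g. a node was added while pod-a was in flight
	q.AddUnschedulableIfNotPresent(a, cycleA)
	b, cycleB := q.Pop()
	q.AddUnschedulableIfNotPresent(b, cycleB) // no move request since pod-b was popped
	fmt.Println(q.activeQ, q.unschedulableQ) // [pod-a] [pod-b]
}
```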
There is no need to clear stale pod binding cache in scheduling, because
it will be recreated at beginning of each schedule loop, and will be
cleared when pod is removed from scheduling queue.
This moves the priority types from the algorithm package
to priorities package.
Idea is to move the type to the packages where it is
implemented. This will ease the future refactor process.
When starvation happens:
- a lot of unschedulable pods exist at the head of the queue
- because condition.LastTransitionTime is updated only when condition.Status changes
- (this means that once a pod is marked unschedulable, the field is never updated until the pod is successfully scheduled.)
What was changed:
- condition.LastProbeTime is updated every time the pod is determined to be
unschedulable.
- changed the sort function to use LastProbeTime to avoid the starvation
described above
Consideration:
- This change increases the k8s API server load because it updates Pod.status every time the scheduler determines the pod is
unschedulable.
Signed-off-by: Shingo Omura <everpeace@gmail.com>
This moves the signal handling for CacheDebugger from the factory
package into the CacheDebugger's package. That makes it easier to reuse
from packages other than factory.
- Maintain list of default predicates and priorities in defaults.go
and move the registration to separate files
Signed-off-by: Bhavin Gandhi <bhavin7392@gmail.com>
This moves the type `ScheduleAlgorithm` from `pkg/scheduler/algorithm`
to `pkg/scheduler/core`. The reason for this move is to fix our import
dependency graph and allow predicate & priority types to be moved into
their appropriate packages.
The new location makes sense because `core` is the only package that
exports an implementation of this type.
With the alpha scheduling queue we move pods from unschedulable to
active on certain events without a backoff. As a result we can cause
starvation issues if high priority pods are in the unschedulable queue.
Implement a backoff mechanism for pods being moved to active.
Closes #56721
We are going to use PodBackoff for controlling backoff when adding
unschedulable pods back to the active scheduling queue. In order to do
this more easily, limit the interface for PodBackoff to only this struct
(rather than exposing BackoffEntry) and change the backing expiry
implementation to be queue based.
The Heap data structure is useful for our backoff system in addition to
scheduling queue. Move it to somewhere it can be consumed by both
systems and properly export needed names. Also adding unit tests
from client-go/tools/cache/heap.go.
This util was used to fake certain aspects of apiserver behavior, such
as resource paths and JSON encoding. Our unit tests have been refactored
so they don't rely on the REST or JSON aspects of apiserver. This util
is no longer needed.
- don't update nominatedMap cache when Pop() an element from activeQ
- instead, delete the nominated info from cache when it's "assumed"
- unit test behavior adjusted
- expose SchedulingQueue in factory.Config
- Move from the old github.com/golang/glog to k8s.io/klog
- klog has an explicit InitFlags(), so we add the flags where necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
* github.com/kubernetes/repo-infra
* k8s.io/gengo/
* k8s.io/kube-openapi/
* github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods
Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
Loop over priorityConfigs separately. The node loop can only safely
modify result[i][index]. Before this change, it sometimes modified
result[i] concurrently with other loops.
Fixes: 7164967662
==================== Test output for //pkg/scheduler/core:go_default_test:
==================
WARNING: DATA RACE
Read at 0x00c0005e8ed0 by goroutine 22:
k8s.io/kubernetes/pkg/scheduler/core.PrioritizeNodes.func2()
pkg/scheduler/core/generic_scheduler.go:667 +0x2ea
k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.ParallelizeUntil.func1()
staging/src/k8s.io/client-go/util/workqueue/parallelizer.go:65 +0x9e
Previous write at 0x00c0005e8ed0 by goroutine 21:
k8s.io/kubernetes/pkg/scheduler/core.PrioritizeNodes.func2()
pkg/scheduler/core/generic_scheduler.go:668 +0x450
k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.ParallelizeUntil.func1()
staging/src/k8s.io/client-go/util/workqueue/parallelizer.go:65 +0x9e
Goroutine 22 (running) created at:
k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.ParallelizeUntil()
staging/src/k8s.io/client-go/util/workqueue/parallelizer.go:57 +0x1a3
k8s.io/kubernetes/pkg/scheduler/core.PrioritizeNodes()
pkg/scheduler/core/generic_scheduler.go:682 +0x592
k8s.io/kubernetes/pkg/scheduler/core.(*genericScheduler).Schedule()
pkg/scheduler/core/generic_scheduler.go:186 +0x77d
k8s.io/kubernetes/pkg/scheduler/core.TestGenericScheduler.func1()
pkg/scheduler/core/generic_scheduler_test.go:464 +0x91f
testing.tRunner()
GOROOT/src/testing/testing.go:827 +0x162
Goroutine 21 (running) created at:
k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue.ParallelizeUntil()
staging/src/k8s.io/client-go/util/workqueue/parallelizer.go:57 +0x1a3
k8s.io/kubernetes/pkg/scheduler/core.PrioritizeNodes()
pkg/scheduler/core/generic_scheduler.go:682 +0x592
k8s.io/kubernetes/pkg/scheduler/core.(*genericScheduler).Schedule()
pkg/scheduler/core/generic_scheduler.go:186 +0x77d
k8s.io/kubernetes/pkg/scheduler/core.TestGenericScheduler.func1()
pkg/scheduler/core/generic_scheduler_test.go:464 +0x91f
testing.tRunner()
GOROOT/src/testing/testing.go:827 +0x162
==================
--- FAIL: TestGenericScheduler (0.01s)
--- FAIL: TestGenericScheduler/test_6 (0.00s)
testing.go:771: race detected during execution of test
testing.go:771: race detected during execution of test
FAIL
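A minimal sketch of the pattern behind the fix above: every `result[i]` row is allocated in a sequential loop over the priority functions before any worker starts, so the parallel per-node workers only ever write their own cell `result[i][index]`. It assumes a hypothetical `scoreFunc` type and uses one goroutine per node in place of `workqueue.ParallelizeUntil`.

```go
package main

import (
	"fmt"
	"sync"
)

// scoreFunc is a hypothetical priority function scoring a single node.
type scoreFunc func(node string) int

// prioritizeNodes allocates every row up front (no concurrent writes to
// result[i]); workers then write disjoint cells result[i][index].
func prioritizeNodes(nodes []string, priorities []scoreFunc) [][]int {
	result := make([][]int, len(priorities))
	for i := range priorities {
		result[i] = make([]int, len(nodes))
	}

	var wg sync.WaitGroup
	for index := range nodes {
		wg.Add(1)
		go func(index int) {
			defer wg.Done()
			for i, score := range priorities {
				result[i][index] = score(nodes[index])
			}
		}(index)
	}
	wg.Wait()
	return result
}

func main() {
	nodes := []string{"a", "bb", "ccc"}
	priorities := []scoreFunc{
		func(n string) int { return len(n) },
		func(n string) int { return 10 - len(n) },
	}
	fmt.Println(prioritizeNodes(nodes, priorities)) // [[1 2 3] [9 8 7]]
}
```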
This adds a counter to the scheduler that can be used to calculate
throughput and error ratio. Pods which fail to schedule are not counted
as errors, but can still be tracked separately from successes.
We already measure scheduler latency, but throughput was missing. This
should be considered a key metric for the scheduler.
- snapshot equivalence cache generation numbers before snapshotting the
scheduler cache
- skip update when generation does not match live generation
- keep the node and increment its generation to invalidate it instead of
deletion
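A sketch of the generation-check idea, assuming a hypothetical per-node cache keyed by predicate order ID. The caller snapshots the node's generation before the scheduler cache is snapshotted; a later write is discarded if the generation has moved on. Invalidation keeps the node entry and bumps its generation rather than deleting it, because a deleted-then-recreated entry would restart at generation 0 and a stale write could then match.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeCache is a hypothetical per-node equivalence cache entry.
type nodeCache struct {
	generation int64
	results    map[int]bool // predicate order ID -> fit result
}

type equivalenceCache struct {
	mu    sync.Mutex
	nodes map[string]*nodeCache
}

func newEquivalenceCache() *equivalenceCache {
	return &equivalenceCache{nodes: map[string]*nodeCache{}}
}

// getOrCreate must be called with c.mu held.
func (c *equivalenceCache) getOrCreate(node string) *nodeCache {
	n, ok := c.nodes[node]
	if !ok {
		n = &nodeCache{results: map[int]bool{}}
		c.nodes[node] = n
	}
	return n
}

// Snapshot returns the node's generation before predicates are evaluated.
func (c *equivalenceCache) Snapshot(node string) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.getOrCreate(node).generation
}

// Invalidate keeps the entry but bumps its generation and drops its results.
func (c *equivalenceCache) Invalidate(node string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	n := c.getOrCreate(node)
	n.generation++
	n.results = map[int]bool{}
}

// Update stores a result only if the generation still matches the snapshot.
func (c *equivalenceCache) Update(node string, snapshotGen int64, predicateID int, fit bool) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	n := c.getOrCreate(node)
	if n.generation != snapshotGen {
		return false // computed from stale data: discard
	}
	n.results[predicateID] = fit
	return true
}

func main() {
	c := newEquivalenceCache()
	gen := c.Snapshot("node-1")
	c.Invalidate("node-1")                        // e.g. a pod landed on node-1 meanwhile
	fmt.Println(c.Update("node-1", gen, 0, true)) // false
	gen = c.Snapshot("node-1")
	fmt.Println(c.Update("node-1", gen, 0, true)) // true
}
```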
- use predicates order ID as key to improve performance
- update logic of verifying incoming pod's anti-affinity
- rename podMatchesAffinityTermProperties to podMatchesAllAffinityTermProperties
- add podMatchesAnyAffinityTermProperties which is used in some PodAntiAffinity cases
- rename some functions to make it more readable
- add unit tests to verify correctness of PodAffinity and PodAntiAffinity
- verifying "Existing pod anti-affinity"
- verifying "incoming pod's anti-affinity"
- verifying "incoming pod's affinity"
Automatic merge from submit-queue (batch tested with PRs 67950, 68195). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Consolidate componentconfig code standards
**What this PR does / why we need it**:
This PR fixes a bunch of very small misalignments in ComponentConfig packages:
- Add sane comments to all functions/variables in componentconfig `register.go` files
- Make the `register.go` files of componentconfig pkgs follow the same pattern and not differ from each other like they do today.
- Register the `openapi-gen` tag in all `doc.go` files where the pkg contains _external_ types.
- Add the `groupName` tag where missing
- Fix cases where `addKnownTypes` was registered twice in the `SchemeBuilder`
- Add `Readme` and `OWNERS` files to `Godeps` directories if missing.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/assign @sttts @thockin
From current AWS documentation:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/volume_limits.html
T3, C5, C5d, M5, M5d, R5, R5d, and z1d instances support a maximum of
28 attachments, and every instance has at least one network interface
attachment. If you have no additional network interface attachments on
these instances, you could attach 27 EBS volumes.
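The arithmetic above as a tiny helper, assuming the 28-attachment limit is shared between network interfaces and EBS volumes as quoted. `maxEBSVolumes` is a hypothetical function for illustration, not the scheduler's predicate code.

```go
package main

import "fmt"

// maxSharedAttachments is the combined attachment limit quoted above for
// T3, C5, C5d, M5, M5d, R5, R5d, and z1d instances.
const maxSharedAttachments = 28

// maxEBSVolumes subtracts network interface attachments from the shared
// limit. Every instance has at least one network interface, so values
// below 1 are treated as 1.
func maxEBSVolumes(networkInterfaces int) int {
	if networkInterfaces < 1 {
		networkInterfaces = 1
	}
	return maxSharedAttachments - networkInterfaces
}

func main() {
	fmt.Println(maxEBSVolumes(1)) // 27
	fmt.Println(maxEBSVolumes(3)) // 25
}
```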
Automatic merge from submit-queue (batch tested with PRs 67555, 68196). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Don't split nodes when searching for nodes; search them all at once
**What this PR does / why we need it**:
Don't split nodes when searching for nodes; search them all at once.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
@bsalamat
This is a follow up PR of #66733.
https://github.com/kubernetes/kubernetes/pull/66733#discussion_r205932531
**Release note**:
```release-note
Don't split nodes when searching for nodes; search them all at once.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Add validation for kube-scheduler configuration options
**What this PR does / why we need it**: This adds validation to the kube-scheduler so that it does not accept bogus configuration values. As requested by @bsalamat in issue https://github.com/kubernetes/kubernetes/issues/66743
**Which issue(s) this PR fixes**:
Fixes #66743
**Special notes for your reviewer**:
- Not sure if this validation is too heavy-handed. Would love some feedback.
- I started working on this before I realized @islinwb was also working on this same problem... https://github.com/kubernetes/kubernetes/pull/66787 I put this PR up anyways since I'm sure good code exists in both. I wasn't aware of the /assign command so didn't assign myself before starting work.
- I didn't have time to work on adding validation to deprecated cli options. If the rest of this looks ok, I can finish that up.
- I hope the location of IsValidSocketAddr is correct. Lmk if it isn't.
**Release note**:
```release-note
Adding validation to kube-scheduler at the API level
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Affinity/Anti-Affinity Optimization of Pod Being Scheduled
**What this PR does / why we need it**:
Following #66948, it was noticed that the applied optimizations for anti-affinity rules lookup of existing pods could be further applied to checking affinity and anti-affinity terms of the Pod being scheduled. This is done by mapping topology pairs to pods that potentially match the pod being scheduled instead of mapping nodes to matching pods, and accordingly the search space is reduced.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #67738
**Special notes for your reviewer**:
/sig scheduling
/sig scalability
**Release note**:
```release-note
Improve performance of Pod affinity/anti-affinity in the scheduler
```
Automatic merge from submit-queue (batch tested with PRs 63437, 68081). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Enable ImageLocalityPriority by default with integration tests
**What this PR does / why we need it**:
This PR is a follow-up to [#63842](https://github.com/kubernetes/kubernetes/issues/63842). It moves the ImageLocalityPriority function to default priority functions of the default algorithm provider and adds integration tests for the updated scheduling policy.
- Compared to [#64662](https://github.com/kubernetes/kubernetes/pull/64662), this PR does not provide e2e tests due to concerns that a large image may add too much overhead to the testing infrastructure and pipeline. We should add e2e tests in follow-up PRs using large enough image(s).
- Compared to [#64662](https://github.com/kubernetes/kubernetes/pull/64662), this PR simplifies the code changes and keeps code changes under test/integration/scheduler/.
- The PR contains a bug fix for [#65745](https://github.com/kubernetes/kubernetes/pull/65745) - caught by the integration test - where the image states are not properly cloned to the scheduler's cachedNodeInfoMap. We might split this fix into a separate PR.
The integration test covers the following: a pod requiring a large image (~3GB) is submitted to the cluster, and a single node in the cluster has the same large image; the pod should get scheduled to that node. We might also consider whether more scenarios are desired.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
Kindly ping @resouer and @bsalamat
**Release note**:
```release-note
None
```
adding validation for componentconfig
adding validation to cmd kube-scheduler
Add support for ipv6 in IsValidSocketAddr function
updating copyright date in componentconfig/validation/validation.go
updating copyright date in componentconfig/validation/validation_test.go
adding validation for cli options
adding BUILD files
updating validate function to return []errors in cmd/kube-scheduler
ok, really returning []error this time
adding comments for exported componentconfig Validation functions
silly me, not checking structs along the way :'(
refactor to avoid else statement
moving policy nil check up one function
rejigging some deprecated cmd validations
stumbling my way around validation slowly but surely
updating according to review from @bsalamat
- not validating leader election config unless leader election is enabled
- leader election time values cannot be zero
- removing validation for KubeConfigFile
- removing validation for scheduler policy
leader elect options should be non-negative
adding test cases for renewDeadline and leaseDuration being zero
fixing logic in componentconfig validation 😅
removing KubeConfigFile reference from tests as it was removed in master
2ff9bd6699
removing bogus space after var assignment
adding more tests for componentconfig based on feedback
making updates to validation because types were moved on master
update bazel build
adding validation for staging/apimachinery
adding validation for staging/apiserver
adding fieldPaths for staging validations
moving staging validations out of componentconfig
updating test case scenario for staging/apimachinery
./hack/update-bazel.sh
moving kube-scheduler validations from componentconfig
./hack/update-bazel.sh
removing non-negative check for QPS
resourceLock required
adding HardPodAffinitySymmetricWeight 0-100 range to cmd flag help section
Automatic merge from submit-queue (batch tested with PRs 67766, 67642, 67772). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add unit test cases for scheduler/algorithm/predicates.
**What this PR does / why we need it**:
Add unit test cases for scheduler/algorithm/predicates for more code coverage.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
NONE
**Special notes for your reviewer**:
NONE
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66257, 67750). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add unit test cases for scheduler/util.
**What this PR does / why we need it**:
Add unit test cases for scheduler/util for more code coverage.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
NONE
**Special notes for your reviewer**:
NONE
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Complement unit test case TestNodesWherePreemptionMightHelp for scheduler/core
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix golint error under scheduler/factory.
**What this PR does / why we need it**:
Fix golint error under scheduler/factory.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
NONE
**Special notes for your reviewer**:
NONE
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Moving KubeSchedulerConfiguration from ComponentConfig API types to staging repos
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes/kubeadm#528
**Special notes for your reviewer**:
/cc luxas timothysc
/cc @kubernetes/sig-cluster-lifecycle-pr-reviews
**Release note**:
```release-note
Moving KubeSchedulerConfiguration from ComponentConfig API types to staging repos
```
Automatic merge from submit-queue (batch tested with PRs 66862, 67618). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use sync.Map to scale equiv class cache better
**What this PR does / why we need it**:
Change the current lock in the first-level ecache into a `sync.Map`, which is known to scale better than `sync.Mutex` on machines with >8 CPUs
ref: https://golang.org/pkg/sync/#Map
And the code is much cleaner in this way.
5k Nodes, 10k Pods benchmark with ecache enabled on a 64-core VM:
```bash
// before
BenchmarkScheduling/5000Nodes/0Pods-64 10000 17550089 ns/op
// after
BenchmarkScheduling/5000Nodes/0Pods-64 10000 16975098 ns/op
```
Compared to the current implementation, the improvement after this change is noticeable, and the results are stable on 8-, 16-, and 64-core VMs.
**Special notes for your reviewer**:
**Release note**:
```release-note
Use sync.Map to scale ecache better
```
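A sketch of the two-level layout described in this PR, assuming a hypothetical first level (node name to per-node entry) stored in a `sync.Map`, with each node entry guarded by its own `sync.RWMutex`. Goroutines working on different nodes then do not contend on a single global mutex. Type and method names are illustrative.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeEntry is a hypothetical second-level cache with its own lock.
type nodeEntry struct {
	mu      sync.RWMutex
	results map[int]bool // predicate ID -> fit result
}

// ecache keeps the first level (node name -> *nodeEntry) in a sync.Map.
type ecache struct {
	nodes sync.Map
}

func (c *ecache) node(name string) *nodeEntry {
	if v, ok := c.nodes.Load(name); ok {
		return v.(*nodeEntry)
	}
	// LoadOrStore resolves races between goroutines creating the same entry.
	v, _ := c.nodes.LoadOrStore(name, &nodeEntry{results: map[int]bool{}})
	return v.(*nodeEntry)
}

func (c *ecache) Update(node string, predicateID int, fit bool) {
	n := c.node(node)
	n.mu.Lock()
	n.results[predicateID] = fit
	n.mu.Unlock()
}

func (c *ecache) Lookup(node string, predicateID int) (fit, ok bool) {
	n := c.node(node)
	n.mu.RLock()
	defer n.mu.RUnlock()
	fit, ok = n.results[predicateID]
	return fit, ok
}

func main() {
	var c ecache
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c.Update(fmt.Sprintf("node-%d", i), 0, i%2 == 0)
		}(i)
	}
	wg.Wait()
	fmt.Println(c.Lookup("node-2", 0)) // true true
}
```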
Automatic merge from submit-queue (batch tested with PRs 67041, 66948). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Anti affinity optimization
**What this PR does / why we need it**:
This pull request aims to optimize the performance of the anti-affinity rules lookup for existing pods.
This optimization maps topology values to the list of pods running on nodes that match each value and stores that map in the pod metadata. Accordingly, when validating the anti-affinity rules of existing pods, we only check pods running on nodes with the same topology values as the current candidate node for scheduling.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #63937
**Special notes for your reviewer**:
/sig scalability
/sig scheduling
**Release note**:
```release-note
improve performance of anti-affinity predicate of default scheduler.
```
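A sketch of the topology-pair map described in this PR, under simplifying assumptions: each existing pod has at most one required anti-affinity term, expressed as "reject pods with app=X within the same value of topologyKey". The map is built once per incoming pod; checking a candidate node then only looks up that node's own topology pairs. All names are hypothetical.

```go
package main

import "fmt"

// topologyPair is a topology key/value such as {"zone", "us-east-1a"}.
type topologyPair struct {
	key, value string
}

// existingPod is a hypothetical, simplified pod with one anti-affinity term.
type existingPod struct {
	name            string
	node            string
	antiAffinityApp string // rejects incoming pods with app=<this value>
	topologyKey     string
}

// buildTopologyPairsMap maps each topology pair to the existing pods whose
// anti-affinity would reject the incoming pod in that topology domain.
func buildTopologyPairsMap(pods []existingPod, nodeLabels map[string]map[string]string, incomingApp string) map[topologyPair][]string {
	m := map[topologyPair][]string{}
	for _, p := range pods {
		if p.antiAffinityApp != incomingApp {
			continue
		}
		if v, ok := nodeLabels[p.node][p.topologyKey]; ok {
			pair := topologyPair{p.topologyKey, v}
			m[pair] = append(m[pair], p.name)
		}
	}
	return m
}

// fitsNode checks only the candidate node's own topology pairs.
func fitsNode(m map[topologyPair][]string, labels map[string]string) bool {
	for k, v := range labels {
		if len(m[topologyPair{k, v}]) > 0 {
			return false
		}
	}
	return true
}

func main() {
	nodeLabels := map[string]map[string]string{
		"n1": {"zone": "a"},
		"n2": {"zone": "a"},
		"n3": {"zone": "b"},
	}
	pods := []existingPod{{name: "p1", node: "n1", antiAffinityApp: "web", topologyKey: "zone"}}
	m := buildTopologyPairsMap(pods, nodeLabels, "web")
	fmt.Println(fitsNode(m, nodeLabels["n2"]), fitsNode(m, nodeLabels["n3"])) // false true
}
```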
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
scheduler: add metrics to equivalence cache
This adds counters to equiv. cache reads & writes. Reads are labeled by
hit/miss, while writes are labeled to indicate whether the write was
discarded.
This will give us visibility into,
- hit rate of cache reads
- ratio of reads to writes
- rate of discarded writes
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/63259
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
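A dependency-free sketch of the counters described in this PR, assuming reads are split by hit/miss and writes by stored/discarded. It uses `sync/atomic` in place of labeled metrics, and it derives the three signals listed above (hit rate, reads per write, discard rate) directly. In a real deployment these ratios would more likely be computed from exported counters by the monitoring system; the type and method names are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ecacheMetrics is a hypothetical stand-in for labeled cache counters.
type ecacheMetrics struct {
	readHit, readMiss           int64
	writeStored, writeDiscarded int64
}

func (m *ecacheMetrics) ObserveRead(hit bool) {
	if hit {
		atomic.AddInt64(&m.readHit, 1)
	} else {
		atomic.AddInt64(&m.readMiss, 1)
	}
}

func (m *ecacheMetrics) ObserveWrite(discarded bool) {
	if discarded {
		atomic.AddInt64(&m.writeDiscarded, 1)
	} else {
		atomic.AddInt64(&m.writeStored, 1)
	}
}

// ratio returns a/(a+b), or 0 when both counters are zero.
func ratio(a, b int64) float64 {
	if a+b == 0 {
		return 0
	}
	return float64(a) / float64(a+b)
}

func (m *ecacheMetrics) HitRate() float64 {
	return ratio(atomic.LoadInt64(&m.readHit), atomic.LoadInt64(&m.readMiss))
}

func (m *ecacheMetrics) DiscardRate() float64 {
	return ratio(atomic.LoadInt64(&m.writeDiscarded), atomic.LoadInt64(&m.writeStored))
}

func (m *ecacheMetrics) ReadsPerWrite() float64 {
	reads := atomic.LoadInt64(&m.readHit) + atomic.LoadInt64(&m.readMiss)
	writes := atomic.LoadInt64(&m.writeStored) + atomic.LoadInt64(&m.writeDiscarded)
	if writes == 0 {
		return 0
	}
	return float64(reads) / float64(writes)
}

func main() {
	var m ecacheMetrics
	for _, hit := range []bool{true, true, true, false} {
		m.ObserveRead(hit)
	}
	m.ObserveWrite(false)
	m.ObserveWrite(true)
	fmt.Println(m.HitRate(), m.DiscardRate(), m.ReadsPerWrite()) // 0.75 0.5 2
}
```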
Automatic merge from submit-queue (batch tested with PRs 67461, 67464, 67416). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Delete dead code in pkg/scheduler
**What this PR does / why we need it**:
This is just some cleanup. I found some unused code while evaluating the scheduler code.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/kind cleanup
/sig scheduling
This adds counters to equiv. cache reads & writes. Reads are labeled by
hit/miss, while writes are labeled to indicate whether the write was
discarded.
This will give us visibility into,
- hit rate of cache reads
- ratio of reads to writes
- rate of discarded writes
Automatic merge from submit-queue (batch tested with PRs 66491, 66587, 66856, 66657, 66923). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add space for output
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 65570, 65616). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Retry scheduling on StorageClass events
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #56163
**Special notes for your reviewer**:
I have taken over #60006.
It's hard to test in e2e, because we cannot tell which event triggered the rescheduling of a pod (periodic service/node events will move pods to the active queue too). ~~I'll add integration tests for this functionality after [this PR](https://github.com/kubernetes/kubernetes/pull/65296) gets merged.~~ (already added)
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 63955, 66685, 66671). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
remove unused code in pkg/scheduler/algorithm/scheduler_interface_test.go
**What this PR does / why we need it**:
remove unused code in pkg/scheduler/algorithm/scheduler_interface_test.go
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66540, 66599). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Invalidate CheckVolumeBinding predicate only when VolumeScheduling feature is enabled
**What this PR does / why we need it**:
Invalidate CheckVolumeBinding predicate only when VolumeScheduling feature is enabled.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66540, 66599). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
replace predicates string with corresponding const in TestDefaultPredicates
**What this PR does / why we need it**:
replace predicates string with corresponding const in TestDefaultPredicates. Unify with the const in func defaultPredicates().
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66291, 66471, 66499). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Improve unit test TestZeroRequest
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #66468
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Extender preemption should respect IsInterested()
**What this PR does / why we need it**:
Extender preemption should respect IsInterested()
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #66289
**Special notes for your reviewer**:
The bug was reported by, and the first commit is co-authored by, @chenchun.
**Release note**:
```release-note
Extender preemption should respect IsInterested()
```
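The fix amounts to filtering extenders by `IsInterested()` before asking them anything during preemption. The sketch below illustrates that filtering under stated assumptions: `Pod`, `Extender`, `fakeExtender`, `SupportsPreemption()` and `extendersForPreemption` are hypothetical, simplified stand-ins, not the scheduler's actual types or functions.

```go
package main

import "fmt"

// Pod is a hypothetical, minimal stand-in for the scheduler's pod type.
type Pod struct {
	Name     string
	NeedsGPU bool
}

// Extender is a hypothetical subset of an extender interface; only the
// methods needed to illustrate the filtering are included.
type Extender interface {
	Name() string
	IsInterested(pod *Pod) bool
	SupportsPreemption() bool
}

// fakeExtender is a hypothetical test double.
type fakeExtender struct {
	name       string
	preemption bool
	interested func(*Pod) bool
}

func (e fakeExtender) Name() string             { return e.name }
func (e fakeExtender) SupportsPreemption() bool { return e.preemption }
func (e fakeExtender) IsInterested(p *Pod) bool { return e.interested(p) }

// extendersForPreemption keeps only extenders that support preemption and
// are interested in the pod; the rest must not be consulted at all.
func extendersForPreemption(pod *Pod, extenders []Extender) []Extender {
	var out []Extender
	for _, e := range extenders {
		if e.SupportsPreemption() && e.IsInterested(pod) {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	gpu := fakeExtender{name: "gpu", preemption: true, interested: func(p *Pod) bool { return p.NeedsGPU }}
	net := fakeExtender{name: "net", preemption: true, interested: func(*Pod) bool { return true }}
	for _, e := range extendersForPreemption(&Pod{Name: "web"}, []Extender{gpu, net}) {
		fmt.Println(e.Name()) // prints "net": the gpu extender is not interested
	}
}
```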
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
use subtest for table units (pkg/scheduler/core)
**What this PR does / why we need it**: Update scheduler's unit table tests to use subtest
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
breaks up PR: https://github.com/kubernetes/kubernetes/pull/63281
/ref #63267
**Release note**:
```release-note
This PR will leverage subtests on the existing table tests for the scheduler units.
Some refactoring of error/status messages and functions to align with new approach.
```
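For readers unfamiliar with the pattern these PRs adopt, here is a minimal table-driven test using `t.Run` from Go's standard `testing` package. `maxScore` and `TestMaxScore` are hypothetical examples, not scheduler code; the point is that each table entry becomes a named subtest that can be selected on its own, e.g. with `go test -run 'TestMaxScore/tie'`.

```go
package main

import (
	"fmt"
	"testing"
)

// maxScore is a hypothetical function under test; it returns 0 for empty input.
func maxScore(scores []int) int {
	best := 0
	for i, s := range scores {
		if i == 0 || s > best {
			best = s
		}
	}
	return best
}

// TestMaxScore runs each table entry as a named subtest, so a failure
// reports the case name and a single case can be rerun in isolation.
func TestMaxScore(t *testing.T) {
	tests := []struct {
		name   string
		scores []int
		want   int
	}{
		{name: "single", scores: []int{3}, want: 3},
		{name: "negative", scores: []int{-5, -2, -9}, want: -2},
		{name: "tie", scores: []int{7, 7, 1}, want: 7},
	}
	for _, tt := range tests {
		tt := tt // harmless here; needed before Go 1.22 if subtests call t.Parallel()
		t.Run(tt.name, func(t *testing.T) {
			if got := maxScore(tt.scores); got != tt.want {
				t.Errorf("maxScore(%v) = %d, want %d", tt.scores, got, tt.want)
			}
		})
	}
}

func main() {
	fmt.Println(maxScore([]int{-5, -2, -9})) // prints -2
}
```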
Automatic merge from submit-queue (batch tested with PRs 66410, 66398, 66061, 66397, 65558). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix volume limit for EBS on m5 and c5 instances
This is a fix for lower volume limits on m5 and c5 instance types while we wait for https://github.com/kubernetes/features/issues/554 to reach GA.
This problem became urgent because many of our users are trying to migrate to those instance types in light of the Spectre/Meltdown vulnerabilities, but the lower volume limit on those instance types often causes cluster instability. Users can work around this by configuring the scheduler with a lower limit, but that is difficult to do when the cluster mixes instance types.
The new default limits were taken from https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/volume_limits.html
Background on Spectre/Meltdown is available at https://community.bitnami.com/t/spectre-variant-2/54961/5
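The change boils down to choosing the per-node EBS attach limit from the node's instance type. A minimal sketch, assuming the instance type string is already known (for example from a node label): the constant names, the values 39 and 25, and the prefix matching are illustrative assumptions, so confirm the limits against the AWS page linked above.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// Assumed values for illustration only; confirm against the AWS docs.
	defaultMaxEBSVolumes = 39 // assumed default for older instance types
	m5c5MaxEBSVolumes    = 25 // assumed lower limit for m5/c5 instance types
)

// maxEBSVolumes is a hypothetical helper that picks a per-node EBS attach
// limit from the instance type. Prefix matching is a simplification.
func maxEBSVolumes(instanceType string) int {
	if strings.HasPrefix(instanceType, "m5") || strings.HasPrefix(instanceType, "c5") {
		return m5c5MaxEBSVolumes
	}
	return defaultMaxEBSVolumes
}

func main() {
	for _, t := range []string{"m5.large", "c5.2xlarge", "t2.micro"} {
		fmt.Printf("%s -> %d\n", t, maxEBSVolumes(t))
	}
}
```

Keying the limit off the instance type, rather than one cluster-wide flag, is what helps mixed clusters, where a single configured limit is too low for some nodes or too high for others.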
/sig storage
/sig scheduling
```release-note
Fix volume limit for EBS on m5 and c5 instance types
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Re-design equivalence class cache to two level cache
**What this PR does / why we need it**:
The current equivalence cache (ecache) uses a global lock across all nodes; this patch assigns an ecache per node to eliminate that global lock. Both scheduling performance and throughput improve significantly.
**CPU Profile Result**
Machine: 32-core 60GB GCE VM
1k nodes, 10k pods bench test (the critical function was highlighted in the profile images, which are not preserved here):
1. Current default scheduler with ecache enabled
2. Current default scheduler with ecache disabled
3. Current default scheduler with this patch and ecache enabled
**Throughput Test Result**
1k nodes 3k pods `scheduler_perf` test:
Current default scheduler, ecache is disabled:
```bash
Minimal observed throughput for 3k pod test: 200
PASS
ok k8s.io/kubernetes/test/integration/scheduler_perf 30.091s
```
With this patch, ecache is enabled:
```bash
Minimal observed throughput for 3k pod test: 556
PASS
ok k8s.io/kubernetes/test/integration/scheduler_perf 11.119s
```
**Design and implementation:**
The idea is to redesign the ecache as a "two-level cache".
The first-level cache holds the global lock across nodes, and synchronization is needed only when a node is added or deleted, which happens much less frequently.
The second-level cache is assigned per node and its lock is scoped to that node, so there is no need to take the global lock during the whole predicate evaluation cycle. For more detail, please check [the original discussion](https://github.com/kubernetes/kubernetes/issues/63784#issuecomment-399848349).
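The two-level locking scheme described above can be sketched as follows. This is a simplified illustration, not the actual ecache code: `twoLevelCache`, `nodeCache`, the string key (standing in for predicate name plus equivalence hash) and the boolean fit result are all hypothetical. The global lock is only taken to find, add, or remove a node's entry; reads and writes of cached results take only that node's lock.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeCache is the second level: one per node, guarded by its own lock.
// The key is a hypothetical "predicate/equivalence-class" string.
type nodeCache struct {
	mu      sync.RWMutex
	results map[string]bool
}

func (n *nodeCache) lookup(key string) (fit, ok bool) {
	n.mu.RLock()
	defer n.mu.RUnlock()
	fit, ok = n.results[key]
	return fit, ok
}

func (n *nodeCache) store(key string, fit bool) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.results[key] = fit
}

// twoLevelCache is the first level: node name -> nodeCache. Its global lock
// is held only while looking up, adding, or removing a node's entry.
type twoLevelCache struct {
	mu    sync.RWMutex
	nodes map[string]*nodeCache
}

func newTwoLevelCache() *twoLevelCache {
	return &twoLevelCache{nodes: make(map[string]*nodeCache)}
}

// forNode returns the per-node cache, creating it on first use.
func (c *twoLevelCache) forNode(name string) *nodeCache {
	c.mu.RLock()
	n, ok := c.nodes[name]
	c.mu.RUnlock()
	if ok {
		return n
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if n, ok = c.nodes[name]; !ok { // re-check: another goroutine may have added it
		n = &nodeCache{results: make(map[string]bool)}
		c.nodes[name] = n
	}
	return n
}

// removeNode drops a node's cache when the node is deleted.
func (c *twoLevelCache) removeNode(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.nodes, name)
}

func main() {
	c := newTwoLevelCache()
	var wg sync.WaitGroup
	for _, node := range []string{"node-a", "node-b", "node-c"} {
		wg.Add(1)
		go func(node string) {
			defer wg.Done()
			// After forNode returns, only this node's lock is contended.
			c.forNode(node).store("PodFitsResources/class-1", true)
		}(node)
	}
	wg.Wait()
	fit, ok := c.forNode("node-b").lookup("PodFitsResources/class-1")
	fmt.Println(fit, ok) // true true
}
```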
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #63784
**Special notes for your reviewer**:
~~Tagged as WIP to make sure this does not break existing code and tests, we can start review after CI is happy.~~
**Release note**:
```release-note
Re-design equivalence class cache to two level cache
```
Automatic merge from submit-queue (batch tested with PRs 58487, 63666). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
use subtest for table units (pkg/scheduler/factory)
**What this PR does / why we need it**: Update scheduler's unit table tests to use subtest
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
breaks up PR: https://github.com/kubernetes/kubernetes/pull/63281
/ref #63267
**Release note**:
```release-note
This PR will leverage subtests on the existing table tests for the scheduler units.
Some refactoring of error/status messages and functions to align with new approach.
```
Automatic merge from submit-queue (batch tested with PRs 66011, 66111, 66106, 66039, 65745). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Enable adaptive scoring in ImageLocalityPriority
**What this PR does / why we need it**:
This PR replaces the original, purely image-size-based scoring with an adaptive scoring scheme. The new scheme considers not only the image size but also its "spread", defined as follows:
> Given an image `i`, `spread_i = num_node_has_i / total_num_nodes`
The image then receives the score `score_i = size_i * spread_i`, as proposed by @resouer. The final node score is the sum of the scores of all images that are mentioned in the pod spec and already present on the node.
The goal of this heuristic is to better _balance image locality with other scheduling policies_. In particular, it aims to mitigate and prevent the undesirable "node heating problem", _i.e._, pods being assigned to the same node or a few nodes because of preferred image locality. The larger an image's `spread`, the more weight its locality can be given, since more nodes can be expected to have the image.
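The scoring formula can be worked through in a few lines. This sketch computes only the raw sum `score_i = size_i * spread_i` over the pod's images present on the node; `imageState`, `imageScore` and `nodeScore` are hypothetical names, and any normalization into the scheduler's priority range is deliberately left out.

```go
package main

import "fmt"

// imageState is hypothetical: an image's size and how many nodes have it.
type imageState struct {
	sizeBytes int64
	numNodes  int
}

// imageScore computes score_i = size_i * spread_i,
// where spread_i = numNodes / totalNodes.
func imageScore(img imageState, totalNodes int) float64 {
	if totalNodes == 0 {
		return 0
	}
	spread := float64(img.numNodes) / float64(totalNodes)
	return float64(img.sizeBytes) * spread
}

// nodeScore sums the scores of the pod's images that are present on the node.
func nodeScore(podImages []string, onNode map[string]bool, states map[string]imageState, totalNodes int) float64 {
	var sum float64
	for _, name := range podImages {
		if onNode[name] {
			sum += imageScore(states[name], totalNodes)
		}
	}
	return sum
}

func main() {
	states := map[string]imageState{
		"app:v1":  {sizeBytes: 400, numNodes: 1}, // rare image: spread 0.25
		"base:v1": {sizeBytes: 200, numNodes: 4}, // everywhere: spread 1.0
	}
	pod := []string{"app:v1", "base:v1"}
	fmt.Println(nodeScore(pod, map[string]bool{"app:v1": true, "base:v1": true}, states, 4)) // 300
	fmt.Println(nodeScore(pod, map[string]bool{"base:v1": true}, states, 4))                 // 200
}
```

Note how the large but rare `app:v1` image contributes only 100 instead of 400: the node holding it gains a smaller edge, which is the intended damping of the "node heating problem".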
The new image state information in scheduler cache, enabled in this PR, allows other potential heuristics to be explored.
**Special notes for your reviewer**:
@resouer
Additional unit tests are WIP.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
scheduler: update tests to use sub-benchmarks (pkg/scheduler/cache)
**What this PR does / why we need it**:
Go 1.7 added subtests and sub-benchmarks, which make table-driven tests much easier to run and debug. Some tests were not yet using this feature.
Further reading: [Using Subtests and Sub-benchmarks](https://blog.golang.org/subtests)
/kind cleanup
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix scheduler config decoding
Fixes #65413
Implements a custom unmarshaler for a single scheduler config type that did not correctly specify JSON tags; this is a stopgap until https://github.com/kubernetes/kubernetes/issues/65414 is resolved.
Adds missing compatibility tests for scheduler extenders back to 1.7
```release-note
Fixes incompatibility with custom scheduler extender configurations specifying `bindVerb`
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Improve scheduler's performance by eliminating sorting of nodes by their score
**What this PR does / why we need it**:
While profiling the scheduler, I noticed that it spends a significant amount of time sorting nodes after scoring them, only to find the nodes with the highest score. Finding the highest-scoring nodes does not require sorting the array, so this PR replaces the sort with a linear scan.
Eliminating the sort results in over 10% improvement in throughput of the scheduler.
Before (3 runs for 5000 nodes, scheduling 1000 pods in a cluster running 2000 pods):
BenchmarkScheduling/5000Nodes/2000Pods-12 1000 20682552 ns/op
BenchmarkScheduling/5000Nodes/2000Pods-12 1000 20464729 ns/op
BenchmarkScheduling/5000Nodes/2000Pods-12 1000 21188906 ns/op
After:
BenchmarkScheduling/5000Nodes/2000Pods-12 1000 18485866 ns/op
BenchmarkScheduling/5000Nodes/2000Pods-12 1000 18457749 ns/op
BenchmarkScheduling/5000Nodes/2000Pods-12 1000 18418200 ns/op
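The linear scan replacing the sort can be sketched like this: one O(n) pass tracks the best score and the indices of every node that ties for it, instead of an O(n log n) sort. `HostPriority` and `selectHost` are simplified, hypothetical shapes, and the round-robin tie-break via `lastIndex` is one reasonable choice assumed here for illustration.

```go
package main

import "fmt"

// HostPriority is a hypothetical (node, score) pair.
type HostPriority struct {
	Host  string
	Score int
}

// selectHost finds the maximum score in a single pass, remembering the
// indices of all nodes that share it, then picks among the ties in
// round-robin order using *lastIndex.
func selectHost(list []HostPriority, lastIndex *int) (string, error) {
	if len(list) == 0 {
		return "", fmt.Errorf("empty priority list")
	}
	maxScore := list[0].Score
	maxIdx := []int{0}
	for i := 1; i < len(list); i++ {
		switch s := list[i].Score; {
		case s > maxScore:
			maxScore = s
			maxIdx = append(maxIdx[:0], i) // new best: reset the tie list
		case s == maxScore:
			maxIdx = append(maxIdx, i)
		}
	}
	idx := maxIdx[*lastIndex%len(maxIdx)]
	*lastIndex++
	return list[idx].Host, nil
}

func main() {
	list := []HostPriority{{"a", 5}, {"b", 9}, {"c", 9}, {"d", 1}}
	last := 0
	for i := 0; i < 3; i++ {
		h, _ := selectHost(list, &last)
		fmt.Println(h) // b, then c, then b
	}
}
```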
**Release note**:
```release-note
Improve scheduler's performance by eliminating sorting of nodes by their score.
```
Automatic merge from submit-queue (batch tested with PRs 65388, 64995). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add more conditions to the list of predicate failures that won't be resolved by preemption
**What this PR does / why we need it**:
Adds more conditions to the list of predicate failures that won't be resolved by preemption. This can potentially improve preemption performance by skipping nodes that cannot schedule the pending pod no matter how many other pods are removed from them.
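The filtering step can be illustrated with a small sketch: any node whose failure reasons include one that evicting pods cannot fix is dropped from the preemption candidates. The reason strings, the `unresolvable` set and `candidatesForPreemption` are hypothetical simplifications; the real scheduler uses its own predicate failure types.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical failure reasons; the real scheduler has its own error values.
const (
	reasonNodeSelectorMismatch = "NodeSelectorNotMatch"
	reasonTaintsNotTolerated   = "TaintsTolerationsNotMatch"
	reasonNodeNotReady         = "NodeNotReady"
	reasonInsufficientCPU      = "InsufficientCPU"
)

// unresolvable holds failures that removing pods from a node cannot fix.
var unresolvable = map[string]bool{
	reasonNodeSelectorMismatch: true,
	reasonTaintsNotTolerated:   true,
	reasonNodeNotReady:         true,
}

// candidatesForPreemption keeps only nodes whose failures might all be
// resolved by evicting pods; one unresolvable reason disqualifies a node.
func candidatesForPreemption(failures map[string][]string) []string {
	var out []string
	for node, reasons := range failures {
		ok := true
		for _, r := range reasons {
			if unresolvable[r] {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, node)
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	fmt.Println(candidatesForPreemption(map[string][]string{
		"n1": {reasonInsufficientCPU},
		"n2": {reasonInsufficientCPU, reasonNodeNotReady},
		"n3": {reasonTaintsNotTolerated},
	})) // [n1]
}
```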
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Add more conditions to the list of predicate failures that won't be resolved by preemption.
```
/sig scheduling
Automatic merge from submit-queue (batch tested with PRs 65024, 65287, 65345, 64693, 64941). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix some typos in code comments.
Signed-off-by: xiechengsheng <XIE1995@whut.edu.cn>
**What this PR does / why we need it**:
Fix some typos in code comments.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
NONE
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Split scheduler latency metric to fine-grained steps
This splits the summary metric we recently added into finer steps. It should be very useful for performance experiments.
/cc @wojtek-t
fyi - @bsalamat @misterikkit
Strictly speaking, this is a breaking change, but since this metric was added only about a week ago, I think it should be fine (we should port this change to 1.11).
```release-note
Split 'scheduling_latency_seconds' metric into finer steps (predicate, priority, preemption)
```
Automatic merge from submit-queue (batch tested with PRs 64285, 63660, 63661, 63662, 64883). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
use subtest for table units (pkg/scheduler/algorithmprovider)
**What this PR does / why we need it**: Update scheduler's unit table tests to use subtest
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
breaks up PR: https://github.com/kubernetes/kubernetes/pull/63281
/ref #63267
**Release note**:
```release-note
This PR will leverage subtests on the existing table tests for the scheduler units.
Some refactoring of error/status messages and functions to align with new approach.
```
Automatic merge from submit-queue (batch tested with PRs 64285, 63660, 63661, 63662, 64883). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
use subtest for table units (pkg/scheduler)
**What this PR does / why we need it**: Update scheduler's unit table tests to use subtest
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
breaks up PR: https://github.com/kubernetes/kubernetes/pull/63281
/ref #63267
**Release note**:
```release-note
This PR will leverage subtests on the existing table tests for the scheduler units.
Some refactoring of error/status messages and functions to align with new approach.
```
Automatic merge from submit-queue (batch tested with PRs 64285, 63660, 63661, 63662, 64883). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
use subtest for table units (pkg/scheduler/algorithm/predicates)
**What this PR does / why we need it**: Update scheduler's unit table tests to use subtest
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
breaks up PR: https://github.com/kubernetes/kubernetes/pull/63281
/ref #63267
**Release note**:
```release-note
This PR will leverage subtests on the existing table tests for the scheduler units.
Some refactoring of error/status messages and functions to align with new approach.
```
This moves the equivalence cache implementation out of the 'core'
package and into k8s.io/kubernetes/pkg/scheduler/core/equivalence.
Separating the equiv. cache from the genericScheduler implementation
makes their interaction points easier to follow, and prevents us from
accidentally accessing unexported fields.
Automatic merge from submit-queue (batch tested with PRs 64882, 64692, 64389, 60626, 64840). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
update pod state of scheduler cache when UpdatePod
Update the pod state map in the scheduler cache when UpdatePod is called. @k82cn @bsalamat
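The bug class being fixed is a cache that keeps two views of a pod and updates only one of them. A minimal sketch, with locking and the real pod state fields omitted: `Pod`, `podState`, `schedulerCache` and its methods are hypothetical simplifications of the scheduler cache.

```go
package main

import "fmt"

// Pod is a hypothetical minimal pod.
type Pod struct {
	Namespace, Name, NodeName, ResourceVersion string
}

// podState is a hypothetical per-pod record (the real cache tracks more).
type podState struct {
	pod *Pod
}

// schedulerCache keeps pods both per node and in a pod-key -> state map.
// Locking is omitted for brevity.
type schedulerCache struct {
	podStates map[string]*podState
	nodePods  map[string]map[string]*Pod
}

func newSchedulerCache() *schedulerCache {
	return &schedulerCache{
		podStates: make(map[string]*podState),
		nodePods:  make(map[string]map[string]*Pod),
	}
}

func podKey(p *Pod) string { return p.Namespace + "/" + p.Name }

func (c *schedulerCache) addToNode(p *Pod) {
	if c.nodePods[p.NodeName] == nil {
		c.nodePods[p.NodeName] = make(map[string]*Pod)
	}
	c.nodePods[p.NodeName][podKey(p)] = p
}

func (c *schedulerCache) AddPod(p *Pod) {
	c.addToNode(p)
	c.podStates[podKey(p)] = &podState{pod: p}
}

// UpdatePod swaps oldPod for newPod in the node view AND in podStates;
// skipping the podStates update would leave it pointing at the stale pod.
func (c *schedulerCache) UpdatePod(oldPod, newPod *Pod) error {
	key := podKey(oldPod)
	if _, ok := c.podStates[key]; !ok {
		return fmt.Errorf("pod %s is not in the cache", key)
	}
	delete(c.nodePods[oldPod.NodeName], key)
	c.addToNode(newPod)
	c.podStates[podKey(newPod)] = &podState{pod: newPod}
	return nil
}

func main() {
	c := newSchedulerCache()
	old := &Pod{Namespace: "default", Name: "web", NodeName: "n1", ResourceVersion: "1"}
	c.AddPod(old)
	updated := *old
	updated.ResourceVersion = "2"
	_ = c.UpdatePod(old, &updated)
	fmt.Println(c.podStates["default/web"].pod.ResourceVersion) // 2
}
```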
```release-note
Keep pod state consistent when the scheduler cache handles UpdatePod
```
Automatic merge from submit-queue (batch tested with PRs 64142, 64426, 62910, 63942, 64548). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
scheduler: further cleanup of equivalence cache
**What this PR does / why we need it**:
This improves comments and simplifies some names/logic in equivalence_cache.go, as well as changing the order of some items in the file.
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/kind cleanup
Automatic merge from submit-queue (batch tested with PRs 64252, 64307, 64163, 64378, 64179). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove unused parameter (pod) in `pkg/scheduler/core/generic_scheduler`
**What this PR does / why we need it**:
Remove unused parameter (pod) in `pkg/scheduler/core/generic_scheduler`
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 65230, 57355, 59174, 63698, 63659). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
use subtest for table units (pkg/scheduler/algorithm/priorities/util)
**What this PR does / why we need it**: Update scheduler's unit table tests to use subtest
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
breaks up PR: https://github.com/kubernetes/kubernetes/pull/63281
/ref #63267
**Release note**:
```release-note
This PR will leverage subtests on the existing table tests for the scheduler units.
Some refactoring of error/status messages and functions to align with new approach.
```