Automatic merge from submit-queue (batch tested with PRs 41775, 39678, 42629, 42524, 43028)
Aggregated used ports at the NodeInfo level.
fixes#42523
```release-note
Aggregated used ports at the NodeInfo level for `PodFitsHostPorts` predicate.
```
- Added schedulercache.Resource.SetOpaque helper.
- Amend kubelet allocatable sync so that when OIRs are removed from capacity
they are also removed from allocatable.
- Fixes#41861.
Automatic merge from submit-queue (batch tested with PRs 41701, 41818, 41897, 41119, 41562)
Allow updates to pod tolerations.
Opening this PR to continue discussion for pod spec tolerations updates when a pod has been scheduled already. This PR is built on top of https://github.com/kubernetes/kubernetes/pull/38957.
@kubernetes/sig-scheduling-pr-reviews @liggitt @davidopp @derekwaynecarr @kubernetes/rh-cluster-infra
Where possible, switch the scheduler to use generated listers and
informers. There are still some places where it probably makes more
sense to use one-off reflectors/informers (listing/watching just a
single node, listing/watching scheduled & unscheduled pods using a field
selector).
Automatic merge from submit-queue
Add scheduler predicate to filter for max Azure disks attached
**What this PR does / why we need it**: This PR adds scheduler predicates for maximum Azure Disks count. This allows to use the environment variable KUBE_MAX_PD_VOLS on scheduler the same as it's already possible with GCE and AWS.
This is needed as we need a way to specify the maximum attachable disks on Azure to avoid permanently failing disk attachment in cases k8s scheduled too many PODs with AzureDisk volumes onto the same node.
I've chosen 16 as the default value for DefaultMaxAzureDiskVolumes even though it may be too high for many smaller VM types and too low for the larger VM types. This means, the default behavior may change for clusters with large VM types. For smaller VM types, the behavior will not change (it will keep failing attaching).
In the future, the value should be determined at run time on a per node basis, depending on the VM size. I know that this is already implemented in the ongoing Azure Managed Disks work, but I don't remember where to find this anymore and also forgot who was working on this. Maybe @colemickens can help here.
**Release note**:
```release-note
Support KUBE_MAX_PD_VOLS on Azure
```
CC @colemickens @brendandburns
Automatic merge from submit-queue
Adjust nodiskconflict support based on iscsi multipath.
With the multipath support is in place, to declare whether both iscsi disks are same, we need to only depend on IQN.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Automatic merge from submit-queue
Removed a space in portforward.go.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 34543, 40606)
sync client-go and move util/workqueue
The vision of client-go is that it provides enough utilities to build a reasonable controller. It has been copying `util/workqueue`. This makes it authoritative.
@liggitt I'm getting really close to making client-go authoritative ptal.
approved based on https://github.com/kubernetes/kubernetes/issues/40363
Automatic merge from submit-queue
Don't require failureDomains in PodAffinityChecker
`failureDomains` are only used for `PreferredDuringScheduling` pod
anti-affinity, which is ignored by `PodAffinityChecker`.
This unnecessary requirement was making it hard to move
`PodAffinityChecker` to `GeneralPredicates` because that would require
passing `--failure-domains` to both `kubelet` and `kube-controller-manager`.
Automatic merge from submit-queue (batch tested with PRs 36693, 40154, 40170, 39033)
Minor hygiene in scheduler.
**What this PR does / why we need it**:
Minor cleanups in scheduler, related to PR #31652.
- Unified lazy opaque resource caching.
- Deleted a commented-out line of code.
**Release note**:
```release-note
N/A
```
Automatic merge from submit-queue (batch tested with PRs 39945, 39601)
bugfix for PodToleratesNodeTaints
`PodToleratesNodeTaints`predicate func should return true if pod has no toleration annotations and node's taint effect is `PreferNoSchedule`
Automatic merge from submit-queue
Avoid unnecessary memory allocations
Low-hanging fruits in saving memory allocations. During our 5000-node kubemark runs I've see this:
ControllerManager:
- 40.17% k8s.io/kubernetes/pkg/util/system.IsMasterNode
- 19.04% k8s.io/kubernetes/pkg/controller.(*PodControllerRefManager).Classify
Scheduler:
- 42.74% k8s.io/kubernetes/plugin/pkg/scheduler/algrorithm/predicates.(*MaxPDVolumeCountChecker).filterVolumes
This PR is eliminating all of those.
Automatic merge from submit-queue
Optimize pod affinity when predicate
Optimize by returning as early as possible to avoid invoking priorityutil.PodMatchesTermsNamespaceAndSelector.
Automatic merge from submit-queue
Add use case to service affinity
Also part of nits in refactoring predicates, I found the explanation of `serviceaffinity` in its comment is very hard to understand. So I added example instead here to help user/developer to digest it.
Automatic merge from submit-queue (batch tested with PRs 38377, 36365, 36648, 37691, 38339)
Do not create selector and namespaces in a loop where possible
With 1000 nodes and 5000 pods (5 pods per node) with anti-affinity a lot of CPU wasted on creating LabelSelector and sets.String (map).
With this change we are able to deploy that number of pods in ~25 minutes. Without - it takes 30 minutes to deploy 500 pods with anti-affinity configured.
failureDomains are only used for PreferredDuringScheduling pod
anti-affinity, which is ignored by PodAffinityChecker.
This unnecessary requirement was making it hard to move
PodAffinityChecker to GeneralPredicates because that would require
passing --failure-domains to both kubelet and kube-controller-manager.