Automatic merge from submit-queue (batch tested with PRs 48636, 49088, 49251, 49417, 49494)
Fix issues for local storage allocatable feature
This PR fixes the following issues:
1. Use ResourceStorageScratch instead of ResourceStorage API to represent
local storage capacity
2. In eviction manager, use container manager instead of node provider
(kubelet) to retrieve the node capacity and reserved resources. Node
provider (kubelet) has a feature gate so that storagescratch information
may not be exposed if feature gate is not set. On the other hand,
container manager has all the capacity and allocatable resource
information.
This PR fixes issue #47809
Automatic merge from submit-queue
[Scheduler] Remove error since err is always nil
Signed-off-by: sakeven <jc5930@sina.cn>
**What this PR does / why we need it**:
No need to log error since err is always nil.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48262, 48805)
[Scheduler] Use const value maxPriority instead of immediate value 10
Signed-off-by: sakeven <jc5930@sina.cn>
**What this PR does / why we need it**:
Use const value maxPriority instead of immediate value 10.
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
This PR fixes the following issues:
1. Use ResourceStorageScratch instead of ResourceStorage API to represent
local storage capacity
2. In eviction manager, use container manager instead of node provider
(kubelet) to retrieve the node capacity and reserved resources. Node
provider (kubelet) has a feature gate so that storagescratch information
may not be exposed if feature gate is not set. On the other hand,
container manager has all the capacity and allocatable resource
information.
Automatic merge from submit-queue (batch tested with PRs 46865, 48661, 48598, 48658, 48614)
Fix function names in the comments
This patch fixes function and type names in the comments
in predicates.go.
**What this PR does / why we need it**:
It fixes function and type names in the comments in predicates.go.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
This does not have an issue # because it is a trivial fix.
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 47043, 48448, 47515, 48446)
Refactor slice intersection
**What this PR does / why we need it**:
In worst case, the original method is O(N^2), while current method is 3 * O(N).
I think it is better.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46926, 48468)
Added helper funcs to schedulercache.Resource.
**What this PR does / why we need it**:
Avoid duplicated code slice by helper funcs.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#46924
**Release note**:
```release-note-none
```
Automatic merge from submit-queue
Cleanup predicates.go
**What this PR does / why we need it**:
cleanup some comments and errors.New().
**Special notes for your reviewer**:
/cc @jayunit100
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Removes alpha feature gate for affinity annotations.
**What this PR does / why we need it**:
In 1.5 we added a backstop to support alpha affinity annotations. This PR removes that support in favor of the Beta fields per discussions.
It also serves as a precursor to some of the component config work that @ncdc has done around @mikedanese design proposal.
xref: https://github.com/kubernetes/kubernetes/pull/41617
**Special notes for your reviewer**:
**Release note**:
```
Removes alpha feature gate for pod affinity annotations.
```
/cc @kubernetes/sig-scheduling-pr-reviews @kubernetes/sig-cluster-lifecycle-misc
Automatic merge from submit-queue (batch tested with PRs 47883, 47179, 46966, 47982, 47945)
Fix local isolation for pod requesting only overlay or scratch
**What this PR does / why we need it**:
Fix overlay resource predicates for pod with only overlay or scratch storage request.
E.g. the following pod can pass predicate even if overlay is only 512Gi.
```yaml
apiVersion: v1
kind: Pod
metadata:
name: pod
spec:
containers:
- name: nginx
image: nginx
resources:
requests:
storage.kubernetes.io/overlay: 1024Gi
```
similarly, following pod will also pass predicate
```yaml
apiVersion: v1
kind: Pod
metadata:
name: pod
spec:
containers:
- name: nginx
image: nginx
volumeMounts:
- name: data
mountPath: /data
volumes:
- name: data
emptyDir:
sizeLimit: 1024Gi
```
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/kubernetes/issues/47798
**Special notes for your reviewer**:
**Release note**:
```release-note
```
@jingxu97 @vishh @dashpole
Automatic merge from submit-queue (batch tested with PRs 46787, 46876, 46621, 46907, 46819)
Highlight nodeSelector when checking nodeSelector for Pod.
**What this PR does / why we need it**:
Currently, we are using function name as `PodSelectorMatches` to check if `nodeSelector` matches for a Pod, it is better update the function name a bit to reflect it is checking `nodeSelector` for a Pod.
The proposal is rename `PodSelectorMatches` as `PodMatchNodeSelector`.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Add local storage (scratch space) allocatable support
This PR adds the support for allocatable local storage (scratch space).
This feature is only for root file system which is shared by kubernetes
componenets, users' containers and/or images. User could use
--kube-reserved flag to reserve the storage for kube system components.
If the allocatable storage for user's pods is used up, some pods will be
evicted to free the storage resource.
This feature is part of local storage capacity isolation and described in the proposal https://github.com/kubernetes/community/pull/306
**Release note**:
```release-note
This feature exposes local storage capacity for the primary partitions, and supports & enforces storage reservation in Node Allocatable
```
Automatic merge from submit-queue (batch tested with PRs 46239, 46627, 46346, 46388, 46524)
move labels to components which own the APIs
During the apimachinery split in 1.6, we accidentally moved several label APIs into apimachinery. They don't belong there, since the individual APIs are not general machinery concerns, but instead are the concern of particular components: most commonly the kubelet. This pull moves the labels into their owning components and out of API machinery.
@kubernetes/sig-api-machinery-misc @kubernetes/api-reviewers @kubernetes/api-approvers
@derekwaynecarr since most of these are related to the kubelet
This PR adds the check for local storage request when admitting pods. If
the local storage request exceeds the available resource, pod will be
rejected.
Automatic merge from submit-queue (batch tested with PRs 41530, 44814, 43620, 41985)
no need check is nil, because has checked before
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
Scheduler can recieve its policy configuration from a ConfigMap
**What this PR does / why we need it**: This PR adds the ability to scheduler to receive its policy configuration from a ConfigMap. Before this, scheduler could receive its policy config only from a file. The logic to watch the ConfigMap object will be added in a subsequent PR.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```Add the ability to the default scheduler to receive its policy configuration from a ConfigMap object.
```
Automatic merge from submit-queue (batch tested with PRs 41775, 39678, 42629, 42524, 43028)
Aggregated used ports at the NodeInfo level.
fixes#42523
```release-note
Aggregated used ports at the NodeInfo level for `PodFitsHostPorts` predicate.
```
Automatic merge from submit-queue
Fix a typo
Fix a typo
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 42662, 43035, 42578, 43682)
Selector spreading - improving code readability.
**What this PR does / why we need it**:
To improve code readability in selector spreading.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#42577
```release-note
```
Automatic merge from submit-queue
Add plugin/pkg/scheduler to linted packages
**What this PR does / why we need it**:
Adds plugin/pkg/scheduler to linted packages to improve style correctness.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#41868
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43681, 40423, 43562, 43008, 43381)
Changes for removing deadcode in taint_tolerations
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#43007
Code cleanup with some modifications and a test-case in taints and tolerations
Code cleanup with some modifications and a test-case in taints and tolerations
Removed unnecessary code from my last commit
Code cleanup with some modifications and a test-case in taints and tolerations
SUggested changes for taints_tolerations
Changes for removing deadcode in taint_tolerations
small changes again
small changes again
Small changes for clear documentation.
- Added schedulercache.Resource.SetOpaque helper.
- Amend kubelet allocatable sync so that when OIRs are removed from capacity
they are also removed from allocatable.
- Fixes#41861.
Automatic merge from submit-queue (batch tested with PRs 42200, 39535, 41708, 41487, 41335)
Add support for statefulset spreading to the scheduler
**What this PR does / why we need it**:
The scheduler SelectorSpread priority funtion didn't have the code to spread pods of StatefulSets. This PR adds StatefulSets to the list of controllers that SelectorSpread supports.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#41513
**Special notes for your reviewer**:
**Release note**:
```release-note
Add the support to the scheduler for spreading pods of StatefulSets.
```
Automatic merge from submit-queue (batch tested with PRs 41954, 40528, 41875, 41165, 41877)
Move the scheduler Fake* to testing folder
Address this issue: https://github.com/kubernetes/kubernetes/issues/41662
@timothysc I moved the interface to the types.go, and the Fake* to the testing package, not sure whether you like or not.
Automatic merge from submit-queue (batch tested with PRs 41701, 41818, 41897, 41119, 41562)
Optimization of on-wire information sent to scheduler extender interfaces that are stateful
The changes are to address the issue described in https://github.com/kubernetes/kubernetes/issues/40991
```release-note
Added support to minimize sending verbose node information to scheduler extender by sending only node names and expecting extenders to cache the rest of the node information
```
cc @ravigadde
Automatic merge from submit-queue (batch tested with PRs 41701, 41818, 41897, 41119, 41562)
Allow updates to pod tolerations.
Opening this PR to continue discussion for pod spec tolerations updates when a pod has been scheduled already. This PR is built on top of https://github.com/kubernetes/kubernetes/pull/38957.
@kubernetes/sig-scheduling-pr-reviews @liggitt @davidopp @derekwaynecarr @kubernetes/rh-cluster-infra
Automatic merge from submit-queue (batch tested with PRs 41994, 41969, 41997, 40952, 40576)
Guaranteed admission for Critical Pods
This is the first step in implementing node-level preemption for critical pods.
It defines the AdmissionFailureHandler interface, which allows callers, like the kubelet, to define how failed predicates are handled, and take steps to correct failures if necessary.
In the kubelet's implementation, it triggers preemption if the pod being admitted is critical, and if the only failed predicates are InsufficientResourceErrors, then it prempts (not yet implemented) other other pods to allow admission of the critical pod.
cc: @vishh
node cache don't need to get all the information about every
candidate node. For huge clusters, sending full node information
of all nodes in the cluster on the wire every time a pod is scheduled
is expensive. If the scheduler is capable of caching node information
along with its capabilities, sending node name alone is sufficient.
These changes provide that optimization in a backward compatible way
- removed the inadvertent signature change of Prioritize() function
- added defensive checks as suggested
- added annotation specific test case
- updated the comments in the scheduler types
- got rid of apiVersion thats unused
- using *v1.NodeList as suggested
- backing out pod annotation update related changes made in the
1st commit
- Adjusted the comments in types.go and v1/types.go as suggested
in the code review
Where possible, switch the scheduler to use generated listers and
informers. There are still some places where it probably makes more
sense to use one-off reflectors/informers (listing/watching just a
single node, listing/watching scheduled & unscheduled pods using a field
selector).
Automatic merge from submit-queue
Add scheduler predicate to filter for max Azure disks attached
**What this PR does / why we need it**: This PR adds scheduler predicates for maximum Azure Disks count. This allows to use the environment variable KUBE_MAX_PD_VOLS on scheduler the same as it's already possible with GCE and AWS.
This is needed as we need a way to specify the maximum attachable disks on Azure to avoid permanently failing disk attachment in cases k8s scheduled too many PODs with AzureDisk volumes onto the same node.
I've chosen 16 as the default value for DefaultMaxAzureDiskVolumes even though it may be too high for many smaller VM types and too low for the larger VM types. This means, the default behavior may change for clusters with large VM types. For smaller VM types, the behavior will not change (it will keep failing attaching).
In the future, the value should be determined at run time on a per node basis, depending on the VM size. I know that this is already implemented in the ongoing Azure Managed Disks work, but I don't remember where to find this anymore and also forgot who was working on this. Maybe @colemickens can help here.
**Release note**:
```release-note
Support KUBE_MAX_PD_VOLS on Azure
```
CC @colemickens @brendandburns
Automatic merge from submit-queue
Adjust nodiskconflict support based on iscsi multipath.
With the multipath support is in place, to declare whether both iscsi disks are same, we need to only depend on IQN.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Automatic merge from submit-queue
Improved code coverage for plugin/pkg/scheduler/algorithm/priorities…
…/most_requested.go
**What this PR does / why we need it**:
Part of #39559 , code coverage improved from 70+% to 80+%
Automatic merge from submit-queue
Removed a space in portforward.go.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```