Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add admission handler for device resources allocation
**What this PR does / why we need it**:
Add admission handler for device resources allocation to fail fast during pod creation
**Which issue this PR fixes**
fixes #51592
**Special notes for your reviewer**:
@jiayingz Sorry, there was something wrong with my branch in #51895, and I think the existing comments in that PR might be too long for others to read. So I closed it and opened this new one, as we have basically reached an agreement on the implementation :)
I have covered the functionality and unit tests here, and will start on the e2e part ASAP.
/cc @jiayingz @vishh @RenaudWasTaken
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 53978, 54008, 53037).
Change scheduler to skip pod with updates only on pod annotations
Fixes #52914, by checking whether the pod is already assumed before scheduling it.
**Release note**:
```
Scheduler cache ignores updates to an assumed pod if updates are limited to pod annotations.
```
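A minimal sketch of the idea, not the scheduler's actual code: an update to an assumed pod can be skipped when the old and new objects are equal once the fields allowed to differ are cleared. The `Pod` type, the helper names, and the choice to also ignore the resource version are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// Pod is a simplified, hypothetical stand-in for the API Pod object.
type Pod struct {
	Name            string
	ResourceVersion string
	Annotations     map[string]string
	NodeName        string
}

// stripVolatile returns a copy of p with the fields that are allowed to
// differ between updates cleared. p is passed by value, so the caller's
// pod is not modified.
func stripVolatile(p Pod) Pod {
	p.ResourceVersion = ""
	p.Annotations = nil
	return p
}

// shouldSkipUpdate reports whether the updated pod is already assumed and
// differs from the assumed copy only in the stripped fields.
func shouldSkipUpdate(assumed map[string]Pod, updated Pod) bool {
	old, ok := assumed[updated.Name]
	if !ok {
		return false // not assumed: schedule as usual
	}
	return reflect.DeepEqual(stripVolatile(old), stripVolatile(updated))
}

func main() {
	assumed := map[string]Pod{"web": {Name: "web", ResourceVersion: "1"}}
	annotated := Pod{Name: "web", ResourceVersion: "2",
		Annotations: map[string]string{"note": "x"}}
	fmt.Println(shouldSkipUpdate(assumed, annotated))
}
```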
/sig scheduling
/assign @bsalamat
/cc @vishh
Automatic merge from submit-queue
Add pod preemption to the scheduler
**What this PR does / why we need it**:
This is the last of a series of PRs to add priority-based preemption to the scheduler. This PR connects the preemption logic to the scheduler workflow.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #48646
**Special notes for your reviewer**:
This PR includes other PRs which are under review (#50805, #50405, #50190). All the new code is located in 43627afdf9.
**Release note**:
```release-note
Add priority-based preemption to the scheduler.
```
ref/ #47604
/assign @davidopp
@kubernetes/sig-scheduling-pr-reviews
Automatic merge from submit-queue
Add support to modify precomputed predicate metadata upon adding/removal of a pod
**What this PR does / why we need it**: This PR adds the capability to change precomputed predicate metadata. It lets us add pods to and remove pods from the precomputed metadata efficiently, without recomputing everything each time a pod is added or removed. This PR is needed as part of adding preemption logic to the scheduler.
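The incremental-update idea can be sketched as follows. This is an illustration under assumed names (`predicateMetadata`, `AddPod`, `RemovePod`, and the summarized fields are hypothetical), not the scheduler's real metadata structure: the summary is built once and then adjusted per pod instead of being rebuilt from the full pod list.

```go
package main

import "fmt"

// Pod is a simplified, hypothetical pod carrying only what the summary needs.
type Pod struct {
	Name     string
	MilliCPU int64
	Labels   map[string]string
}

// predicateMetadata is a hypothetical precomputed summary of a set of pods.
type predicateMetadata struct {
	requestedMilliCPU int64
	podsByLabel       map[string]int // "key=value" -> number of pods
}

// newMetadata computes the summary once from the full pod list.
func newMetadata(pods []Pod) *predicateMetadata {
	m := &predicateMetadata{podsByLabel: map[string]int{}}
	for _, p := range pods {
		m.AddPod(p)
	}
	return m
}

// AddPod folds one pod into the summary in time proportional to that pod.
func (m *predicateMetadata) AddPod(p Pod) {
	m.requestedMilliCPU += p.MilliCPU
	for k, v := range p.Labels {
		m.podsByLabel[k+"="+v]++
	}
}

// RemovePod undoes AddPod. It must only be called for a pod that was
// previously added, otherwise the summary becomes inconsistent.
func (m *predicateMetadata) RemovePod(p Pod) {
	m.requestedMilliCPU -= p.MilliCPU
	for k, v := range p.Labels {
		key := k + "=" + v
		if m.podsByLabel[key] <= 1 {
			delete(m.podsByLabel, key)
		} else {
			m.podsByLabel[key]--
		}
	}
}

func main() {
	a := Pod{Name: "a", MilliCPU: 100, Labels: map[string]string{"app": "web"}}
	b := Pod{Name: "b", MilliCPU: 250, Labels: map[string]string{"app": "web"}}
	m := newMetadata([]Pod{a, b})
	m.RemovePod(b) // e.g. simulating the preemption of pod b
	fmt.Println(m.requestedMilliCPU, m.podsByLabel["app=web"])
}
```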
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
To make the review process a bit easier, there are three commits. The cleanup commit is only moving code and renaming some functions, without logic changes.
**Release note**:
```release-note
NONE
```
ref/ #47604
ref/ #48646
/assign @wojtek-t
@kubernetes/sig-scheduling-pr-reviews @davidopp
This PR fixes the following issues:
1. Use ResourceStorageScratch instead of ResourceStorage API to represent
local storage capacity
2. In eviction manager, use container manager instead of node provider
(kubelet) to retrieve the node capacity and reserved resources. Node
provider (kubelet) has a feature gate so that storagescratch information
may not be exposed if feature gate is not set. On the other hand,
container manager has all the capacity and allocatable resource
information.
Automatic merge from submit-queue (batch tested with PRs 46926, 48468)
Added helper funcs to schedulercache.Resource.
**What this PR does / why we need it**:
Avoid duplicated code by using helper funcs.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #46924
**Release note**:
```release-note-none
```
Automatic merge from submit-queue
Add local storage (scratch space) allocatable support
This PR adds support for allocatable local storage (scratch space).
This feature applies only to the root file system, which is shared by Kubernetes
components, users' containers and/or images. Users can use the
--kube-reserved flag to reserve storage for kube system components.
If the allocatable storage for users' pods is used up, some pods will be
evicted to free the storage resource.
This feature is part of local storage capacity isolation and described in the proposal https://github.com/kubernetes/community/pull/306
**Release note**:
```release-note
This feature exposes local storage capacity for the primary partitions, and supports & enforces storage reservation in Node Allocatable
```
This PR adds a check for the local storage request when admitting pods. If
the local storage request exceeds the available resource, the pod will be
rejected.
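A rough sketch of such an admission check, assuming a simplified `Pod` type and a hypothetical `admit` function (neither is the kubelet's real API): the requests of the pods already on the node are summed, and the new pod is rejected if its request does not fit in what remains of the allocatable amount.

```go
package main

import "fmt"

// Pod is a simplified, hypothetical pod with a single scratch-space request.
type Pod struct {
	Name                string
	ScratchRequestBytes int64
}

// admit returns an error if the pod's local storage request exceeds what is
// still available on the node. allocatable is assumed to already exclude
// any reserved storage.
func admit(pod Pod, allocatable int64, running []Pod) error {
	var used int64
	for _, p := range running {
		used += p.ScratchRequestBytes
	}
	available := allocatable - used
	if pod.ScratchRequestBytes > available {
		return fmt.Errorf("pod %s requests %d bytes of local storage, only %d available",
			pod.Name, pod.ScratchRequestBytes, available)
	}
	return nil
}

func main() {
	running := []Pod{{Name: "a", ScratchRequestBytes: 600}}
	fmt.Println(admit(Pod{Name: "b", ScratchRequestBytes: 300}, 1000, running))
	fmt.Println(admit(Pod{Name: "c", ScratchRequestBytes: 500}, 1000, running))
}
```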
Automatic merge from submit-queue (batch tested with PRs 43900, 44152, 44324)
Fix: check "ok" first to avoid panic
Check "ok" first, and only then check whether "currState.pod.Spec.NodeName != pod.Spec.NodeName". Without the "ok" check, currState is nil when the key is missing, and the comparison panics.
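The pattern can be sketched as below. The types and the `movedNode` helper are hypothetical simplifications, not the scheduler cache's real definitions; the point is only that `&&` short-circuits, so the dereference never runs for a missing key.

```go
package main

import "fmt"

// pod and podState are simplified, hypothetical stand-ins for the cache types.
type pod struct{ nodeName string }
type podState struct{ pod *pod }

// movedNode reports whether the cached pod for key is bound to a different
// node than p. A missing key yields a nil *podState, so "ok" must be tested
// before currState is dereferenced. Stored values are assumed to be non-nil.
func movedNode(states map[string]*podState, key string, p *pod) bool {
	currState, ok := states[key]
	return ok && currState.pod.nodeName != p.nodeName
}

func main() {
	states := map[string]*podState{
		"default/web": {pod: &pod{nodeName: "node-1"}},
	}
	fmt.Println(movedNode(states, "default/web", &pod{nodeName: "node-2"}))
	fmt.Println(movedNode(states, "default/missing", &pod{nodeName: "node-2"}))
}
```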
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41775, 39678, 42629, 42524, 43028)
Aggregated used ports at the NodeInfo level.
fixes #42523
```release-note
Aggregated used ports at the NodeInfo level for `PodFitsHostPorts` predicate.
```
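A minimal sketch of the aggregation, using hypothetical simplified types rather than the real `schedulercache.NodeInfo`: the set of used host ports is maintained as pods are added and removed, so a fit check is a set lookup instead of a scan over every pod on the node.

```go
package main

import "fmt"

// Pod is a simplified, hypothetical pod listing the host ports it wants.
type Pod struct {
	Name      string
	HostPorts []int
}

// NodeInfo keeps the node's pods plus an aggregated set of used host ports.
type NodeInfo struct {
	pods      map[string]Pod
	usedPorts map[int]bool
}

func NewNodeInfo() *NodeInfo {
	return &NodeInfo{pods: map[string]Pod{}, usedPorts: map[int]bool{}}
}

// AddPod records the pod and marks its host ports as used.
func (n *NodeInfo) AddPod(p Pod) {
	n.pods[p.Name] = p
	for _, port := range p.HostPorts {
		n.usedPorts[port] = true
	}
}

// RemovePod forgets the pod and frees its ports. This assumes no two pods on
// the node share a host port, which holds if every pod passed FitsHostPorts.
func (n *NodeInfo) RemovePod(name string) {
	p, ok := n.pods[name]
	if !ok {
		return
	}
	for _, port := range p.HostPorts {
		delete(n.usedPorts, port)
	}
	delete(n.pods, name)
}

// FitsHostPorts reports whether none of the pod's host ports is in use.
func (n *NodeInfo) FitsHostPorts(p Pod) bool {
	for _, port := range p.HostPorts {
		if n.usedPorts[port] {
			return false
		}
	}
	return true
}

func main() {
	n := NewNodeInfo()
	n.AddPod(Pod{Name: "a", HostPorts: []int{8080}})
	fmt.Println(n.FitsHostPorts(Pod{Name: "b", HostPorts: []int{8080}}))
	n.RemovePod("a")
	fmt.Println(n.FitsHostPorts(Pod{Name: "b", HostPorts: []int{8080}}))
}
```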
Automatic merge from submit-queue (batch tested with PRs 42762, 42739, 42425, 42778)
Fixed potential OutOfSync of nodeInfo.
The cloned NodeInfo still shares the same resource objects with the cache, which can leave `requestedResource` and Pods out of sync. For example, if a pod is deleted, `requestedResource` is updated, but the Pods in the cloned info are not. Found this while investigating #32531, but it does not seem to be the root cause, as nodeInfo is read-only in predicates & priorities.
Sample code for `&(*)`:
```
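package main

import (
	"fmt"
)

// Resource is a minimal stand-in for a resource object held in the cache.
type Resource struct {
	A int
}

// Node holds a pointer to its Resource, as a NodeInfo does.
type Node struct {
	Res *Resource
}

func main() {
	r1 := &Resource{A: 10}
	n1 := &Node{Res: r1}
	// &(*p) yields p itself: no copy is made, so r2 aliases r1.
	r2 := &(*n1.Res)
	r2.A = 11 // the write is also visible through r1
	fmt.Printf("%t, %v %v\n", r1 == r2, r1, r2)
}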
```
Output:
```
true, &{11} &{11}
```
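For contrast, a sketch of a clone that does not share the resource object, reusing the sample's `Resource` and `Node` types. The `Clone` method is a hypothetical illustration, not the actual NodeInfo fix: dereferencing into a local variable copies the struct value, and taking that variable's address gives an independent pointer.

```go
package main

import "fmt"

type Resource struct {
	A int
}

type Node struct {
	Res *Resource
}

// Clone returns a Node whose Res points at a fresh copy of the original
// Resource, so later writes to either side do not affect the other.
func (n *Node) Clone() *Node {
	c := &Node{}
	if n.Res != nil {
		res := *n.Res // copies the struct value
		c.Res = &res  // address of the copy, not of the original
	}
	return c
}

func main() {
	n1 := &Node{Res: &Resource{A: 10}}
	n2 := n1.Clone()
	n2.Res.A = 11
	fmt.Println(n1.Res == n2.Res, n1.Res.A, n2.Res.A) // prints: false 10 11
}
```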
- Added schedulercache.Resource.SetOpaque helper.
- Amend kubelet allocatable sync so that when OIRs are removed from capacity
they are also removed from allocatable.
- Fixes #41861.