This PR fixes the following issues:
1. Use ResourceStorageScratch instead of the ResourceStorage API to represent
local storage capacity.
2. In the eviction manager, use the container manager instead of the node
provider (kubelet) to retrieve node capacity and reserved resources (see the
sketch below). The node provider (kubelet) is behind a feature gate, so
storage scratch information may not be exposed when the gate is not set; the
container manager, by contrast, always has the full capacity and allocatable
resource information.
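A minimal sketch of the idea in item 2, using hypothetical stand-ins (`CapacityProvider`, the resource names, and `allocatable` below are illustrative, not the actual kubelet container manager API):
```
package main

import "fmt"

// ResourceList is a simplified stand-in for v1.ResourceList (values in bytes).
type ResourceList map[string]int64

// CapacityProvider is a hypothetical stand-in for the container manager: it
// always knows the full capacity and the reserved resources, independent of
// any feature gate on the node provider.
type CapacityProvider interface {
	GetCapacity() ResourceList
	GetNodeAllocatableReservation() ResourceList
}

type fakeContainerManager struct{}

func (fakeContainerManager) GetCapacity() ResourceList {
	return ResourceList{"storage-scratch": 100 << 30} // 100 GiB primary partition
}

func (fakeContainerManager) GetNodeAllocatableReservation() ResourceList {
	return ResourceList{"storage-scratch": 10 << 30} // 10 GiB reserved for kube components
}

// allocatable computes capacity minus reservation per resource, which is what
// the eviction manager needs when setting its thresholds.
func allocatable(cp CapacityProvider) ResourceList {
	out := ResourceList{}
	reserved := cp.GetNodeAllocatableReservation()
	for name, c := range cp.GetCapacity() {
		out[name] = c - reserved[name]
	}
	return out
}

func main() {
	fmt.Println(allocatable(fakeContainerManager{})) // map[storage-scratch:96636764160]
}
```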
Automatic merge from submit-queue (batch tested with PRs 46926, 48468)
Added helper funcs to schedulercache.Resource.
**What this PR does / why we need it**:
Avoid duplicated code by adding helper funcs to schedulercache.Resource.
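A rough sketch of the kind of helper this refers to (the field set and method names are illustrative, not necessarily the real schedulercache.Resource API):
```
package main

import "fmt"

// Resource is a simplified stand-in for schedulercache.Resource.
type Resource struct {
	MilliCPU       int64
	Memory         int64
	StorageScratch int64
}

// Add accumulates another Resource into r, replacing the field-by-field
// additions previously copy-pasted at each call site.
func (r *Resource) Add(other *Resource) {
	r.MilliCPU += other.MilliCPU
	r.Memory += other.Memory
	r.StorageScratch += other.StorageScratch
}

// Clone returns a copy, so callers do not have to repeat the field list.
func (r *Resource) Clone() *Resource {
	c := *r
	return &c
}

func main() {
	total := &Resource{}
	for _, req := range []*Resource{
		{MilliCPU: 100, Memory: 200 << 20},
		{MilliCPU: 250, Memory: 512 << 20, StorageScratch: 1 << 30},
	} {
		total.Add(req)
	}
	fmt.Printf("%+v\n", *total.Clone())
}
```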
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #46924
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Add local storage (scratch space) allocatable support
This PR adds support for allocatable local storage (scratch space). This
feature only covers the root file system, which is shared by Kubernetes
components, users' containers, and/or images. Users can use the
--kube-reserved flag to reserve storage for kube system components. If the
allocatable storage for users' pods is used up, some pods will be evicted to
free storage.
This feature is part of local storage capacity isolation and is described in the proposal https://github.com/kubernetes/community/pull/306
**Release note**:
```release-note
This feature exposes local storage capacity for the primary partitions, and supports & enforces storage reservation in Node Allocatable
```
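As a rough illustration of the Node Allocatable arithmetic this implies, here is a minimal sketch (the plain byte values, flag interpretation, and helper name are assumptions for illustration; the real kubelet works with resource.Quantity):
```
package main

import "fmt"

// allocatableScratch mirrors the Node Allocatable formula for the primary
// partition: allocatable = capacity - kube-reserved - eviction threshold.
// All values are in bytes for simplicity.
func allocatableScratch(capacity, kubeReserved, evictionHard int64) int64 {
	a := capacity - kubeReserved - evictionHard
	if a < 0 {
		return 0
	}
	return a
}

func main() {
	capacity := int64(100 << 30)    // 100 GiB root filesystem
	kubeReserved := int64(10 << 30) // reserved via --kube-reserved for kube components
	evictionHard := int64(5 << 30)  // hard eviction threshold for scratch space
	// Pods may request at most the remainder before eviction kicks in.
	fmt.Println(allocatableScratch(capacity, kubeReserved, evictionHard))
}
```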
This PR adds a check for the local storage request when admitting pods. If
the local storage request exceeds the available resources, the pod will be
rejected.
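A minimal sketch of that admission check under simplified assumptions (plain byte counts and a standalone function rather than the real admission plumbing):
```
package main

import "fmt"

// admitScratchRequest rejects a pod whose scratch-space request does not fit
// into what is still allocatable on the node.
func admitScratchRequest(allocatable, alreadyRequested, podRequest int64) error {
	if alreadyRequested+podRequest > allocatable {
		return fmt.Errorf("insufficient scratch storage: requested %d, used %d, allocatable %d",
			podRequest, alreadyRequested, allocatable)
	}
	return nil
}

func main() {
	// 90 GiB allocatable, 85 GiB already requested by running pods,
	// new pod asks for 10 GiB: rejected.
	if err := admitScratchRequest(90<<30, 85<<30, 10<<30); err != nil {
		fmt.Println("pod rejected:", err)
	}
}
```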
Automatic merge from submit-queue (batch tested with PRs 41775, 39678, 42629, 42524, 43028)
Aggregated used ports at the NodeInfo level.
fixes #42523
```release-note
Aggregated used ports at the NodeInfo level for `PodFitsHostPorts` predicate.
```
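A minimal sketch of the aggregation idea, using simplified types (`nodeInfo`, `usedPorts`, and `fitsHostPorts` below are illustrative stand-ins, not the actual scheduler API):
```
package main

import "fmt"

// nodeInfo keeps a node-level set of host ports in use, maintained as pods
// are added, so the host-port predicate does not have to walk every container
// of every pod on each scheduling cycle.
type nodeInfo struct {
	usedPorts map[int]struct{}
}

func newNodeInfo() *nodeInfo {
	return &nodeInfo{usedPorts: map[int]struct{}{}}
}

func (n *nodeInfo) addPod(hostPorts ...int) {
	for _, p := range hostPorts {
		n.usedPorts[p] = struct{}{}
	}
}

// fitsHostPorts reports whether none of the requested host ports clash with
// ports already aggregated on the node.
func (n *nodeInfo) fitsHostPorts(hostPorts ...int) bool {
	for _, p := range hostPorts {
		if _, taken := n.usedPorts[p]; taken {
			return false
		}
	}
	return true
}

func main() {
	ni := newNodeInfo()
	ni.addPod(8080, 9090)
	fmt.Println(ni.fitsHostPorts(8080)) // false: conflict
	fmt.Println(ni.fitsHostPorts(7070)) // true
}
```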
Automatic merge from submit-queue (batch tested with PRs 42762, 42739, 42425, 42778)
Fixed potential OutOfSync of nodeInfo.
The cloned NodeInfo still shares the same resource objects with the cache, which can make `requestedResource` and Pods go out of sync: for example, if a pod is deleted, `requestedResource` is updated but the Pods in the cloned info are not. Found this while investigating #32531, but it does not seem to be the root cause, as nodeInfo is read-only in predicates & priorities.
Sample code for `&(*)`:
```
package main

import (
	"fmt"
)

type Resource struct {
	A int
}

type Node struct {
	Res *Resource
}

func main() {
	r1 := &Resource{A: 10}
	n1 := &Node{Res: r1}
	r2 := &(*n1.Res)
	r2.A = 11
	fmt.Printf("%t, %d %d\n", r1 == r2, r1, r2)
}
```
Output:
```
true, &{11} &{11}
```
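For contrast, a minimal sketch of the kind of deep copy that avoids the aliasing shown above (toy types, not the actual NodeInfo.Clone implementation):
```
package main

import "fmt"

type Resource struct{ A int }

type Node struct{ Res *Resource }

// Clone copies the Resource by value, so the clone no longer aliases the
// cached object; &(*n.Res) would just return the same pointer.
func (n *Node) Clone() *Node {
	res := *n.Res
	return &Node{Res: &res}
}

func main() {
	n1 := &Node{Res: &Resource{A: 10}}
	n2 := n1.Clone()
	n2.Res.A = 11
	fmt.Println(n1.Res.A, n2.Res.A) // 10 11: no longer shared
}
```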
- Added schedulercache.Resource.SetOpaque helper.
- Amend kubelet allocatable sync so that when OIRs are removed from capacity
they are also removed from allocatable.
- Fixes #41861.
- Prevents kubelet from overwriting capacity during sync.
- Handles opaque integer resources in the scheduler.
- Adds scheduler predicate tests for opaque resources.
- Validates opaque int resources (see the sketch after this list):
- Ensures supplied opaque int quantities in node capacity,
node allocatable, pod request and pod limit are integers.
- Adds tests for new validation logic (node update and pod spec).
- Added e2e tests for opaque integer resources.
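A rough sketch of the integer validation in the item above (the resource-name prefix, milli-unit modeling, and helper name are assumptions for illustration, not the actual validation API):
```
package main

import (
	"fmt"
	"strings"
)

// opaquePrefix is an illustrative prefix for opaque integer resource names.
const opaquePrefix = "pod.alpha.kubernetes.io/opaque-int-resource-"

// validateOpaqueInt rejects fractional quantities for opaque integer
// resources; quantities are modeled here as milli-units for simplicity.
func validateOpaqueInt(name string, milliValue int64) error {
	if !strings.HasPrefix(name, opaquePrefix) {
		return nil // not an opaque int resource, nothing to check
	}
	if milliValue%1000 != 0 {
		return fmt.Errorf("%s: must be an integer, got %dm", name, milliValue)
	}
	return nil
}

func main() {
	fmt.Println(validateOpaqueInt(opaquePrefix+"foo", 2000)) // <nil>: 2 is an integer
	fmt.Println(validateOpaqueInt(opaquePrefix+"foo", 1500)) // error: 1.5 is fractional
}
```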
The node's resource data should only be reduced after the pod has been found in the node's pod list; if there is no corresponding pod on the node, the node's resource data must not be reduced.
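A minimal sketch of that ordering, with simplified types standing in for the real node info and pod structures:
```
package main

import "fmt"

type podInfo struct {
	name    string
	request int64
}

type node struct {
	pods      []podInfo
	requested int64
}

// removePod subtracts the pod's request only after the pod is found; if the
// pod is not on this node, the node's accounting is left untouched.
func (n *node) removePod(name string) bool {
	for i, p := range n.pods {
		if p.name == name {
			n.requested -= p.request
			n.pods = append(n.pods[:i], n.pods[i+1:]...)
			return true
		}
	}
	return false // not found: do not touch n.requested
}

func main() {
	n := &node{pods: []podInfo{{"a", 100}}, requested: 100}
	fmt.Println(n.removePod("b"), n.requested) // false 100
	fmt.Println(n.removePod("a"), n.requested) // true 0
}
```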
Implements part of #24071
I am not familiar enough with the scheduler to know what to do with the scores. Punting for now.
Missing items from the implementation plan: limitranger, rkt support, kubectl
support and user docs