* feature(scheduling_queue): track events per Pod
* fix typos
* record events in one slice and make each in-flight Pod refer to it
* fix: use Pop() in test before AddUnschedulableIfNotPresent to register in-flight Pods
* eliminate MakeNextPodFuncs
* call Done inside the scheduling queue
* fix comment
* implement done() so that it does not require a lock
* fix UTs
* improve the receivedEvents implementation based on suggestions
* call DonePod when we don't call AddUnschedulableIfNotPresent
* fix UT
* use queuehint to filter out events for in-flight Pods
* fix based on suggestion from aldo
* fix based on suggestion from Wei
* rename lastEventBefore → previousEvent
* fix based on suggestion
* address comments from aldo
* fix based on the suggestion from Abdullah
* gate in-flight Pods logic by the SchedulingQueueHints feature gate
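The sketch below illustrates the in-flight Pod tracking these commits describe; the type and field names are invented for illustration and do not match the actual scheduling queue code.

```go
package queuesketch

// clusterEvent is a stand-in for the scheduler's cluster event type.
type clusterEvent struct {
	resource string
	action   string
}

// inFlightTracker records all events in one shared slice; each in-flight Pod
// only stores the index it was popped at, so the events that happened while it
// was being scheduled can be checked against the queueing hints later.
type inFlightTracker struct {
	events   []clusterEvent
	inFlight map[string]int // pod UID -> len(events) at Pop() time
}

// pop registers a Pod as in flight.
func (t *inFlightTracker) pop(podUID string) {
	t.inFlight[podUID] = len(t.events)
}

// done unregisters a Pod; called by the queue itself, or by the caller when
// AddUnschedulableIfNotPresent is never invoked for that Pod.
func (t *inFlightTracker) done(podUID string) {
	delete(t.inFlight, podUID)
}

// eventsSince returns the events recorded while the Pod was in flight.
func (t *inFlightTracker) eventsSince(podUID string) []clusterEvent {
	return t.events[t.inFlight[podUID]:]
}
```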
When someone decides that a Pod should definitely run on a specific node, they
can create the Pod with spec.nodeName already set. Some custom scheduler might
do that. The kubelet then checks the pod and (if DRA is enabled) refuses to
run it, either because the claims are still waiting for their first consumer
or because the pod wasn't added to reservedFor. Both are steps that the
scheduler normally handles.
A pod can also reach the same state if it got scheduled while the DRA feature
was off in the kube-scheduler.
The resource claim controller can handle these two cases by taking over for the
kube-scheduler when nodeName is set. Triggering an allocation is simpler than
in the scheduler because all it takes is creating the right
PodSchedulingContext with spec.selectedNode set. There's no need to list nodes
because that choice was already made, permanently. Adding the pod to
reservedFor also isn't hard.
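A minimal sketch of that controller-side handling, assuming the resource.k8s.io/v1alpha2 API; the helper name is hypothetical and this is not the controller's actual code:

```go
package resourceclaim

import (
	v1 "k8s.io/api/core/v1"
	resourcev1alpha2 "k8s.io/api/resource/v1alpha2"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// schedulingContextForPod builds the PodSchedulingContext that triggers
// allocation for a Pod whose spec.nodeName is already set: selectedNode is
// simply the chosen node, and no potential nodes need to be listed.
func schedulingContextForPod(pod *v1.Pod) *resourcev1alpha2.PodSchedulingContext {
	return &resourcev1alpha2.PodSchedulingContext{
		ObjectMeta: metav1.ObjectMeta{
			Name:      pod.Name,
			Namespace: pod.Namespace,
		},
		Spec: resourcev1alpha2.PodSchedulingContextSpec{
			SelectedNode: pod.Spec.NodeName,
		},
	}
}
```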
What's currently missing is triggering de-allocation of claims to re-allocate
them for the desired node. This is not important for claims that get created
for the pod from a template and then only get used once, but it might be
worthwhile to add de-allocation in the future.
Merely deleting the namespace is not enough:
- Workloads might rely on the garbage collector to get rid of obsolete objects,
so we should run it to be on the safe side.
- Pods must be force-deleted because the kubelet is not running (see the
  sketch after this list).
- Finally, the namespace controller is needed to get rid of
deleted namespaces.
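The force-delete mentioned above is just a delete with a zero grace period. A minimal sketch, with hypothetical helper and parameter names:

```go
package framework

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// forceDeletePod removes a Pod immediately. Without a running kubelet the Pod
// would otherwise stay in Terminating forever.
func forceDeletePod(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	gracePeriod := int64(0)
	return client.CoreV1().Pods(namespace).Delete(ctx, name, metav1.DeleteOptions{
		GracePeriodSeconds: &gracePeriod,
	})
}
```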
* Skip terminal Pods with a deletion timestamp from the DaemonSet sync
Change-Id: I64a347a87c02ee2bd48be10e6fff380c8c81f742
* Review comments and fix integration test
Change-Id: I3eb5ec62bce8b4b150726a1e9b2b517c4e993713
* Include deleted terminal pods in history
Change-Id: I8b921157e6be1c809dd59f8035ec259ea4d96301
The default scheduler configuration must be based on the v1 API where the
plugin is enabled by default. Then if (and only if) the
DynamicResourceAllocation feature gate for a test is set, the corresponding
API group also gets enabled.
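A sketch of how a test can opt in, using the feature gate test helper from k8s.io/component-base/featuregate/testing; the test name is hypothetical and the exact helper signature should be treated as an assumption:

```go
package benchmark

import (
	"testing"

	utilfeature "k8s.io/apiserver/pkg/util/feature"
	featuregatetesting "k8s.io/component-base/featuregate/testing"
	"k8s.io/kubernetes/pkg/features"
)

func TestSchedulingWithDRA(t *testing.T) {
	// Enable the gate for this test only; the returned function restores the
	// previous value when the test ends.
	defer featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, features.DynamicResourceAllocation, true)()

	// ... set up the apiserver with the resource.k8s.io API group enabled and
	// run the scheduler_perf test cases ...
}
```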
The normal dynamic resource claim controller is started if needed to create
ResourceClaims from ResourceClaimTemplates.
Without the upcoming optimizations in the scheduler, scheduling with dynamic
resources is fairly slow. The new test cases take around 15 minutes wall clock
time on my desktop.
This will change when adding dynamic resource allocation test cases. Instead of
changing mustSetupScheduler and StartScheduler for that, let's return the
informer factory and create informers as needed in the test.
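An illustrative sketch, with hypothetical function names, of what a test can do with the returned factory:

```go
package benchmark

import (
	"k8s.io/client-go/informers"
)

// newTestInformers shows the idea: instead of mustSetupScheduler creating every
// informer up front, a test asks the shared factory for the ones it uses.
func newTestInformers(informerFactory informers.SharedInformerFactory) {
	podInformer := informerFactory.Core().V1().Pods()
	_ = podInformer.Lister()
	// A dynamic resource allocation test would additionally request the
	// ResourceClaim/ResourceClaimTemplate informers here.
}
```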
PVCs using the ReadWriteOncePod access mode can only be referenced by a
single pod. When a pod is scheduled that uses a ReadWriteOncePod PVC,
return "Unschedulable" if the PVC is already in-use in the cluster.
To support preemption, the "VolumeRestrictions" scheduler plugin
computes cycle state during the PreFilter phase. This cycle state
contains the number of references to the ReadWriteOncePod PVCs used by
the pod-to-be-scheduled.
During scheduler simulation (AddPod and RemovePod), we add and remove
reference counts from the cycle state if they use any of these
ReadWriteOncePod PVCs.
In the Filter phase, the scheduler checks if there are any PVC reference
conflicts, and returns "Unschedulable" if there is a conflict.
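The sketch below shows only the reference-counting idea; the type and method names are illustrative, not the plugin's real code:

```go
package volumerestrictions

// preFilterState is the cycle-state payload: for every ReadWriteOncePod PVC
// the incoming pod wants, it counts how many existing pods already reference it.
type preFilterState struct {
	conflictRefs map[string]int // PVC name -> number of referencing pods
}

// addPod / removePod mirror the scheduler's AddPod/RemovePod simulation hooks.
func (s *preFilterState) addPod(pvcs []string) {
	for _, pvc := range pvcs {
		if _, tracked := s.conflictRefs[pvc]; tracked {
			s.conflictRefs[pvc]++
		}
	}
}

func (s *preFilterState) removePod(pvcs []string) {
	for _, pvc := range pvcs {
		if refs, tracked := s.conflictRefs[pvc]; tracked && refs > 0 {
			s.conflictRefs[pvc]--
		}
	}
}

// hasConflicts is what Filter consults: any non-zero count means the PVC is
// still in use and the pod is Unschedulable.
func (s *preFilterState) hasConflicts() bool {
	for _, refs := range s.conflictRefs {
		if refs > 0 {
			return true
		}
	}
	return false
}
```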
This is a required feature for the ReadWriteOncePod beta. For more context, see:
https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2485-read-write-once-pod-pv-access-mode#beta
Tests scheduler enforcement of the ReadWriteOncePod PVC access mode.
- Creates a pod using a PVC with ReadWriteOncePod
- Creates a second pod using the same PVC
- Observes that the second pod fails to schedule because the PVC is in use
- Deletes the first pod
- Observes that the second pod schedules successfully
This patch aims to simplify decoupling "pkg/scheduler/framework/plugins"
from internal "k8s.io/kubernetes" packages. More details are in
issue #89930 and PR #102953.
Some helpers moved from the "k8s.io/kubernetes/pkg/controller/volume/persistentvolume"
package to the "k8s.io/component-helpers/storage/volume" package:
- IsDelayBindingMode
- GetBindVolumeToClaim
- IsVolumeBoundToClaim
- FindMatchingVolume
- CheckVolumeModeMismatches
- CheckAccessModes
- GetVolumeNodeAffinity
Also "CheckNodeAffinity" from "k8s.io/kubernetes/pkg/volume/util"
package moved to "k8s.io/component-helpers/storage/volume" package
to prevent diamond dependency conflict.
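For callers the change is only the import path; a small illustrative example (the wrapper function is hypothetical):

```go
package example

import (
	v1 "k8s.io/api/core/v1"

	// was: pvutil "k8s.io/kubernetes/pkg/controller/volume/persistentvolume"
	storagehelpers "k8s.io/component-helpers/storage/volume"
)

// isBound simply forwards to the relocated helper; the function names did not
// change, only the package they live in.
func isBound(pv *v1.PersistentVolume, pvc *v1.PersistentVolumeClaim) bool {
	return storagehelpers.IsVolumeBoundToClaim(pv, pvc)
}
```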
Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>