The implementation consists of
- identifying all places where VolumeSource.PersistentVolumeClaim has
a special meaning and then ensuring that the same code path is taken
for an ephemeral volume, with the ownership check
- adding a controller that produces the PVCs for each embedded
VolumeSource.EphemeralVolume
- relaxing the PVC protection controller such that it removes
the finalizer already before the pod is deleted (only
if the GenericEphemeralVolume feature is enabled): this is
needed to break a cycle where foreground deletion of the pod
blocks on removing the PVC, which waits for deletion of the pod
The controller was derived from the endpointslices controller.
And give ownership to pkg/scheduler/framework/plugins/volumebinding
Signed-off-by: Aldo Culquicondor <acondor@google.com>
Change-Id: I4bd89b1745a2be0e458601056ab905bdd6692195
If no potential victims could be found, there is no need to evaluate the node
again, since its state didn't change.
It's safe to return and thus prevent scheduling from running the filter plugins
again.
NOTE:
A node that is filtered out by filter plugins could pass the filter plugins if
there is a change on that node, i.e. pods termination on that node.
Previously, this could be either caught by the normal `schedule` or `preempt` (pods
are terminated when the preemption logic tries to find the nodes and re-evaluate
the filter plugins.)
Actually, this shouldn't be taken care by the preemption, consider the routine
of `schedule` is always running when the interval is "zero", let `schedule`
take care of it will release `preempt` from something irrelevant with the `preemption`.
Due to above reason, couple of testcase as well as the logic of checking the existence
of victim pods are removed as it will never happen after the change.
Signed-off-by: Dave Chen <dave.chen@arm.com>
This uses the information provided by a CSI driver deployment for
checking whether a node has access to enough storage to create the
currently unbound volumes, if the CSI driver opts into that checking
with CSIDriver.Spec.VolumeCapacity != false.
This resolves a TODO from commit 95b530366a.
node's labels doesn't contain the required topologyKeys in `Constraints`
cannot be resolved by preempting the pods on that pods.
One use case that could easily reproduce the issue is,
- set `alwaysCheckAllPredicates` to true.
- one node contains all the required topologyKeys but is failed in predicates
such as 'taint'.
- another node doesn't hold all the required topologyKeys, and thus return `Unschedulable`
status code.
- scheduler will try to preempt the pods on the above node with lower priorities.
Signed-off-by: Dave Chen <dave.chen@arm.com>
If a reserve plugin's Reserve method returns an error, there could be
previously allocated resources from successfully completed reserve
plugins that must be unallocated by the corresponding Unreserve
operation. Since Unreserve operations are idempotent, this patch runs
the Unreserve operation of ALL reserve plugins when a Reserve operation
fails.
Refactor genericScheduler and signature of preemption funcs
- remove podNominator from genericScheduler
- simplify signature of preemption functions
Make Preempt() private
Previously, separate interfaces were defined for Reserve and Unreserve
plugins. However, in nearly all cases, a plugin that allocates a
resource using Reserve will likely want to register itself for Unreserve
as well in order to free the allocated resource at the end of a failed
scheduling/binding cycle. Having separate plugins for Reserve and
Unreserve also adds unnecessary config toil. To that end, this patch
aims to merge the two plugins into a single interface called a
ReservePlugin that requires implementing both the Reserve and Unreserve
methods.
- Add a defaultpreemption PostFilter plugin
- Make g.Preempt() stateless
- make g.Preempt() stateless
- make g.getLowerPriorityNominatedPods() stateless
- make g.processPreemptionWithExtenders() stateless
`DefaultPodTopologySpread` need't score when the `TopologySpreadConstraints`
is specified.
`PreScore` needn't do this as well, this cut off the cost of `PreScore` if
possible.
Signed-off-by: Dave Chen <dave.chen@arm.com>