Since the filter status is missed for the phase of preemption, there
will be no way to tell why the preemption failed for some reasons, and
those reasons could be different with the status from the main scheduling
process (the first failed plugin will hide other failures in the chain).
This change provides verbose information based on the node status generated
during pod preemption, those information helps us to diagnose the issue which
is happened during pod preemption.
Signed-off-by: Dave Chen <dave.chen@arm.com>
in pkg/scheduler/eventhandlers.go
* add event for unscheduled pod
* delete event for unscheduled pod
* add event for scheduled pod
* delete event for scheduled pod
in pkg/scheduler/framework/plugins/defaultbinder/default_binder.go
* Attempting to bind pod to node
in pkg/scheduler/scheduler.go
* Updating pod condition
* Attempting to schedule pod
* Skip schedule deleting pod
Deprecated metrics are removed and suggest to use the Histogram
metrics got from scheduler extension points.
Signed-off-by: Dave Chen <dave.chen@arm.com>
Co-authored-by: wawa0210 <xiaozhang0210@hotmail.com>
- Move "ForgetPod" after "RunReservePluginsUnreserve", so that the cache would hold the pod to
avoid it's being retried simutaneously until Unreserve is completed.
- Move "assume" ahead of "RunReservePluginsReserve". This is based on the fact that "ForgetPod" is
the last step of failure path, so "assume" should be reversly treated as the first step. The
current failure path is like this:
assume -> reserve -> unreserve -> forgetPod -> recordingFailure
- Make subtests of TestReservePluginUnreserve stateless
If a reserve plugin's Reserve method returns an error, there could be
previously allocated resources from successfully completed reserve
plugins that must be unallocated by the corresponding Unreserve
operation. Since Unreserve operations are idempotent, this patch runs
the Unreserve operation of ALL reserve plugins when a Reserve operation
fails.
Previously, separate interfaces were defined for Reserve and Unreserve
plugins. However, in nearly all cases, a plugin that allocates a
resource using Reserve will likely want to register itself for Unreserve
as well in order to free the allocated resource at the end of a failed
scheduling/binding cycle. Having separate plugins for Reserve and
Unreserve also adds unnecessary config toil. To that end, this patch
aims to merge the two plugins into a single interface called a
ReservePlugin that requires implementing both the Reserve and Unreserve
methods.
and e2e_scheduling_duration_seconds
Also adding result label to e2e_scheduling_duration_seconds. Previously, the metric was only updated for successful attempts
Signed-off-by: Aldo Culquicondor <acondor@google.com>
- Add a defaultpreemption PostFilter plugin
- Make g.Preempt() stateless
- make g.Preempt() stateless
- make g.getLowerPriorityNominatedPods() stateless
- make g.processPreemptionWithExtenders() stateless
- replace error with NodeToStatusMap in Preempt() signature
- eliminate podPreemptor interface and expose its functions statelessly
- move logic in scheduler.go#preempt to generic_scheduler.go#Preempt()
- rename `UpdateNominatedPodForNode` to `AddNominatedPod`
- promote `update` to `UpdateNominatedPod`
- anonymous lock in nominatedMap
- pass PodNominator as an option to NewFramework