- Revert "Fix integration test flake on TestFilter and TestPostFilter"
This reverts commit 94fc18c2dc.
- Relax checking logic on expected Filter/PostFilter counters.
- Move "ForgetPod" after "RunReservePluginsUnreserve", so that the cache would hold the pod to
avoid it's being retried simutaneously until Unreserve is completed.
- Move "assume" ahead of "RunReservePluginsReserve". This is based on the fact that "ForgetPod" is
the last step of failure path, so "assume" should be reversly treated as the first step. The
current failure path is like this:
assume -> reserve -> unreserve -> forgetPod -> recordingFailure
- Make subtests of TestReservePluginUnreserve stateless
Using NodeWrapper in the integration tests gives more flexibility when
creating nodes. For instance, tests can create nodes with labels or
with a specific sets of resources.
Also, NodeWrapper initialises a node with a capacity of 32 pods, which
can be overridden by the caller. This makes sure that a node is usable
as soon as it is created.
If a reserve plugin's Reserve method returns an error, there could be
previously allocated resources from successfully completed reserve
plugins that must be unallocated by the corresponding Unreserve
operation. Since Unreserve operations are idempotent, this patch runs
the Unreserve operation of ALL reserve plugins when a Reserve operation
fails.
Previously, separate interfaces were defined for Reserve and Unreserve
plugins. However, in nearly all cases, a plugin that allocates a
resource using Reserve will likely want to register itself for Unreserve
as well in order to free the allocated resource at the end of a failed
scheduling/binding cycle. Having separate plugins for Reserve and
Unreserve also adds unnecessary config toil. To that end, this patch
aims to merge the two plugins into a single interface called a
ReservePlugin that requires implementing both the Reserve and Unreserve
methods.
- use in-cache Pod instead of real-time Pod (by calling API server) to mark it as unschedulable
in internal schedulingQ
- remove the backoff logic as now we don't call API server
- the whole logic is changed to a synchronous call
In case two or more controllers share the informers created through InitTestScheduler,
it's not safe to start the informers until all controllers set their informer
indexers. Otherwise, some controller might fail to register their indexers
in time. Thus, it's responsibility of each consumer to make sure all informers
are started after all controllers had time to get initiliazed.
After moving Permit() to the scheduling cycle test PermitPlugin should
no longer wait inside Permit() for another pod to enter Permit() and become waiting pod.
In the past this was a way to make test work regardless of order in
which pods enter Permit(), but now only one Permit() can be executed at
any given moment and waiting for another pod to enter Permit() inside
Permit() leads to timeouts.
In this change waitAndRejectPermit and waitAndAllowPermit flags make first
pod to enter Permit() a waiting pod and second pod to enter Permit()
either rejecting or allowing pod.
Mentioned in #88469