kubernetes/pkg/scheduler/framework
Patrick Ohly ecbafb8de5 DRA: fix scheduler/resource claim controller race
There was a race caused by having to update claim finalizer and status in two
different operations:
- Resource claim controller removes the allocation, but has not yet
  removed the finalizer.
- Scheduler prepares an allocation without adding the finalizer,
  because the finalizer is still present.
- Controller removes the finalizer.
- Scheduler adds the allocation.

This is an invalid state. Automatic checking found this during the execution of
the "with translated parameters on single node.*supports sharing a claim
sequentially" E2E test, but only when run stand-alone. When running in
parallel (as in the CI), the bad outcome of the race did not occur.

The fix is to check that the finalizer is still set when adding the
allocation. The apiserver doesn't check that because it doesn't know which
finalizer goes with the allocation result. It could check for "some finalizer",
but that is not guaranteed to be correct (could be some unrelated one).
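
The interleaving and the fix can be replayed with a small in-memory model. This is only a sketch: the `claim` type, the finalizer name, and the helper functions are hypothetical stand-ins, not the scheduler's or the controller's actual code. It shows that an unchecked write ends up with an allocation but no finalizer, while a write that first checks the finalizer is rejected.

```go
package main

import (
	"errors"
	"fmt"
)

// finalizer is a placeholder name, not the real DRA finalizer.
const finalizer = "example.com/delete-protection"

var errFinalizerGone = errors.New("finalizer no longer set")

// claim is a simplified, hypothetical stand-in for a ResourceClaim.
type claim struct {
	finalizers []string
	allocated  bool
}

func (c *claim) hasFinalizer() bool {
	for _, f := range c.finalizers {
		if f == finalizer {
			return true
		}
	}
	return false
}

func (c *claim) removeFinalizer() {
	var kept []string
	for _, f := range c.finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	c.finalizers = kept
}

// addAllocation stores the allocation. With check=true it refuses to do so
// when the finalizer has disappeared in the meantime.
func (c *claim) addAllocation(check bool) error {
	if check && !c.hasFinalizer() {
		return errFinalizerGone
	}
	c.allocated = true
	return nil
}

// replay runs the four steps from the list above in order and reports
// whether the claim ended up allocated without a finalizer.
func replay(check bool) (invalid bool, err error) {
	c := &claim{finalizers: []string{finalizer}, allocated: true}

	c.allocated = false // 1. controller removes the allocation
	if !c.hasFinalizer() { // 2. scheduler sees the finalizer, adds nothing
		c.finalizers = append(c.finalizers, finalizer)
	}
	c.removeFinalizer()        // 3. controller removes the finalizer
	err = c.addAllocation(check) // 4. scheduler adds the allocation

	return c.allocated && !c.hasFinalizer(), err
}

func main() {
	invalid, err := replay(false)
	fmt.Println("unchecked:", invalid, err)
	invalid, err = replay(true)
	fmt.Println("checked:", invalid, err)
}
```

In the checked run the scheduler gets an error back and can retry from a fresh copy of the claim instead of leaving it in the invalid state.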

Checking the finalizer can only be done with a JSON patch. Despite the
complications, having the ability to add multiple pods concurrently to
ReservedFor seems worth it (avoids expensive rescheduling or a local retry
loop).

The resource claim controller doesn't need this: it can do a normal update,
which implicitly checks ResourceVersion.
2024-06-27 15:03:06 +02:00
autoscaler_contract kube-scheduler: NewFramework function to pass the context parameter 2023-05-23 10:17:34 +08:00
parallelize Avoid metric lookup in Parallelizer.Util on every work piece 2023-03-09 17:12:30 +00:00
plugins DRA: fix scheduler/resource claim controller race 2024-06-27 15:03:06 +02:00
preemption Don't fill in NodeToStatusMap with UnschedulableAndUnresolvable 2024-05-31 15:52:16 +00:00
runtime Merge pull request #124595 from pohly/dra-scheduler-assume-cache-eventhandlers 2024-06-25 11:56:28 -07:00
cycle_state_test.go Copy recordPluginMetrics in CycleState.Clone 2022-04-14 15:26:20 +00:00
cycle_state.go Improve docs on framework.CycleState 2023-07-18 14:48:20 +08:00
extender.go Scheduler first fit (#123384) 2024-02-26 11:07:19 -08:00
interface_test.go cleanup: remove useless test 2023-10-22 04:41:59 +00:00
interface.go Merge pull request #124595 from pohly/dra-scheduler-assume-cache-eventhandlers 2024-06-25 11:56:28 -07:00
listers.go Adding StorageInfoLister to SharedLister 2022-05-03 18:00:41 -07:00
types_test.go scheduler: fix klog.KObjSlice when applied to []*NodeInfo 2024-06-26 08:11:31 +02:00
types.go scheduler: fix klog.KObjSlice when applied to []*NodeInfo 2024-06-26 08:11:31 +02:00