kubernetes

Files

Patrick Ohly ecbafb8de5 DRA: fix scheduler/resource claim controller race

There was a race caused by having to update claim finalizer and status in two
different operations:
- Resource claim controller removes allocation, does not yet
  get to remove the finalizer.
- Scheduler prepares an allocation, without adding the finalizer
  because it is still set at that point.
- Controller removes finalizer.
- Scheduler adds allocation.

The result is an invalid state: an allocated claim without the finalizer.
Automatic checking found this during the execution of
the "with translated parameters on single node.*supports sharing a claim
sequentially" E2E test, but only when run stand-alone. When running in
parallel (as in the CI), the bad outcome of the race did not occur.

The fix is to check that the finalizer is still set when adding the
allocation. The apiserver doesn't check that because it doesn't know which
finalizer goes with the allocation result. It could check for "some finalizer",
but that is not guaranteed to be correct (could be some unrelated one).

Checking the finalizer can only be done with a JSON patch. Despite the
complications, having the ability to add multiple pods concurrently to
ReservedFor seems worth it (avoids expensive rescheduling or a local retry
loop).
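
As a rough illustration of the idea (not the code from this commit), a JSON patch (RFC 6902) can pair a "test" operation on the finalizer with the operation that adds the allocation, so that the whole patch is rejected if the finalizer is gone. The helper name buildAllocationPatch, the finalizer string "example.com/claim-finalizer" and the allocation value are hypothetical and only serve the sketch; the paths assume the usual metadata.finalizers and status.allocation fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is one RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value any    `json:"value"`
}

// buildAllocationPatch is a hypothetical helper. It returns a JSON patch
// that adds /status/allocation only if the given finalizer is still stored
// at the index where the caller last saw it.
func buildAllocationPatch(finalizers []string, finalizer string, allocation any) ([]byte, error) {
	for i, f := range finalizers {
		if f != finalizer {
			continue
		}
		ops := []patchOp{
			// "test" fails the entire patch if the entry is missing or differs.
			{Op: "test", Path: fmt.Sprintf("/metadata/finalizers/%d", i), Value: finalizer},
			// Applied only when the test above succeeded.
			{Op: "add", Path: "/status/allocation", Value: allocation},
		}
		return json.Marshal(ops)
	}
	// The finalizer is not set: the caller has to add it before allocating.
	return nil, fmt.Errorf("finalizer %q not set", finalizer)
}
```

Because the test is tied to a list index, removing an unrelated finalizer that sits earlier in the list would also make the patch fail. In this sketch the caller would then re-read the claim and build a new patch.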

The resource claim controller doesn't need this: it can do a normal update,
which implicitly checks ResourceVersion.

2024-06-27 15:03:06 +02:00

autoscaler_contract

kube-scheduler: NewFramework function to pass the context parameter

2023-05-23 10:17:34 +08:00

parallelize

Avoid metric lookup in Parallelizer.Util on every work piece

2023-03-09 17:12:30 +00:00

plugins

DRA: fix scheduler/resource claim controller race

2024-06-27 15:03:06 +02:00

preemption

Don't fill in NodeToStatusMap with UnschedulableAndUnresolvable