Commit Graph

3123 Commits

Author SHA1 Message Date
Kensei Nakada
0d3eafdfa3 fix(scheduling_queue): always put Pods with no unschedulable plugins into activeQ/backoffQ (#119105)
* always put Pods with no unschedulable plugins into activeQ/backoffQ

* address review comments
2023-09-11 09:30:11 -07:00
Patrick Ohly
6f9140e421 DRA scheduler: stop allocating before deallocation
This fixes a test flake:

    [sig-node] DRA [Feature:DynamicResourceAllocation] multiple nodes reallocation [It] works
    /nvme/gopath/src/k8s.io/kubernetes/test/e2e/dra/dra.go:552

      [FAILED] number of deallocations
      Expected
          <int64>: 2
      to equal
          <int64>: 1
      In [It] at: /nvme/gopath/src/k8s.io/kubernetes/test/e2e/dra/dra.go:651 @ 09/05/23 14:01:54.652

This can be reproduced locally with

    stress -p 10 go test ./test/e2e -args -ginkgo.focus=DynamicResourceAllocation.*reallocation.works  -ginkgo.no-color -v=4 -ginkgo.v

Log output showed that the sequence of events leading to this was:
- claim gets allocated because of selected node
- a different node has to be used, so PostFilter sets
  claim.status.deallocationRequested
- the driver deallocates
- before the scheduler can react and select a different node,
  the driver allocates *again* for the original node
- the scheduler asks for deallocation again
- the driver deallocates again (causing the test failure)
- eventually the pod runs

The fix is to disable allocations first by removing the selected node and then
starting to deallocate.
2023-09-11 10:56:17 +02:00
Kubernetes Prow Robot
41689233b4 Merge pull request #120334 from pohly/scheduler-clear-unschedulable-plugins
scheduler: avoid false "unschedulable" pod state
2023-09-08 12:01:23 -07:00
Patrick Ohly
4e73634b53 scheduler: start scheduling attempt with clean UnschedulablePlugins
When some plugin was registered as "unschedulable" in some previous scheduling
attempt, it kept that attribute for a pod forever. When that plugin then later
failed with an error that requires backoff, the pod was incorrectly moved to the
"unschedulable" queue where it got stuck until the periodic flushing because
there was no event that the plugin was waiting for.

Here's an example where that happened:

    framework.go:1280: E0831 20:03:47.184243] Reserve/DynamicResources: Plugin failed err="Operation cannot be fulfilled on podschedulingcontexts.resource.k8s.io \"test-dragxd5c\": the object has been modified; please apply your changes to the latest version and try again" node="scheduler-perf-dra-7l2v2" plugin="DynamicResources" pod="test/test-dragxd5c"
    schedule_one.go:1001: E0831 20:03:47.184345] Error scheduling pod; retrying err="running Reserve plugin \"DynamicResources\": Operation cannot be fulfilled on podschedulingcontexts.resource.k8s.io \"test-dragxd5c\": the object has been modified; please apply your changes to the latest version and try again" pod="test/test-dragxd5c"
    ...
    scheduling_queue.go:745: I0831 20:03:47.198968] Pod moved to an internal scheduling queue pod="test/test-dragxd5c" event="ScheduleAttemptFailure" queue="Unschedulable" schedulingCycle=9576 hint="QueueSkip"

Pop still needs the information about unschedulable plugins to update the
UnschedulableReason metric. It can reset that information before returning the
PodInfo for the next scheduling attempt.
2023-09-08 16:52:36 +02:00
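A minimal sketch of the idea in 4e73634b53, under stated assumptions: `queuedPodInfo`, `pop`, `requeue`, and the `report` callback are hypothetical, simplified stand-ins and not the scheduler's real types or signatures. It only illustrates why clearing the plugin set when a pod is popped keeps a later plain error from sending the pod to the unschedulable queue.

```go
package main

import "fmt"

// queuedPodInfo is a hypothetical, stripped-down stand-in for the
// scheduler's QueuedPodInfo.
type queuedPodInfo struct {
	name                 string
	unschedulablePlugins map[string]struct{}
}

// pop hands the pod to the next scheduling attempt. It first reports the
// plugins recorded by the previous attempt (for metrics) and then clears
// them, so the new attempt starts with a clean set.
func pop(p *queuedPodInfo, report func(plugin string)) *queuedPodInfo {
	for plugin := range p.unschedulablePlugins {
		report(plugin)
	}
	p.unschedulablePlugins = map[string]struct{}{}
	return p
}

// requeue picks the queue for a failed attempt: a pod that no plugin
// rejected as unschedulable only needs a backoff.
func requeue(p *queuedPodInfo) string {
	if len(p.unschedulablePlugins) == 0 {
		return "backoff"
	}
	return "unschedulable"
}

func main() {
	reported := 0
	count := func(string) { reported++ }
	p := &queuedPodInfo{name: "test-pod", unschedulablePlugins: map[string]struct{}{}}

	// Attempt 1: a plugin rejects the pod as unschedulable.
	pop(p, count)
	p.unschedulablePlugins["DynamicResources"] = struct{}{}
	fmt.Println(requeue(p)) // unschedulable

	// Attempt 2: the plugin fails with a plain error and registers nothing.
	pop(p, count)
	fmt.Println(requeue(p), reported) // backoff 1
}
```

Without the reset in `pop`, the second attempt would still see `DynamicResources` in the set and pick the unschedulable queue again.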
Kubernetes Prow Robot
a64a3e16ec Merge pull request #120253 from pohly/dra-scheduler-podschedulingcontext-updates
dra scheduler: refactor PodSchedulingContext updates
2023-09-08 02:48:14 -07:00
Patrick Ohly
5c7dac2d77 dra scheduler: refactor PodSchedulingContext updates
Instead of modifying the PodSchedulingContext and then creating or updating it,
now the required changes (selected node, potential nodes) are tracked and the
actual input for an API call is created if (and only if) needed at the end.

This makes the code easier to read and change. In particular, replacing the
Update call with Patch or Apply is easy.
2023-09-08 08:06:06 +02:00
Kubernetes Prow Robot
2d5b6f16f5 Merge pull request #120213 from pohly/dra-scheduler-resourceclass-missing
dra: resourceclass missing
2023-09-06 23:47:09 -07:00
Patrick Ohly
c682d2b8c5 scheduler: add ResourceClass events
When filtering fails because a ResourceClass is missing, we can treat the pod
as "unschedulable" as long as we then also register a cluster event that wakes
up the pod. This is more efficient than periodically retrying.
2023-09-06 11:14:08 +02:00
Kubernetes Prow Robot
a7f9e70384 Merge pull request #120413 from pohly/scheduler-in-flight-events-fix
scheduler: fix tracking of concurrent events
2023-09-05 15:17:03 -07:00
Patrick Ohly
c131c92b9f scheduler: unit test case for concurrent event with other pod
The problematic scenario was having one pod in flight, one event in the list,
and then detecting a concurrent event for a second pod after the first pod is
done. The new test case covers that.

To make it work without assumptions about the implementation, the QueuedPodInfo
returned by Pop must be the one passed to AddUnschedulableIfNotPresent
after (potentially) populating UnschedulablePlugins. This is done via callback
functions which bind to the same shared variable.
2023-09-05 21:01:13 +02:00
Patrick Ohly
cd943dd95e scheduler: fix tracking of concurrent events
The previous approach was based on the assumption that an in-flight pod can use
the head of the received event list as marker for identifying all events that
occur while the pod is in flight. That assumption is incorrect: when that
existing element gets removed from the list because all pods that were
in-flight when it was received are done, that marker's Next method returns nil
and the code which should have seen several concurrent events (if there were
any) missed all of those.

As a result, a pod with concurrent events could incorrectly get moved to the
unschedulable queue where it could get stuck until the next periodic purging
after 5 minutes if there was no other event for it.

The approach with maintaining a single list of concurrent events can be fixed
by inserting each in-flight pod into the list and using that element to
identify "more recent" events for the pod.
2023-09-05 19:58:38 +02:00
SataQiu
cae090e7fe scheduler: remove unused constant SchedulerPolicyConfigMapKey
2023-09-04 17:48:36 +08:00
Kubernetes Prow Robot
cd91351dff Merge pull request #117720 from kerthcet/feat/remove-selector-spread
Remove deprecated selectorSpread
2023-08-29 00:25:22 -07:00
Kubernetes Prow Robot
3e910875a7 Merge pull request #120125 from kerthcet/cleanup/write-to-cycle
Make sure skipped score plugins always returned
2023-08-28 15:13:20 -07:00
Patrick Ohly
5269e76990 scheduler: properly skip DRA events
Because of a misplaced `append` (should have been inside if clause, not after
it), some handler from a previous loop iteration was added again. This was
harmless because the resulting slice was only used for waiting for cache sync,
but it should be fixed anyway.
2023-08-28 17:55:44 +02:00
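The bug pattern described in 5269e76990 can be reproduced in a few lines. This is a generic sketch, not the scheduler's code: `registration`, `collectBuggy`, and `collectFixed` are hypothetical names, and the resource list is made up for illustration.

```go
package main

import "fmt"

// registration is a hypothetical stand-in for an event handler registration.
type registration struct{ name string }

// collectBuggy has the append after the if clause, so a skipped iteration
// appends the handler left over from the previous iteration a second time.
func collectBuggy(resources []string, enabled map[string]bool) []registration {
	var handlers []registration
	var h registration
	for _, r := range resources {
		if enabled[r] {
			h = registration{name: r}
		}
		handlers = append(handlers, h) // misplaced: also runs for skipped resources
	}
	return handlers
}

// collectFixed appends only inside the if clause.
func collectFixed(resources []string, enabled map[string]bool) []registration {
	var handlers []registration
	for _, r := range resources {
		if enabled[r] {
			handlers = append(handlers, registration{name: r})
		}
	}
	return handlers
}

func main() {
	resources := []string{"pods", "resourceclaims", "nodes"}
	enabled := map[string]bool{"pods": true, "nodes": true}
	fmt.Println(len(collectBuggy(resources, enabled)), len(collectFixed(resources, enabled))) // 3 2
}
```

In the buggy variant the `pods` handler appears twice, which matches the commit's note that the duplicate was harmless when the slice is only used to wait for cache sync.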
Kubernetes Prow Robot
029d518970 Merge pull request #117588 from kerthcet/cleanup/use-genericset
Avoid duplicated dots in pod status when preempting
2023-08-28 08:39:44 -07:00
kerthcet
580f83ab4a Avoid duplicated dots in pod condition
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-08-28 22:36:36 +08:00
kerthcet
855b445d28 Remove deprecated selectorSpread
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-08-28 22:11:33 +08:00
Kubernetes Prow Robot
faf1b5d655 Merge pull request #114685 from AxeZhan/dynamicresources
dynamic resource allocation: optimize class.SuitableNodes usage
2023-08-28 04:43:43 -07:00
kerthcet
3d583398fe Avoid building the error msg twice
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-08-28 17:13:39 +08:00
Kubernetes Prow Robot
10c622e99a Merge pull request #119994 from SataQiu/remove-scheduler-v1beta3
scheduler: remove deprecated v1beta3 KubeSchedulerConfiguration component config
2023-08-24 15:31:17 -07:00
Kubernetes Prow Robot
b910deb3a1 Merge pull request #120000 from kerthcet/cleanup/no-duplication
Remove duplicate code in framework RemovePod
2023-08-24 04:22:20 -07:00
kerthcet
ab01848134 Make sure skip score plugins always returned
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-08-24 13:39:47 +08:00
kerthcet
9ee94b0204 Remove duplicate code in framework RemovePod
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-08-23 18:23:41 +08:00
Kubernetes Prow Robot
f852d7fead Merge pull request #118653 from pohly/volume-resource-requirements
Volume resource requirements
2023-08-21 14:08:05 -07:00
Kubernetes Prow Robot
f082fab916 Merge pull request #119556 from linxiulei/schedMF
Trim managedFields in pod informer
2023-08-21 07:03:34 -07:00
Patrick Ohly
2472291790 api: introduce separate VolumeResourceRequirements struct
PVC and containers shared the same ResourceRequirements struct to define their
API. When resource claims were added, that struct got extended, which
accidentally also changed the PVC API. To prevent such a mistake from happening
again, PVC now uses its own VolumeResourceRequirements struct.

The `Claims` field gets removed because the risk of breaking someone is low:
theoretically, YAML files which have a claims field for volumes now
get rejected when validating against the OpenAPI. Such files
have never made sense and should be fixed.

Code that uses the struct definitions needs to be updated.
2023-08-21 15:31:28 +02:00
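The effect of the split in 2472291790 can be sketched with simplified types. Assumptions: the structs below are reduced illustrations (the real API uses resource lists with quantities, not `map[string]string`), `strictDecode` is a hypothetical helper, and strict JSON decoding stands in for OpenAPI validation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ResourceClaim is a reduced illustration of a claim reference.
type ResourceClaim struct {
	Name string `json:"name"`
}

// ResourceRequirements is the container-side struct, including claims.
type ResourceRequirements struct {
	Limits   map[string]string `json:"limits,omitempty"`
	Requests map[string]string `json:"requests,omitempty"`
	Claims   []ResourceClaim   `json:"claims,omitempty"`
}

// VolumeResourceRequirements is the PVC-side struct without a claims field.
type VolumeResourceRequirements struct {
	Limits   map[string]string `json:"limits,omitempty"`
	Requests map[string]string `json:"requests,omitempty"`
}

// strictDecode rejects fields that the target struct does not define.
func strictDecode(doc string, into interface{}) error {
	dec := json.NewDecoder(strings.NewReader(doc))
	dec.DisallowUnknownFields()
	return dec.Decode(into)
}

func main() {
	doc := `{"requests":{"storage":"1Gi"},"claims":[{"name":"gpu"}]}`
	fmt.Println(strictDecode(doc, &ResourceRequirements{}) == nil)       // true
	fmt.Println(strictDecode(doc, &VolumeResourceRequirements{}) == nil) // false
}
```

With separate structs, a later extension of the container-side type no longer changes what the volume-side type accepts.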
Kubernetes Prow Robot
ea3318cb71 Merge pull request #119971 from kwakubiney/chore/include-pod-uid-in-event-log
chore: attach pod UID to event log
2023-08-21 04:13:22 -07:00
Eric Lin
f93bd699aa Trim managedFields in pod informer
Signed-off-by: Eric Lin <exlin@google.com>
2023-08-20 13:09:15 +00:00
Kubernetes Prow Robot
312dc127a9 Merge pull request #118923 from AxeZhan/volume_zone_csi
[Scheduler]Translate beta label to ga in volume_zone
2023-08-17 20:20:28 -07:00
AxeZhan
af26ebd0fa translate beta label to ga in volume_zone
2023-08-18 00:31:09 +08:00
SataQiu
ef7d404702 using wait.PollUntilContextTimeout instead of deprecated wait.Poll for pkg/scheduler
using wait.PollUntilContextTimeout instead of deprecated wait.Poll for test/integration/scheduler

using wait.PollUntilContextTimeout instead of deprecated wait.Poll for test/e2e/scheduling

using wait.ConditionWithContextFunc for PodScheduled/PodIsGettingEvicted/PodScheduledIn/PodUnschedulable/PodSchedulingError
2023-08-17 17:25:09 +08:00
SataQiu
427b703c37 scheduler: remove deprecated v1beta2 KubeSchedulerConfiguration
2023-08-17 13:27:21 +08:00
kwakubiney
5752cbd8c7 chore: add pod UID in event log
This change includes preemptor pod UID in event log to allow
for easier debugging.

Signed-off-by: kwakubiney <kebiney@hotmail.com>
2023-08-16 11:00:56 +00:00
Kubernetes Prow Robot
130a5a423f Merge pull request #119785 from sanposhiho/waitonpermit-fiterror
fix: register the plugin rejects Pods in WaitOnPermit to UnschedulablePlugins
2023-08-15 23:13:04 -07:00
Kubernetes Prow Robot
719d1a84f7 Merge pull request #119778 from sanposhiho/bugfix-unschedulableandunresolvable
fix: when PreFilter returns UnschedulableAndUnresolvable, copy the state in all nodes in statusmap
2023-08-15 23:12:57 -07:00
Kubernetes Prow Robot
57212647e9 Merge pull request #119769 from Huang-Wei/bug/prefilter-preemption
Fix a bug that PostFilter plugin may not function if previous PreFilter plugins return Skip
2023-08-15 23:12:50 -07:00
Kubernetes Prow Robot
ea30d100f6 Merge pull request #119399 from wackxu/optimizecodeforNodeUnschedulable
Optimize the code of NodeUnschedulable to reduce TolerationsTolerateT…
2023-08-15 17:14:26 -07:00
Heba Elayoty
224087abfa Add Pod Scheduling SLI Duration metric (#119049)
Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>
Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
2023-08-15 15:17:41 -07:00
AxeZhan
47fec59a31 parse node selector in prefilter
2023-08-14 16:39:46 +08:00
Kensei Nakada
cf3f0bd778 fix: register the plugin rejects Pods in WaitOnPermit to UnschedulablePlugins
2023-08-12 07:18:01 +00:00
Kensei Nakada
b008223705 fix: when PreFilter returns UnschedulableAndUnresolvable, copy the state in all nodes in statusmap
2023-08-12 06:58:49 +00:00
Wei Huang
765f3916c2 Fix a bug that PostFilter plugin may not function if previous PreFilter plugins return Skip
2023-08-10 13:43:00 -07:00
Kensei Nakada
050c0437e6 fix: broadcast when pod is pushed back to activeQ directly in AddUnschedulableIfNotPresent
2023-08-09 03:32:14 +00:00
Patrick Ohly
2f30fae0e8 scheduler: fix data race after binding failure
When binding has failed, `Done` gets called by
`handleBindingCycleError`. Calling it again is at best redundant and, at worst,
suffers from a data race:
- the `assumedPodInfo` is placed in the backoff queue
- an event causes the `Pod` pointer to get updated in it
- reading `assumedPodInfo.Pod.UID` races with that write

This race was found with `go test -race`.
2023-08-02 11:04:10 +02:00
AxeZhan
2863b3d1ab Revert "refactor: simplify RunScorePlugins for readability + performance"
This reverts commit a7eb7ed5c6.
2023-07-20 10:50:32 +08:00
Kubernetes Prow Robot
15450a3f02 Merge pull request #119318 from codefromthecrypt/CycleState-docs
Improve docs on framework.CycleState
2023-07-18 07:19:10 -07:00
wackxu
a9d26ac7c7 Optimize the code of NodeUnschedulable to reduce TolerationsTolerateTaint function calls
Signed-off-by: wackxu <xushiwei5@huawei.com>
2023-07-18 21:00:05 +08:00
Adrian Cole
89ab733760 Improve docs on framework.CycleState
Signed-off-by: Adrian Cole <adrian@tetrate.io>
Co-authored-by: Kante Yin <kerthcet@gmail.com>
2023-07-18 14:48:20 +08:00
Kensei Nakada
c7e7eee554 feature(scheduling_queue): track events per Pods (#118438)
* feature(scheduling_queue): track events per Pods

* fix typos

* record events in one slice and make each in-flight Pod refer to it

* fix: use Pop() in test before AddUnschedulableIfNotPresent to register in-flight Pods

* eliminate MakeNextPodFuncs

* call Done inside the scheduling queue

* fix comment

* implement done() not to require lock in it

* fix UTs

* improve the receivedEvents implementation based on suggestions

* call DonePod when we don't call AddUnschedulableIfNotPresent

* fix UT

* use queuehint to filter out events for in-flight Pods

* fix based on suggestion from aldo

* fix based on suggestion from Wei

* rename lastEventBefore → previousEvent

* fix based on suggestion

* address comments from aldo

* fix based on the suggestion from Abdullah

* gate in-flight Pods logic by the SchedulingQueueHints feature gate
2023-07-17 15:53:07 -07:00