Commit Graph

75 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
919d4624a0
Merge pull request #122503 from sunbinnnnn/scheduler-extender-support-ignore-bind
Support ignore scheduler extender error when binding
2024-01-08 17:30:44 +01:00
Neil Sun
87816ffb2c Support ignore scheduler extender error when binding
Signed-off-by: sunbinnnnn <sunbinnnnn@hotmail.com>
2024-01-08 21:06:25 +08:00
Kensei Nakada
09abd6be5a address reviews 2024-01-02 02:10:41 +00:00
Kensei Nakada
041efcd1d4 scheduler: update an old comment 2023-12-22 02:01:13 +00:00
Aleksandra Malinowska
f89c744b7b Only run Prioritize() for extenders with prioritizeVerb configured 2023-12-21 13:42:27 +01:00
Aleksandra Malinowska
e19be41f58 Don't evaluate extra nodes if there's no score plugin defined 2023-12-21 13:29:46 +01:00
AxeZhan
be48c93689 Sched framework: expose NodeInfo in all functions of PluginsRunner interface 2023-12-15 11:30:06 +08:00
Paco Xu
1160521a4f
Revert "Scheduler first fit" 2023-12-14 17:27:25 +08:00
Kubernetes Prow Robot
517091cdc5
Merge pull request #122058 from aleksandra-malinowska/scheduler-first-fit
Scheduler first fit
2023-12-14 05:10:19 +01:00
Kubernetes Prow Robot
5322af7f9e
Merge pull request #122022 from sanposhiho/extender-fix
fix: requeue pods rejected by Extenders properly
2023-12-14 05:10:01 +01:00
Kubernetes Prow Robot
6bd8f96f35
Merge pull request #122001 from olderTaoist/scheduler-metric
report scheduling_algorithm_duration_seconds metric when pods are unschedulable
2023-12-14 05:09:25 +01:00
Toru Komatsu
01916625da
Remove unnecessary error catch in scheduling failure (#121981)
* Deleted from the cache in the handling of scheduling failures due to missing Node

Signed-off-by: utam0k <k0ma@utam0k.jp>

* Support only `nodes`

* Remove unnecessary error catch

Signed-off-by: utam0k <k0ma@utam0k.jp>

* Fix a build error

Signed-off-by: utam0k <k0ma@utam0k.jp>

* Fix a build error

Signed-off-by: utam0k <k0ma@utam0k.jp>

---------

Signed-off-by: utam0k <k0ma@utam0k.jp>
2023-12-14 05:09:08 +01:00
olderTaoist
78b4ab11d5 also report scheduling_algorithm_duration_seconds metric when the pod is unschedulable 2023-12-06 19:17:03 +08:00
Aleksandra Malinowska
3df00d1bdd Only run Prioritize() for extenders with prioritizeVerb configured 2023-11-28 17:13:13 +01:00
Aleksandra Malinowska
199dc03bdd Don't evaluate extra nodes if there's no score plugin defined 2023-11-28 10:39:49 +01:00
Kensei Nakada
468e2dac81 fix: requeue pods rejected by Extenders properly 2023-11-23 13:20:02 +00:00
Patrick Ohly
2a23061f6c scheduler: fix performance regression at -v3 + contextual logging
The logging instrumentation for contextual logging that was added for 1.29
slowed down the scheduler (i.e. logging verbosity <= 3) by a significant
percentage (-28.66% for SchedulingBasic/5000Nodes at -v3) if (and only if!)
contextual logging was enabled.

Retrieving the logger from the context causes no measurable slowdown; only
the various WithName/WithValues calls cause this.

By being more careful about when to use those, the performance impact can be
avoided:
- At -v3 or lower, only `WithValues("pod")` is used once per scheduling cycle.
  This has the intended effect that all log messages for the cycle include the
  pod information. Once contextual logging is GA, "pod" key/value pairs can
  be removed from all log calls.
- At -v4 or higher, richer log entries get produced where `WithValues` is also
  used for the node (when applicable) and `WithName` is used for the current
  operation and plugin.

With these changes, enabling contextual logging causes no measurable slowdown
at -v3 or lower. At -v4, the slowdown depends on the test case (-30.51%
throughput for SchedulingBasic/5000Nodes, no change for
SchedulingCSIPVs/5000Nodes). For some unknown reason (measuring bias?),
SchedulingCSIPVs/500Nodes has a ~3% *higher* throughput with contextual
logging.
2023-11-03 17:28:55 +01:00
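The verbosity-dependent enrichment described in commit 2a23061f6c can be sketched roughly as follows. This is a minimal illustration, not the scheduler's code: the `logger` type, `cycleLogger`, and `pluginLogger` are hypothetical stand-ins that only imitate the shape of `WithName`/`WithValues`, and the pod, node, and plugin names are example values.

```go
package main

import (
	"fmt"
	"strings"
)

// logger is a hypothetical stand-in for a logr/klog-style logger.
// It is NOT the klog API; it only mimics WithName/WithValues.
type logger struct {
	name   string
	values []string
}

// WithName returns a copy of the logger with a name segment appended.
func (l logger) WithName(n string) logger {
	if l.name != "" {
		n = l.name + "/" + n
	}
	return logger{name: n, values: l.values}
}

// WithValues returns a copy of the logger with one key/value pair appended.
func (l logger) WithValues(k, v string) logger {
	vals := append(append([]string{}, l.values...), k+"="+v)
	return logger{name: l.name, values: vals}
}

// Line renders a message the way this sketch's logger would emit it.
func (l logger) Line(msg string) string {
	parts := []string{}
	if l.name != "" {
		parts = append(parts, l.name+":")
	}
	parts = append(parts, msg)
	parts = append(parts, l.values...)
	return strings.Join(parts, " ")
}

// cycleLogger attaches the pod once per scheduling cycle, at every verbosity.
func cycleLogger(base logger, pod string) logger {
	return base.WithValues("pod", pod)
}

// pluginLogger adds the plugin name and node only at verbosity 4 or higher,
// so the extra allocations are skipped at -v3 and below.
func pluginLogger(l logger, verbosity int, plugin, node string) logger {
	if verbosity < 4 {
		return l
	}
	return l.WithName(plugin).WithValues("node", node)
}

func main() {
	for _, v := range []int{3, 4} {
		l := pluginLogger(cycleLogger(logger{}, "default/nginx"), v, "NodeResourcesFit", "node-1")
		fmt.Printf("-v%d: %s\n", v, l.Line("Scored node"))
	}
}
```

At verbosity 3 the rendered line carries only the pod; at verbosity 4 it also carries the plugin name and the node.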
Kubernetes Prow Robot
fd5c406112
Merge pull request #120933 from mengjiao-liu/contextual-logging-scheduler-remaining-part
kube-scheduler: convert the remaining part to use contextual logging
2023-10-27 10:30:58 +02:00
Kensei Nakada
27bb66fd7b cleanup: rename failedPlugin to plugin in framework.Status 2023-10-25 12:03:56 +00:00
Mengjiao Liu
b0a73213d6 kube-scheduler: convert the remaining part to use contextual logging 2023-10-24 17:56:48 +08:00
Kensei Nakada
4f5bc7e8d7 fix based on reviews 2023-10-20 02:53:06 +00:00
Kensei Nakada
cb5dc46edf feature(scheduler): simplify QueueingHint by introducing new statuses 2023-10-19 11:02:11 +00:00
Kubernetes Prow Robot
130a5a423f
Merge pull request #119785 from sanposhiho/waitonpermit-fiterror
fix: register the plugin rejects Pods in WaitOnPermit to UnschedulablePlugins
2023-08-15 23:13:04 -07:00
Kubernetes Prow Robot
719d1a84f7
Merge pull request #119778 from sanposhiho/bugfix-unschedulableandunresolvable
fix: when PreFilter returns UnschedulableAndUnresolvable, copy the state in all nodes in statusmap
2023-08-15 23:12:57 -07:00
Heba Elayoty
224087abfa
Add Pod Scheduling SLI Duration metric (#119049)
Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>
Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
2023-08-15 15:17:41 -07:00
Kensei Nakada
cf3f0bd778 fix: register the plugin rejects Pods in WaitOnPermit to UnschedulablePlugins 2023-08-12 07:18:01 +00:00
Kensei Nakada
b008223705 fix: when PreFilter returns UnschedulableAndUnresolvable, copy the state in all nodes in statusmap 2023-08-12 06:58:49 +00:00
Patrick Ohly
2f30fae0e8 scheduler: fix data race after binding failure
When binding has failed, `Done` gets called by
`handleBindingCycleError`. Calling it again is at best redundant and at worst
suffers from a data race:
- the `assumedPodInfo` is placed in the backoff queue
- an event causes the `Pod` pointer to get updated in it
- reading `assumedPodInfo.Pod.UID` races with that write

This race was found with `go test -race`.
2023-08-02 11:04:10 +02:00
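The race pattern described in commit 2f30fae0e8 can be illustrated with a small sketch. It shows one generic way to avoid this class of race (read the field while the object is still exclusively owned, before handing it to a shared queue); it is not the change made in the commit, which per its message concerns the redundant `Done` call. The `pod`, `podInfo`, and `queue` types and the `failBinding` helper are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// pod and podInfo are hypothetical types mirroring the shape named in the commit message.
type pod struct{ UID string }

type podInfo struct{ Pod *pod }

// queue is a hypothetical backoff queue whose entries may have their
// Pod pointer replaced when an update event arrives.
type queue struct {
	mu    sync.Mutex
	items []*podInfo
}

// add places an entry in the queue; from here on the entry is shared.
func (q *queue) add(pi *podInfo) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items = append(q.items, pi)
}

// update simulates an event that swaps the Pod pointer of every queued entry.
func (q *queue) update(p *pod) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for _, pi := range q.items {
		pi.Pod = p
	}
}

// failBinding reads the UID while the caller still owns pi exclusively,
// then hands pi to the queue. After add, pi.Pod must not be read without
// the queue's lock, because update may write it concurrently.
func failBinding(q *queue, pi *podInfo) string {
	uid := pi.Pod.UID // read before the object becomes shared
	q.add(pi)
	return uid
}

func main() {
	q := &queue{}
	pi := &podInfo{Pod: &pod{UID: "uid-1"}}
	uid := failBinding(q, pi)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		q.update(&pod{UID: "uid-2"}) // concurrent write to pi.Pod
	}()
	wg.Wait()

	fmt.Println(uid) // uses the captured copy, not pi.Pod.UID
}
```

Reading `pi.Pod.UID` after `q.add(pi)` instead would be the unsynchronized read that `go test -race` reports.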
Kensei Nakada
c7e7eee554
feature(scheduling_queue): track events per Pods (#118438)
* feature(scheduling_queue): track events per Pods

* fix typos

* record events in one slice and make each in-flight Pod to refer it

* fix: use Pop() in test before AddUnschedulableIfNotPresent to register in-flight Pods

* eliminate MakeNextPodFuncs

* call Done inside the scheduling queue

* fix comment

* implement done() not to require lock in it

* fix UTs

* improve the receivedEvents implementation based on suggestions

* call DonePod when we don't call AddUnschedulableIfNotPresent

* fix UT

* use queuehint to filter out events for in-flight Pods

* fix based on suggestion from aldo

* fix based on suggestion from Wei

* rename lastEventBefore → previousEvent

* fix based on suggestion

* address comments from aldo

* fix based on the suggestion from Abdullah

* gate in-flight Pods logic by the SchedulingQueueHints feature gate
2023-07-17 15:53:07 -07:00
kerthcet
c0eb0caf4a Support fine-grained rescheduling in ReservePlugin
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-07-07 13:30:29 +08:00
kerthcet
278a8376e1 Fix: fiterror in permit plugin not handled perfectly
We only added the failed plugins, but this does not work unless
the status is wrapped in a fitError, because the failed plugins are only
copied to podInfo when the error is a fitError

Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-07-07 10:35:59 +08:00
Kubernetes Prow Robot
d9714078f8
Merge pull request #118551 from sanposhiho/event-to-register
feature(scheduler): implement ClusterEventWithHint to filter out useless events
2023-06-26 06:41:45 -07:00
Kensei Nakada
6f8d38406a feature(scheduler): implement ClusterEventWithHint to filter out useless events 2023-06-22 13:36:19 +00:00
Heba Elayoty
902c711fb4
Unset gated pod info timestamp in addToActiveQ
Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>
2023-06-21 14:16:08 -07:00
Kubernetes Prow Robot
d58492b19c
Merge pull request #114688 from sanposhiho/sanposhiho/scheduling-one-score
feature(schedule_one): use heap to find the highest score node
2023-06-08 15:40:12 -07:00
Mengjiao Liu
074900e81b scheduler: update the scheduler interface and cache methods to use contextual logging 2023-05-29 13:26:32 +08:00
Kensei Nakada
0535e74224 feature(schedule_one): use heap to find the highest score node 2023-05-27 11:34:32 +00:00
kerthcet
7be3f8e43f Remove old metric scheduler_goroutines
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-05-03 00:35:38 +08:00
sarab
8d18ae6fc2 Use the generic Set in scheduler 2023-04-09 11:34:17 +05:30
Kensei Nakada
639007b28e cleanup(scheduler): move metric labels to metrics package 2023-03-12 05:10:29 +00:00
Kubernetes Prow Robot
70c28f3e12
Merge pull request #114486 from kerthcet/cleanup/make-preemption-more-readable
Make handling scheduleResult more readable
2022-12-21 15:01:25 -08:00
Kante Yin
c8908716ee Make handling scheduleResult more readable
Signed-off-by: Kante Yin <kerthcet@gmail.com>
2022-12-22 01:22:34 +08:00
kidddddddddddddddddddddd
e789beb213 errMsg 2022-12-19 23:52:06 +08:00
Kubernetes Prow Robot
dc1e77143f
Merge pull request #114082 from kidddddddddddddddddddddd/refactor_handleSchedulingFailure
pass status to handleSchedulingFailure
2022-12-12 22:05:34 -08:00
kidddddddddddddddddddddd
6ca62eb2cb refactor 2022-12-13 11:36:12 +08:00
Kubernetes Prow Robot
2e3055863d
Merge pull request #113456 from sanposhiho/use-totalscore-in-NodePluginScores
use TotalScore summarized in NodePluginScores
2022-12-12 09:01:45 -08:00
Kensei Nakada
9fd15f1fa3 use TotalScore summarized in NodePluginScores 2022-12-12 11:43:22 +00:00
Aldo Culquicondor
4e1c3a5855
Dedup serialization of status
Change-Id: Iaba63ea31e948933e162b3148cda2588af0fdaa3
2022-11-30 13:05:07 -05:00
Kubernetes Prow Robot
18b81513b6
Merge pull request #112025 from kerthcet/refactor/handle-scheduling-failure
Refactor schedulingCycle and bindingCycle in scheduler
2022-10-21 08:31:51 -07:00
kerthcet
f7f857814f Refactor schedulingCycle and bindingCycle in scheduler
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-10-21 13:53:18 +08:00