Commit Graph

68 Commits

Author SHA1 Message Date
Paco Xu
1160521a4f
Revert "Scheduler first fit" 2023-12-14 17:27:25 +08:00
Kubernetes Prow Robot
517091cdc5
Merge pull request #122058 from aleksandra-malinowska/scheduler-first-fit
Scheduler first fit
2023-12-14 05:10:19 +01:00
Kubernetes Prow Robot
5322af7f9e
Merge pull request #122022 from sanposhiho/extender-fix
fix: requeue pods rejected by Extenders properly
2023-12-14 05:10:01 +01:00
Kubernetes Prow Robot
6bd8f96f35
Merge pull request #122001 from olderTaoist/scheduler-metric
report scheduling_algorithm_duration_seconds metric when pods is unschedulable
2023-12-14 05:09:25 +01:00
Toru Komatsu
01916625da
Remove unnecessary error catch in scheduling failure (#121981)
* Deleted from the cache in the handling of scheduling failures due to missing Node

Signed-off-by: utam0k <k0ma@utam0k.jp>

* Support only `nodes`

* Remove unnecessary error catch

Signed-off-by: utam0k <k0ma@utam0k.jp>

* Fix a build error

Signed-off-by: utam0k <k0ma@utam0k.jp>

* Fix a build error

Signed-off-by: utam0k <k0ma@utam0k.jp>

---------

Signed-off-by: utam0k <k0ma@utam0k.jp>
2023-12-14 05:09:08 +01:00
olderTaoist
78b4ab11d5 also report scheduling_algorithm_duration_seconds metric when the pods is unschedulable 2023-12-06 19:17:03 +08:00
Aleksandra Malinowska
3df00d1bdd Only run Prioritize() for extenders with prioritizeVerb configured 2023-11-28 17:13:13 +01:00
Aleksandra Malinowska
199dc03bdd Don't evaluate extra nodes if there's no score plugin defined 2023-11-28 10:39:49 +01:00
Kensei Nakada
468e2dac81 fix: requeue pods rejected by Extenders properly 2023-11-23 13:20:02 +00:00
Patrick Ohly
2a23061f6c scheduler: fix performance regression at -v3 + contextual logging
The logging instrumentation for contextual logging that was added for 1.29
slowed down the scheduler (i.e. logging verbosity <= 3) by a significant
percentage (-28.66% for SchedulingBasic/5000Nodes at -v3) if (and only if!)
contextual logging was enabled.

Retrieving the logger from the context causes no measurable slowdown, it's only
the various WithName/WithValues calls which cause this.

By being more careful about when to use those, the performance impact can be
avoided:
- At -v3 or lower, only `WithValues("pod")` is used once per scheduling cycle.
  This has the intended effect that all log messages for the cycle include the
  pod information. Once contextual logging is GA, "pod" key/value pairs can
  be removed from all log calls.
- At -v4 or higher, richer log entries get produced where `WithValues` is also
  used for the node (when applicable) and `WithName` is used for the current
  operation and plugin.

With these changes, enabling contextual logging causes no measurable slowdown
at -v3 or lower. At -v4, the slowdown depends on the test case (-30.51%
throughput for SchedulingBasic/5000Nodes, no change for
SchedulingCSIPVs/5000Nodes). For some unknown reason (measuring bias?),
SchedulingCSIPVs/500Nodes has a ~3% *higher* throughput with contextual
logging.
2023-11-03 17:28:55 +01:00
Kubernetes Prow Robot
fd5c406112
Merge pull request #120933 from mengjiao-liu/contextual-logging-scheduler-remaining-part
kube-scheduler: convert the remaining part to use contextual logging
2023-10-27 10:30:58 +02:00
Kensei Nakada
27bb66fd7b cleanup: rename failedPlugin to plugin in framework.Status 2023-10-25 12:03:56 +00:00
Mengjiao Liu
b0a73213d6 kube-scheduler: convert the remaining part to use contextual logging 2023-10-24 17:56:48 +08:00
Kensei Nakada
4f5bc7e8d7 fix based on reviews 2023-10-20 02:53:06 +00:00
Kensei Nakada
cb5dc46edf feature(scheduler): simplify QueueingHint by introducing new statuses 2023-10-19 11:02:11 +00:00
Kubernetes Prow Robot
130a5a423f
Merge pull request #119785 from sanposhiho/waitonpermit-fiterror
fix: register the plugin rejects Pods in WaitOnPermit to UnschedulablePlugins
2023-08-15 23:13:04 -07:00
Kubernetes Prow Robot
719d1a84f7
Merge pull request #119778 from sanposhiho/bugfix-unschedulableandunresolvable
fix: when PreFilter returns UnschedulableAndUnresolvable, copy the state in all nodes in statusmap
2023-08-15 23:12:57 -07:00
Heba Elayoty
224087abfa
Add Pod Scheduling SLI Duration metric (#119049)
Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>
Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
2023-08-15 15:17:41 -07:00
Kensei Nakada
cf3f0bd778 fix: register the plugin rejects Pods in WaitOnPermit to UnschedulablePlugins 2023-08-12 07:18:01 +00:00
Kensei Nakada
b008223705 fix: when PreFilter returns UnschedulableAndUnresolvable, copy the state in all nodes in statusmap 2023-08-12 06:58:49 +00:00
Patrick Ohly
2f30fae0e8 scheduler: fix data race after binding failure
When binding has failed, `Done` gets called by
`handleBindingCycleError`. Calling it again is at best redundant and, at worst,
suffers from a data race:
- the `assumedPodInfo` is placed in the backoff queue
- an event causes the `Pod` pointer to get updated in it
- reading `assumedPodInfo.Pod.UID` races with that write

This race was found with `go test -race`.
2023-08-02 11:04:10 +02:00
Kensei Nakada
c7e7eee554
feature(scheduling_queue): track events per Pods (#118438)
* feature(scheduling_queue): track events per Pods

* fix typos

* record events in one slice and make each in-flight Pod to refer it

* fix: use Pop() in test before AddUnschedulableIfNotPresent to register in-flight Pods

* eliminate MakeNextPodFuncs

* call Done inside the scheduling queue

* fix comment

* implement done() not to require lock in it

* fix UTs

* improve the receivedEvents implementation based on suggestions

* call DonePod when we don't call AddUnschedulableIfNotPresent

* fix UT

* use queuehint to filter out events for in-flight Pods

* fix based on suggestion from aldo

* fix based on suggestion from Wei

* rename lastEventBefore → previousEvent

* fix based on suggestion

* address comments from aldo

* fix based on the suggestion from Abdullah

* gate in-flight Pods logic by the SchedulingQueueHints feature gate
2023-07-17 15:53:07 -07:00
kerthcet
c0eb0caf4a Support fine-grained rescheduling in ReservePlugin
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-07-07 13:30:29 +08:00
kerthcet
278a8376e1 Fix: fiterror in permit plugin not handled perfectly
We only added the failed plugins, but this does not work unless
we build the status from a fitError, because we only copy the failed plugins
to podInfo if it is a fitError

Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-07-07 10:35:59 +08:00
Kubernetes Prow Robot
d9714078f8
Merge pull request #118551 from sanposhiho/event-to-register
feature(scheduler): implement ClusterEventWithHint to filter out useless events
2023-06-26 06:41:45 -07:00
Kensei Nakada
6f8d38406a feature(scheduler): implement ClusterEventWithHint to filter out useless events 2023-06-22 13:36:19 +00:00
Heba Elayoty
902c711fb4
Unset gated pod info timestamp in addToActiveQ
Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>
2023-06-21 14:16:08 -07:00
Kubernetes Prow Robot
d58492b19c
Merge pull request #114688 from sanposhiho/sanposhiho/scheduling-one-score
feature(schedule_one): use heap to find the highest score node
2023-06-08 15:40:12 -07:00
Mengjiao Liu
074900e81b scheduler: update the scheduler interface and cache methods to use contextual logging 2023-05-29 13:26:32 +08:00
Kensei Nakada
0535e74224 feature(schedule_one): use heap to find the highest score node 2023-05-27 11:34:32 +00:00
kerthcet
7be3f8e43f Remove old metric scheduler_goroutines
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-05-03 00:35:38 +08:00
sarab
8d18ae6fc2 Use the generic Set in scheduler 2023-04-09 11:34:17 +05:30
Kensei Nakada
639007b28e cleanup(scheduler): move metric labels to metrics package 2023-03-12 05:10:29 +00:00
Kubernetes Prow Robot
70c28f3e12
Merge pull request #114486 from kerthcet/cleanup/make-preemption-more-readable
Make handling scheduleResult more readable
2022-12-21 15:01:25 -08:00
Kante Yin
c8908716ee Make handling scheduleResult more readable
Signed-off-by: Kante Yin <kerthcet@gmail.com>
2022-12-22 01:22:34 +08:00
kidddddddddddddddddddddd
e789beb213 errMsg 2022-12-19 23:52:06 +08:00
Kubernetes Prow Robot
dc1e77143f
Merge pull request #114082 from kidddddddddddddddddddddd/refactor_handleSchedulingFailure
pass status to handleSchedulingFailure
2022-12-12 22:05:34 -08:00
kidddddddddddddddddddddd
6ca62eb2cb refactor 2022-12-13 11:36:12 +08:00
Kubernetes Prow Robot
2e3055863d
Merge pull request #113456 from sanposhiho/use-totalscore-in-NodePluginScores
use TotalScore summarized in NodePluginScores
2022-12-12 09:01:45 -08:00
Kensei Nakada
9fd15f1fa3 use TotalScore summarized in NodePluginScores 2022-12-12 11:43:22 +00:00
Aldo Culquicondor
4e1c3a5855
Dedup serialization of status
Change-Id: Iaba63ea31e948933e162b3148cda2588af0fdaa3
2022-11-30 13:05:07 -05:00
Kubernetes Prow Robot
18b81513b6
Merge pull request #112025 from kerthcet/refactor/handle-scheduling-failure
Refactor schedulingCycle and bindingCycle in scheduler
2022-10-21 08:31:51 -07:00
kerthcet
f7f857814f Refactor schedulingCycle and bindingCycle in scheduler
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-10-21 13:53:18 +08:00
Kubernetes Prow Robot
8305358630
Merge pull request #108494 from sanposhiho/RunScorePlugins-return-type
change framework.RunScorePlugins to return slice organized by node
2022-10-20 08:57:02 -07:00
sanposhiho
cbf1ea5e68 change framework.RunScorePlugins to return slice organized by node 2022-10-20 01:04:38 +00:00
Kubernetes Prow Robot
2b6abb1b33
Merge pull request #113113 from kerthcet/fix/metics-in-scheduler
Fix metrics time durations in schedulerCycle and bindingCycle
2022-10-17 19:53:17 -07:00
kerthcet
1582c42e2b Fix metrics time durations in schedulerCycle and bindingCycle
Signed-off-by: kerthcet <kerthcet@gmail.com>
2022-10-17 23:24:24 +08:00
Yuan Chen
7297f48f12 Add profile level percentageOfNodesToScore
Fix conversion errors

Changed the order

update

update

fix manual conversions

keep the global parameter for backward compatibility

Address Wei's comments

Fix an error

Fix issues

Add unit tests for validation

Fix a comment

Address comments

Update comments

fix verification errors

Add tests for scheme_test.go

Convert percentageOfNodesToScore to pointer

Fix errors

Resolve conflicts

Fix testing errors

Address Wei's comments

Revert IntPtr to Int changes

Address comments

Do not overwrite percentageOfNodesToScore

Fix a bug

Fix a bug

change errs to err

Fix a nit

Remove duplication

Address comments

Fix lint warning

Fix an issue

Update comments

Clean up

Address comments

Revert changes to defaults

fix unit test error

Update

Fix tests

Use default PluginConfigs
2022-10-14 13:01:06 -07:00
kidddddddddddddddddddddd
121d24cfc7 changes in non-test files 2022-10-12 21:09:55 +08:00
Kubernetes Prow Robot
c5f795c8bf
Merge pull request #112222 from astraw99/fix-scheduler-misc
Update some scheduler misc
2022-09-14 18:37:22 -07:00