Commit Graph

1383 Commits

Author SHA1 Message Date
lianghao208
34e620d18c Support score extension function in preemption. 2023-11-15 15:28:20 +08:00
Kubernetes Prow Robot
5ce0bd95cc Merge pull request #121677 from kerthcet/cleanup/remove-evnet
Unregister events in schedulingGates for performance
2023-11-10 05:03:33 +01:00
kerthcet
f77a4543d1 Unregister events in schedulingGates plugin
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-11-06 10:01:13 +08:00
Patrick Ohly
2a23061f6c scheduler: fix performance regression at -v3 + contextual logging
The logging instrumentation for contextual logging that was added for 1.29
slowed down the scheduler (i.e. logging verbosity <= 3) by a significant
percentage (-28.66% for SchedulingBasic/5000Nodes at -v3) if (and only if!)
contextual logging was enabled.

Retrieving the logger from the context causes no measurable slowdown, it's only
the various WithName/WithValues calls which cause this.

By being more careful about when to use those, the performance impact can be
avoided:
- At -v3 or lower, only `WithValues("pod")` is used once per scheduling cycle.
  This has the intended effect that all log messages for the cycle include the
  pod information. Once contextual logging is GA, "pod" key/value pairs can
  be removed from all log calls.
- At -v4 or higher, richer log entries get produced where `WithValues` is also
  used for the node (when applicable) and `WithName` is used for the current
  operation and plugin.

With these changes, enabling contextual logging causes no measurable slowdown
at -v3 or lower. At -v4, the slowdown depends on the test case (-30.51%
throughput for SchedulingBasic/5000Nodes, no change for
SchedulingCSIPVs/5000Nodes). For some unknown reason (measuring bias?),
SchedulingCSIPVs/500Nodes has a ~3& *higher* throughput with contextual
logging.
2023-11-03 17:28:55 +01:00
kerthcet
5bf63036c7 Make EnablePodSchedulingReadiness public
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-11-03 11:44:56 +08:00
Kubernetes Prow Robot
d84ee0ba69 Merge pull request #121632 from kerthcet/fix/runscoreplugins
Fix panic when process RunScorePlugins for cap out of range
2023-10-31 13:14:32 +01:00
kerthcet
b02aad42fa Fix panic when process RunScorePlugins for cap out of range
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-10-31 16:02:16 +08:00
Kensei Nakada
c7842d9c63 narrow down the scope of EnqueueExtensions to subscribe less cluster events 2023-10-27 14:14:37 +00:00
Kubernetes Prow Robot
fd5c406112 Merge pull request #120933 from mengjiao-liu/contextual-logging-scheduler-remaining-part
kube-scheduler: convert the remaining part to use contextual logging
2023-10-27 10:30:58 +02:00
Kubernetes Prow Robot
53e91942bf Merge pull request #117780 from sourcelliu/schedulinggates
Improve the performance of schedulinggates
2023-10-26 18:14:32 +02:00
Kubernetes Prow Robot
2749509f35 Merge pull request #121469 from sanposhiho/renamerename
cleanup: rename failedPlugin to plugin in framework.Status
2023-10-25 20:16:43 +02:00
Kensei Nakada
27bb66fd7b cleanup: rename failedPlugin to plugin in framework.Status 2023-10-25 12:03:56 +00:00
Kubernetes Prow Robot
9aa04752e7 Merge pull request #118463 from testwill/replace_loop
chore: slice replace loop
2023-10-24 15:04:39 +02:00
Mengjiao Liu
b0a73213d6 kube-scheduler: convert the remaining part to use contextual logging 2023-10-24 17:56:48 +08:00
Kubernetes Prow Robot
6d7d249372 Merge pull request #121077 from chrishenzie/readwriteoncepod-ga
Graduate ReadWriteOncePod to GA
2023-10-24 05:26:05 +02:00
Kubernetes Prow Robot
5a4e792e06 Merge pull request #120534 from pohly/dra-scheduler-ssa-as-fallback
dra scheduler: fall back to SSA for PodSchedulingContext updates
2023-10-23 21:06:58 +02:00
Kensei Nakada
fb6b10997a cleanup: remove useless test 2023-10-22 04:41:59 +00:00
Chris Henzie
2dbd405583 Graduate ReadWriteOncePod to GA 2023-10-20 10:40:39 -07:00
Kubernetes Prow Robot
8d4ccd67e3 Merge pull request #119517 from sanposhiho/block-status
feature(scheduler): simplify QueueingHintFn by introducing new statuses
2023-10-20 19:24:31 +02:00
Kensei Nakada
4f5bc7e8d7 fix based on reviews 2023-10-20 02:53:06 +00:00
Michal Wozniak
32fdb55192 Use Patch instead of SSA for Pod Disruption condition 2023-10-19 21:00:19 +02:00
Kensei Nakada
cb5dc46edf feature(scheduler): simplify QueueingHint by introducing new statuses 2023-10-19 11:02:11 +00:00
Kubernetes Prow Robot
ff4ba92b1c Merge pull request #119155 from carlory/fix-118893-1
nodeaffinity: scheduler queueing hints
2023-10-19 09:31:27 +02:00
Kubernetes Prow Robot
78b34aa8fc Merge pull request #116938 from olderTaoist/fix-image-locality
fix ImageLocality plugin score is inconsistent
2023-10-19 06:15:26 +02:00
olderTaoist
5d5958e338 fix ImageLocality plugin score is inconsistent 2023-10-17 09:38:03 +08:00
Kubernetes Prow Robot
46c307868f Merge pull request #119176 from carlory/fix-118893-2
nodeports: scheduler queueing hints
2023-10-10 19:07:07 +02:00
Kubernetes Prow Robot
e224fc75ca Merge pull request #116885 from mengjiao-liu/contextual-logging-scheduler-plugin-examples
Migrated  `pkg/scheduler/framework/plugins/examples/` to use contextual logging
2023-10-09 20:32:46 +02:00
Mengjiao Liu
9cca527c4b Migrated pkg/scheduler/framework/plugins/examples/ to use contextual logging 2023-10-09 11:43:17 +08:00
carlory
7cba35f651 nodeports: scheduler queueing hints
Co-authored-by: Kensei Nakada <handbomusic@gmail.com>

Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
2023-10-08 11:34:43 +08:00
bzsuni
6200eb04af use generic sets in scheduler
Signed-off-by: bzsuni <bingzhe.sun@daocloud.io>
2023-09-28 21:31:33 +08:00
Kubernetes Prow Robot
9c5698f514 Merge pull request #116803 from mengjiao-liu/contextual-logging-scheduler-plugin-volumebinding
Migrated `pkg/scheduler/framework/plugins/volumebinding` to contextual logging
2023-09-27 15:04:38 -07:00
carlory
1d88bf9789 scheduler/nodeaffinity: reduce pod scheduling latency
Co-authored-by: Kensei Nakada <handbomusic@gmail.com>
2023-09-25 09:41:22 +08:00
Kubernetes Prow Robot
3ac83f528d Merge pull request #119290 from carlory/add-logger
the scheduling queue logs the error and treats it as QueueAfterBackoff
2023-09-22 08:10:49 -07:00
Mengjiao Liu
3eb6c4d368 Migrated pkg/scheduler/framework/plugins/volumebinding to contextual logging 2023-09-21 11:28:12 +08:00
carlory
0105a002bc when the hint fn returns error, the scheduling queue logs the error and treats it as QueueAfterBackoff.
Co-authored-by: Kensei Nakada <handbomusic@gmail.com>

Co-authored-by: Kante Yin <kerthcet@gmail.com>

Co-authored-by: XsWack <xushiwei5@huawei.com>
2023-09-21 09:40:44 +08:00
Mengjiao Liu
a7466f44e0 Change the scheduler plugins PluginFactory function to use context parameter to pass logger
- Migrated pkg/scheduler/framework/plugins/nodevolumelimits to use contextual logging
- Fix golangci-lint validation failed
- Check for plugins creation err
2023-09-20 17:49:54 +08:00
charles-chenzz
c8b9d64d81 scheduler test: unify util to fake pod. 2023-09-18 20:05:01 +08:00
Patrick Ohly
7cac1dcf67 dra scheduler: fall back to SSA for PodSchedulingContext updates
During scheduler_perf testing, roughly 10% of the PodSchedulingContext update
operations failed with a conflict error. Using SSA would avoid that, but
performance measurements showed that this causes a considerable
slowdown (primarily because of the slower encoding with JSON instead of
protobuf, but also because server-side processing is more expensive).

Therefore a normal update is tried first and SSA only gets used when there has
been a conflict. Using SSA in that case instead of giving up outright is better
because it avoids another scheduling attempt.
2023-09-15 15:05:38 +02:00
Stephen Kitt
3cb0b520d6 Scheduler CSI tests: switch maxVols to int32
This ends up stored in an int32 Count, use the target type throughout
to avoid narrowing conversions.

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2023-09-15 09:52:50 +02:00
wackxu
28dbe8a34d scheduler/NodeUnschedulable: reduce pod scheduling latency
Signed-off-by: wackxu <xushiwei5@huawei.com>
2023-09-14 10:23:43 +08:00
Stephen Kitt
9990307146 kube-scheduler: drop deprecated pointer package
This replaces deprecated k8s.io/utils/pointer functions with their ptr
equivalent.

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2023-09-13 09:42:19 +02:00
Kubernetes Prow Robot
db49b13ccd Merge pull request #120252 from kerthcet/cleanup/framework-import
Move framework testing libraries to the right place
2023-09-12 17:44:11 -07:00
kerthcet
6fbb8ec7e4 Move scheduler testing utils to /scheduler/testing
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-09-12 13:42:38 +08:00
Patrick Ohly
6f9140e421 DRA scheduler: stop allocating before deallocation
This fixes a test flake:

    [sig-node] DRA [Feature:DynamicResourceAllocation] multiple nodes reallocation [It] works
    /nvme/gopath/src/k8s.io/kubernetes/test/e2e/dra/dra.go:552

      [FAILED] number of deallocations
      Expected
          <int64>: 2
      to equal
          <int64>: 1
      In [It] at: /nvme/gopath/src/k8s.io/kubernetes/test/e2e/dra/dra.go:651 @ 09/05/23 14:01:54.652

This can be reproduced locally with

    stress -p 10 go test ./test/e2e -args -ginkgo.focus=DynamicResourceAllocation.*reallocation.works  -ginkgo.no-color -v=4 -ginkgo.v

Log output showed that the sequence of events leading to this was:
- claim gets allocated because of selected node
- a different node has to be used, so PostFilter sets
  claim.status.deallocationRequested
- the driver deallocates
- before the scheduler can react and select a different node,
  the driver allocates *again* for the original node
- the scheduler asks for deallocation again
- the driver deallocates again (causing the test failure)
- eventually the pod runs

The fix is to disable allocations first by removing the selected node and then
starting to deallocate.
2023-09-11 10:56:17 +02:00
Kubernetes Prow Robot
a64a3e16ec Merge pull request #120253 from pohly/dra-scheduler-podschedulingcontext-updates
dra scheduler: refactor PodSchedulingContext updates
2023-09-08 02:48:14 -07:00
Patrick Ohly
5c7dac2d77 dra scheduler: refactor PodSchedulingContext updates
Instead of modifying the PodSchedulingContext and then creating or updating it,
now the required changes (selected node, potential nodes) are tracked and the
actual input for an API call is created if (and only if) needed at the end.

This makes the code easier to read and change. In particular, replacing the
Update call with Patch or Apply is easy.
2023-09-08 08:06:06 +02:00
Kubernetes Prow Robot
2d5b6f16f5 Merge pull request #120213 from pohly/dra-scheduler-resourceclass-missing
dra: resourceclass missing
2023-09-06 23:47:09 -07:00
Patrick Ohly
c682d2b8c5 scheduler: add ResourceClass events
When filtering fails because a ResourceClass is missing, we can treat the pod
as "unschedulable" as long as we then also register a cluster event that wakes
up the pod. This is more efficient than periodically retrying.
2023-09-06 11:14:08 +02:00
Kubernetes Prow Robot
cd91351dff Merge pull request #117720 from kerthcet/feat/remove-selector-spread
Remove deprecated selectorSpread
2023-08-29 00:25:22 -07:00
Kubernetes Prow Robot
3e910875a7 Merge pull request #120125 from kerthcet/cleanup/write-to-cycle
Make sure skipped score plugins always returned
2023-08-28 15:13:20 -07:00