3343 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
b2a8ac15a0 Merge pull request #124221 from arturhoo/fix-spelling-scheduler-metrics
scheduler: fix typo in metric pod_scheduling_sli_duration_seconds help
2024-04-18 10:46:19 -07:00
Kubernetes Prow Robot
846e282d05 Merge pull request #124055 from yangjunmyfm192085/optklogprint
Optimize klog output (Use klog.KObj(pod) instead of pod)
2024-04-18 02:11:47 -07:00
Kubernetes Prow Robot
d2ce87eb94 Merge pull request #123938 from pohly/dra-structured-parameters-tests
DRA: test for structured parameters
2024-04-18 02:10:08 -07:00
Kubernetes Prow Robot
2c6d5fae7a Merge pull request #122471 from nayihz/feat_podaffinity_qhint
interpodaffinity: scheduler queueing hints
2024-04-18 00:00:21 -07:00
Kubernetes Prow Robot
56b39eab7c Merge pull request #119436 from claudiubelu/unittests-9
unittests: Fixes unit tests for Windows (part 9)
2024-04-17 22:51:32 -07:00
nayihz
1b3d10aafa fix: node added with matched pod anti-affinity topologyKey
Co-authored-by: Kensei Nakada <handbomusic@gmail.com>
2024-04-12 11:08:44 +08:00
Artur Rodrigues
645b25ec67 scheduler: fix typo in metric pod_scheduling_sli_duration_seconds help
2024-04-07 16:15:06 +01:00
Patrick Ohly
6f5696b537 dra scheduler: simplify unit tests
The guideline in
https://github.com/kubernetes/community/blob/master/sig-scheduling/CONTRIBUTING.md#technical-and-style-guidelines
is to not compare error strings. This makes the tests less precise. In return,
unit tests don't need to be updated when error strings change.
2024-03-27 10:27:01 +01:00
Claudiu Belu
c2dfcf1e34 unittests: Fixes unit tests for Windows (part 9)
Currently, there are some unit tests that are failing on
Windows due to various reasons:

- time.Now() is not as precise on Windows, which means that
  2 consecutive calls may return the same timestamp.
- Different "File not found" error messages on Windows.
- The default Container Runtime URL scheme on Windows is npipe, not unix.
2024-03-26 13:42:50 +00:00
杨军10092085
ba76a624f9 Optimize klog output
2024-03-26 18:53:29 +08:00
Patrick Ohly
458e227de0 dra scheduler: unit tests
Coverage was checked with a cover profile. The biggest remaining gap is for
isSchedulableAfterClaimParametersChange and
isSchedulableAfterClassParametersChange which will get handled when refactoring
the foreachPodResourceClaim (https://github.com/kubernetes/kubernetes/issues/123697).
2024-03-22 10:03:22 +01:00
Patrick Ohly
607261e4c5 dra scheduler: spelling fix
2024-03-22 10:03:22 +01:00
Patrick Ohly
95136db063 dra scheduler: fix re-allocation of claim with structured parameters
The code was incorrectly checking for a controller, but only the boolean
is set for allocated claims. As a result, deallocation was requested from
a non-existent control plane controller.

While at it, let's also clear the driver name. It's not needed when the
claim is deallocated.
2024-03-22 10:03:22 +01:00
nayihz
0cfe4438e9 interpodaffinity: scheduler queueing hints
2024-03-20 21:44:24 +08:00
kerthcet
84750fe52e Revert "enhancement(scheduler): share waitingPods among profiles"
This reverts commit 227c1915db.
2024-03-19 22:52:59 +01:00
kerthcet
a67d1dc010 Revert "Fix flaky test on multi profiles waiting pod"
This reverts commit 5b072a59a2.
2024-03-19 22:52:07 +01:00
Kubernetes Prow Robot
aa73f3163a Merge pull request #122292 from sanposhiho/nodeupdate
register Node/UpdateTaint event to plugins that have Node/Add only and don't have Node/UpdateTaint
2024-03-18 08:33:54 -07:00
Kensei Nakada
2b56de43e5 register Node/UpdateNodeTaint event to plugins that have Node/Add only and don't have Node/UpdateNodeTaint
2024-03-16 14:13:06 +00:00
Kevin Klues
21a0dd1d70 dra scheduler: create default claim/class parameters instead of nil
Without this, the scheduler was crashing in newClaimController() in
pkg/scheduler/framework/plugins/dynamicresources/structuredparameters.go

The code in newClaimController() assumes that the parameters are not nil.
Furthermore it assumes that there is at least one DriverRequest populated in
order to allocate any resources to a claim.

This PR adds logic to define default claim/class parameters that will allow
allocation to proceed even if an end user doesn't provide any class or claim
parameters themselves.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-03-11 13:57:16 +00:00
Patrick Ohly
251b3859b0 dra scheduler: consider in-flight allocation for resource calculation
Storing a modified claim with allocation and the original resource version in
the assume cache was not reliable: if an update was received, it replaced the
modified claim and the resource that was reserved for the claim might have been
used for some other claim.

To fix this, the in-flight claims are now stored in the map instead of just a
boolean and the status stored there overrides whatever is in the assume cache.

Logging got extended to diagnose this problem better. It started to occur in
E2E tests after splitting the claim update so that first the finalizer is set
and then the status, because setting the finalizer triggered an update.
2024-03-07 22:26:16 +01:00
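
The idea described above, keeping the in-flight claim itself in a map and letting it override the cache, can be sketched roughly as follows. `Claim`, `tracker`, and `get` are hypothetical and heavily simplified, and a plain map stands in for the assume cache; the real plugin code is not shown in this log.

```go
package main

import (
	"fmt"
	"sync"
)

// Claim is a simplified, hypothetical stand-in for a resource claim.
type Claim struct {
	Name      string
	Allocated bool
}

// tracker is a hypothetical type pairing a cache with in-flight claims.
type tracker struct {
	cache    map[string]*Claim // stand-in for the assume cache
	inFlight sync.Map          // claim name -> *Claim with a pending allocation
}

// get returns the in-flight version of a claim if there is one,
// and falls back to the cached version otherwise.
func (t *tracker) get(name string) (*Claim, bool) {
	if v, ok := t.inFlight.Load(name); ok {
		return v.(*Claim), true
	}
	c, ok := t.cache[name]
	return c, ok
}

func main() {
	t := &tracker{cache: map[string]*Claim{"claim-a": {Name: "claim-a"}}}

	// The allocation is in flight: store the modified claim, not just a flag.
	t.inFlight.Store("claim-a", &Claim{Name: "claim-a", Allocated: true})

	// An unrelated update replaces the cached object.
	t.cache["claim-a"] = &Claim{Name: "claim-a"}

	c, _ := t.get("claim-a")
	fmt.Println(c.Allocated) // true: the in-flight status wins

	// Once the allocation is persisted, the in-flight entry is dropped.
	t.inFlight.Delete("claim-a")
	c, _ = t.get("claim-a")
	fmt.Println(c.Allocated) // false: back to whatever the cache holds
}
```
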
Patrick Ohly
0b6a0d686a dra api: rename NodeResourceSlice -> ResourceSlice
While currently those objects only get published by the kubelet for node-local
resources, this could change once we also support network-attached
resources. Dropping the "Node" prefix enables such a future extension.

The NodeName in ResourceSlice and StructuredResourceHandle then becomes
optional. The kubelet still needs to provide one and it must match its own node
name, otherwise it doesn't have permission to access ResourceSlice objects.
2024-03-07 22:22:55 +01:00
Patrick Ohly
d4d5ade7f5 dra: add "named resources" structured parameter model
Like the current device plugin interface, a DRA driver using this model
announces a list of resource instances. In contrast to device plugins, this
list is made available to the scheduler together with attributes that can be
used to select suitable instances when they are not all alike.

Because this is the first structured parameter model, some checks that
previously were not possible, in particular "is one structured parameter field
set", now get enabled. Adding another structured parameter model will be
similar.

The applyconfigs code generator assumes that all types in an API are defined in
a single package. If it wasn't for that, it would be possible to place the
"named resources" types in separate packages, which makes their names in the Go
code more natural and provides an indication of their stability level because
the package name could include a version.
2024-03-07 22:21:16 +01:00
Patrick Ohly
096e948905 dra scheduler: support structured parameters
When a claim uses structured parameters, as indicated by the resource class
flag, the scheduler is responsible for allocating it. To do this it needs to
gather information about available node resources by watching
NodeResourceSlices and then match the in-tree claim parameters against those
resources.
2024-03-07 22:21:04 +01:00
Patrick Ohly
eb1470d60d scheduler: fix assume cache with no index
The assume cache in the volumebinding plugin can be created with no separate
index, but List then failed because it tried to use the empty index name
instead of using the store's List function.
2024-03-07 16:09:44 +01:00
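
The failure mode described above can be pictured with a small sketch. The `store` and `assumeCache` types below are hypothetical simplifications and not the plugin's real types; the point is only the fallback to the store's own List when no index name is configured.

```go
package main

import (
	"fmt"
	"sort"
)

// store is a hypothetical, simplified object store with optional indexes.
type store struct {
	items   map[string]string              // object name -> storage class
	indexes map[string]map[string][]string // index name -> key -> object names
}

// List returns all object names in sorted order.
func (s *store) List() []string {
	names := make([]string, 0, len(s.items))
	for name := range s.items {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// ByIndex fails if the named index was never registered.
func (s *store) ByIndex(indexName, key string) ([]string, error) {
	idx, ok := s.indexes[indexName]
	if !ok {
		return nil, fmt.Errorf("index %q does not exist", indexName)
	}
	return idx[key], nil
}

// assumeCache is a hypothetical cache that may be created without an index.
type assumeCache struct {
	store     *store
	indexName string // empty when no separate index was configured
}

// List uses the index only when one exists; otherwise it lists the store.
func (c *assumeCache) List(key string) ([]string, error) {
	if c.indexName == "" {
		return c.store.List(), nil
	}
	return c.store.ByIndex(c.indexName, key)
}

func main() {
	s := &store{
		items:   map[string]string{"pv-b": "fast", "pv-a": "slow"},
		indexes: map[string]map[string][]string{},
	}

	// Looking up the empty index name directly fails.
	_, err := s.ByIndex("", "fast")
	fmt.Println(err != nil) // true

	// The cache without an index falls back to the plain List.
	c := &assumeCache{store: s}
	names, err := c.List("fast")
	fmt.Println(names, err) // [pv-a pv-b] <nil>
}
```
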
Kubernetes Prow Robot
bc00c9eef0 Merge pull request #123366 from kerthcet/feat/support-initcontainer
Consider initContainer images in pod scheduling
2024-03-05 08:24:30 -08:00
Kubernetes Prow Robot
13f40e9759 Merge pull request #123686 from kerthcet/fix/flaky-test-on-multi-profile
[Scheduler] Fix flaky test on multi profiles waitingPods
2024-03-05 04:41:09 -08:00
kerthcet
5b072a59a2 Fix flaky test on multi profiles waiting pod
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-03-05 14:54:33 +08:00
Kubernetes Prow Robot
6929a11f69 Merge pull request #123481 from sanposhiho/mindomain-stable
graduate MinDomainsInPodTopologySpread to stable
2024-03-04 17:18:53 -08:00
Kubernetes Prow Robot
6c8dc1d1ed Merge pull request #123609 from veshij/fix
[kubernetes/scheduler] use lockless diagnosis collection in findNodes…
2024-03-04 11:23:50 -08:00
Kubernetes Prow Robot
e4a14fe0f5 Merge pull request #123575 from Huang-Wei/pod-scheduling-readiness-stable
Graduate PodSchedulingReadiness to stable
2024-03-03 22:29:38 -08:00
Tim Hockin
467d5d745c Get rid of unused API type NodeResources
2024-03-01 15:13:50 -08:00
Oleg Guba
ba525460e0 change result size to numAllNodes
2024-03-01 02:06:17 -08:00
Oleg Guba
e6dd36759f [kubernetes/scheduler] use lockless diagnosis collection in findNodesThatPassFilters
2024-02-29 20:43:50 -08:00
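
The commit titles above only name the technique, so the following is a generic sketch of one common way to collect per-node results without a lock, not the scheduler's actual code. It assumes a result slice sized to the number of nodes, where each goroutine writes only to its own slot and the merge happens after all workers finish; `checkNode` and `collect` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// checkNode is a hypothetical filter that rejects odd-numbered nodes.
// It returns an empty string when the node passes.
func checkNode(i int) string {
	if i%2 == 1 {
		return fmt.Sprintf("node-%d rejected", i)
	}
	return ""
}

// collect runs the filter for every node in parallel without a mutex.
func collect(numAllNodes int) []string {
	// One slot per node: no two goroutines ever write the same element.
	results := make([]string, numAllNodes)

	var wg sync.WaitGroup
	for i := 0; i < numAllNodes; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = checkNode(i)
		}(i)
	}
	wg.Wait()

	// Merge sequentially once all workers are done.
	var diagnosis []string
	for _, r := range results {
		if r != "" {
			diagnosis = append(diagnosis, r)
		}
	}
	return diagnosis
}

func main() {
	fmt.Println(collect(4)) // [node-1 rejected node-3 rejected]
}
```
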
Wei Huang
01db4ae9e7 Graduate PodSchedulingReadiness to stable
2024-02-28 23:18:44 -08:00
Kensei Nakada
58a826a59a graduate MinDomainsInPodTopologySpread to stable
2024-02-28 10:42:29 +00:00
Aleksandra Malinowska
dd1e617ba0 Scheduler first fit (#123384)
* Don't evaluate extra nodes if there's no score plugin defined

* Fix existing unit test (add no op scoring plugin)

* Add unit tests for no score plugin scenario

* address review comments

* add a test with non-filter, non-scoring extender
2024-02-26 11:07:19 -08:00
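
The first bullet above ("Don't evaluate extra nodes if there's no score plugin defined") can be sketched as a first-fit loop. This is a simplified illustration, not the scheduler's implementation: `findFeasible`, its parameters, and the `limit` cut-off are all hypothetical.

```go
package main

import "fmt"

// findFeasible returns the nodes that pass the filter.
// Without score plugins there is nothing to rank, so the first fit is enough.
// With score plugins, up to limit candidates are collected for scoring.
func findFeasible(nodes []string, fits func(string) bool, hasScorePlugins bool, limit int) []string {
	var feasible []string
	for _, n := range nodes {
		if !fits(n) {
			continue
		}
		feasible = append(feasible, n)
		if !hasScorePlugins || len(feasible) >= limit {
			break
		}
	}
	return feasible
}

func main() {
	nodes := []string{"a", "b", "c", "d"}
	fits := func(n string) bool { return n != "a" }

	fmt.Println(findFeasible(nodes, fits, false, 2)) // [b]
	fmt.Println(findFeasible(nodes, fits, true, 2))  // [b c]
}
```
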
Kubernetes Prow Robot
aed1f50965 Merge pull request #122629 from sanposhiho/ignore-non-
fix(scheduling queue): ignore events that interest no registered plugin
2024-02-25 10:03:21 -08:00
Kensei Nakada
18ba3b388e fix(scheduling queue): ignore events that interest no registered plugin
2024-02-24 06:42:19 +00:00
Kubernetes Prow Robot
2016fab308 Merge pull request #123382 from kerthcet/cleanup/add-testcase-for-defaults
Add testcase covering unknown plugin config in scheduler
2024-02-19 21:04:24 -08:00
kerthcet
3c9c141d98 exchange the order of comparators
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-02-19 20:46:36 +08:00
kerthcet
7b108d8ee1 Add testcase covering unknown plugin config
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-02-19 20:45:48 +08:00
kerthcet
65faa9c680 Consider initContainer images in pod scheduling
Co-authored-by: xiaomudk <xiaomudk@gmail.com>
Co-authored-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-02-19 14:17:57 +08:00
kerthcet
b3ba6bda2b Add missed clusterEvents to UnrollWildCardResource
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-02-19 11:55:50 +08:00
AxeZhan
630ff96f9d Revert "Scheduler first fit"
2024-02-14 20:43:59 +08:00
Kubernetes Prow Robot
ad19beaa83 Merge pull request #123117 from kerthcet/fix/wild-resource
Fix registered wildcard clusterEvents not working in scheduler requeueing
2024-02-09 10:34:15 -08:00
Kubernetes Prow Robot
e566bd7769 Merge pull request #121952 from sanposhiho/optimize-csi
add(nodevolumelimits): return UnschedulableAndUnresolvable when PVC is not found
2024-02-06 07:16:28 -08:00
kerthcet
f97dec2840 Add comments about wildcard clusterEvent
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-02-05 11:46:59 +08:00
kerthcet
d81023db30 When matching clusterEvent, we should consider the "*" additionally
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-02-04 14:59:26 +08:00
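
The matching rule named in the commit title above can be sketched as follows. `ClusterEvent` here is a hypothetical, string-only simplification (the scheduler's real type is not reproduced in this log), and `matches` is an illustrative helper: a registered "*" is treated as matching any incoming value.

```go
package main

import "fmt"

// ClusterEvent is a hypothetical simplification with string fields only.
type ClusterEvent struct {
	Resource string
	Action   string
}

// matches reports whether an incoming event satisfies a registered one,
// treating "*" in the registered event as a wildcard.
func matches(registered, incoming ClusterEvent) bool {
	resourceOK := registered.Resource == "*" || registered.Resource == incoming.Resource
	actionOK := registered.Action == "*" || registered.Action == incoming.Action
	return resourceOK && actionOK
}

func main() {
	incoming := ClusterEvent{Resource: "Node", Action: "Add"}

	fmt.Println(matches(ClusterEvent{Resource: "Node", Action: "Add"}, incoming)) // true
	fmt.Println(matches(ClusterEvent{Resource: "*", Action: "*"}, incoming))      // true
	fmt.Println(matches(ClusterEvent{Resource: "Pod", Action: "Add"}, incoming))  // false
}
```
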
Toru Komatsu
3a4c35cc89 Comment on QHint for CSILimit when CSINodes are added (#122758)
Signed-off-by: utam0k <k0ma@utam0k.jp>
2024-02-02 22:16:20 -08:00
Kubernetes Prow Robot
278ea691e0 Merge pull request #122946 from NoicFank/enhance-sheduler-waiting-pods
enhancement(scheduler): share waitingPods among profiles
2024-02-02 02:11:32 -08:00