Commit Graph

136 Commits

Author SHA1 Message Date
carlory
0105a002bc when the hint fn returns error, the scheduling queue logs the error and treats it as QueueAfterBackoff.
Co-authored-by: Kensei Nakada <handbomusic@gmail.com>

Co-authored-by: Kante Yin <kerthcet@gmail.com>

Co-authored-by: XsWack <xushiwei5@huawei.com>
2023-09-21 09:40:44 +08:00
Kensei Nakada
0d3eafdfa3
fix(scheduling_queue): always put Pods with no unschedulable plugins into activeQ/backoffQ (#119105)
* always put Pods with no unschedulable plugins into activeQ/backoffQ

* address review comments
2023-09-11 09:30:11 -07:00
Patrick Ohly
4e73634b53 scheduler: start scheduling attempt with clean UnschedulablePlugins
When some plugin was registered as "unschedulable" in some previous scheduling
attempt, it kept that attribute for a pod forever. When that plugin then later
failed with an error that requires backoff, the pod was incorrectly moved to the
"unschedulable" queue where it got stuck until the periodic flushing because
there was no event that the plugin was waiting for.

Here's an example where that happened:

     framework.go:1280: E0831 20:03:47.184243] Reserve/DynamicResources: Plugin failed err="Operation cannot be fulfilled on podschedulingcontexts.resource.k8s.io \"test-dragxd5c\": the object has been modified; please apply your changes to the latest version and try again" node="scheduler-perf-dra-7l2v2" plugin="DynamicResources" pod="test/test-dragxd5c"
    schedule_one.go:1001: E0831 20:03:47.184345] Error scheduling pod; retrying err="running Reserve plugin \"DynamicResources\": Operation cannot be fulfilled on podschedulingcontexts.resource.k8s.io \"test-dragxd5c\": the object has been modified; please apply your changes to the latest version and try again" pod="test/test-dragxd5c"
    ...
    scheduling_queue.go:745: I0831 20:03:47.198968] Pod moved to an internal scheduling queue pod="test/test-dragxd5c" event="ScheduleAttemptFailure" queue="Unschedulable" schedulingCycle=9576 hint="QueueSkip"

Pop still needs the information about unschedulable plugins to update the
UnschedulableReason metric. It can reset that information before returning the
PodInfo for the next scheduling attempt.
2023-09-08 16:52:36 +02:00
Patrick Ohly
cd943dd95e scheduler: fix tracking of concurrent events
The previous approach was based on the assumption that an in-flight pod can use
the head of the received event list as marker for identifying all events that
occur while the pod is in flight. That assumption is incorrect: when that
existing element gets removed from the list because all pods that were
in-flight when it was received are done, that marker's Next method returns nil
and the code which should have seen several concurrent events (if there were
any) missed all of those.

As a result, a pod with concurrent events could incorrectly get moved to the
unschedulable queue where it could got stuck until the next periodic purging
after 5 minutes if there was no other event for it.

The approach with maintaining a single list of concurrent events can be fixed
by inserting each in-flight pod into the list and using that element to
identify "more recent" events for the pod.
2023-09-05 19:58:38 +02:00
Kensei Nakada
cf3f0bd778 fix: register the plugin rejects Pods in WaitOnPermit to UnschedulablePlugins 2023-08-12 07:18:01 +00:00
Kensei Nakada
050c0437e6 fix: broadcast when pod is pushed back to activeQ directly in AddUnschedulableIfNotPresent 2023-08-09 03:32:14 +00:00
Kensei Nakada
c7e7eee554
feature(scheduling_queue): track events per Pods (#118438)
* feature(sscheduling_queue): track events per Pods

* fix typos

* record events in one slice and make each in-flight Pod to refer it

* fix: use Pop() in test before AddUnschedulableIfNotPresent to register in-flight Pods

* eliminate MakeNextPodFuncs

* call Done inside the scheduling queue

* fix comment

* implement done() not to require lock in it

* fix UTs

* improve the receivedEvents implementation based on suggestions

* call DonePod when we don't call AddUnschedulableIfNotPresent

* fix UT

* use queuehint to filter out events for in-flight Pods

* fix based on suggestion from aldo

* fix based on suggestion from Wei

* rename lastEventBefore → previousEvent

* fix based on suggestion

* address comments from aldo

* fix based on the suggestion from Abdullah

* gate in-flight Pods logic by the SchedulingQueueHints feature gate
2023-07-17 15:53:07 -07:00
carlory
0599b3caa0 change the QueueingHintFn to pass a logger 2023-07-13 00:56:41 +08:00
Patrick Ohly
444d23bd2f dra: generated name for ResourceClaim from template
Generating the name avoids all potential name collisions. It's not clear how
much of a problem that was because users can avoid them and the deterministic
names for generic ephemeral volumes have not led to reports from users. But
using generated names is not too hard either.

What makes it relatively easy is that the new pod.status.resourceClaimStatus
map stores the generated name for kubelet and node authorizer, i.e. the
information in the pod is sufficient to determine the name of the
ResourceClaim.

The resource claim controller becomes a bit more complex and now needs
permission to modify the pod status. The new failure scenario of "ResourceClaim
created, updating pod status fails" is handled with the help of a new special
"resource.kubernetes.io/pod-claim-name" annotation that together with the owner
reference identifies exactly for what a ResourceClaim was generated, so
updating the pod status can be retried for existing ResourceClaims.

The transition from deterministic names is handled with a special case for that
recovery code path: a ResourceClaim with no annotation and a name that follows
the Kubernetes <= 1.27 naming pattern is assumed to be generated for that pod
claim and gets added to the pod status.

There's no immediate need for it, but just in case that it may become relevant,
the name of the generated ResourceClaim may also be left unset to record that
no claim was needed. Components processing such a pod can skip whatever they
normally would do for the claim. To ensure that they do and also cover other
cases properly ("no known field is set", "must check ownership"),
resourceclaim.Name gets extended.
2023-07-11 14:23:48 +02:00
Kensei Nakada
be0db3f93d clean up the implementation around QueueingHintFn 2023-07-06 16:07:39 +00:00
Heba Elayoty
d548983dbb
Use table-driven table for TestPerPodSchedulingMetrics
Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>
2023-06-29 14:51:55 -07:00
Kubernetes Prow Robot
d9714078f8
Merge pull request #118551 from sanposhiho/event-to-register
feature(scheduler): implement ClusterEventWithHint to filter out useless events
2023-06-26 06:41:45 -07:00
Kensei Nakada
6f8d38406a feature(scheduler): implement ClusterEventWithHint to filter out useless events 2023-06-22 13:36:19 +00:00
Heba Elayoty
902c711fb4
Unset gated pod info timestamp in addToActiveQ
Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>
2023-06-21 14:16:08 -07:00
Kubernetes Prow Robot
4483bf66fe
Merge pull request #116635 from mengjiao-liu/contextual-logging-plugin-interpodaffinity
Migrated `pkg/scheduler/framework/plugins/interpodaffinity` to contextual logging
2023-06-09 08:14:13 -07:00
Yuan Chen
9eaa50cc82 Rename scheduler queue variables for consistency 2023-06-05 09:02:06 -07:00
Mengjiao Liu
6d23da045f Migrated pkg/scheduler/framework/plugins/interpodaffinity to use contextual logging 2023-06-01 18:24:54 +08:00
likakuli
5a14573258 clean: use info instead of error to log queue closed message when scheduler exit
Signed-off-by: likakuli <1154584512@qq.com>
2023-05-31 11:07:24 +08:00
Mengjiao Liu
074900e81b scheduler: update the scheduler interface and cache methods to use contextual logging 2023-05-29 13:26:32 +08:00
Kubernetes Prow Robot
29c8fb678c
Merge pull request #117194 from sanposhiho/revert-preenqueue
Revert "Optimization on running prePreEnqueuePlugins before adding pods into activeQ"
2023-04-13 16:00:50 -07:00
Kensei Nakada
2bed67d0f1 Revert "Optimization on running prePreEnqueuePlugins before adding pods into activeQ"
This reverts commit c01fa8279d.
2023-04-11 22:28:42 +00:00
sarab
8d18ae6fc2 Use the generic Set in scheduler 2023-04-09 11:34:17 +05:30
Kensei Nakada
6697467062 add(scheduler): implement "plugin_execution_duration_seconds" metric in PreEnqueue 2023-03-12 04:45:52 +00:00
Aldo Culquicondor
07a73bb2e1
One lock among PodNominator and SchedulingQueue
Change-Id: I17fe5da40250e42c04124c25b530ce6c8dea4154
2023-03-08 16:18:36 -05:00
Chen Wang
7db339dba2 This commit contains the following:
1. Scheduler bug-fix + scheduler-focussed E2E tests
2. Add cgroup v2 support for in-place pod resize
3. Enable full E2E pod resize test for containerd>=1.6.9 and EventedPLEG related changes.

Co-Authored-By: Vinay Kulkarni <vskibum@gmail.com>
2023-02-24 18:21:21 +00:00
lianghao208
c01fa8279d Optimization on running prePreEnqueuePlugins before adding pods into activeQ 2023-02-15 11:13:21 +08:00
Wei Huang
a731a44596
Fix an accuracy issue of scheduler_pending_pods metric 2022-11-21 21:33:16 -08:00
Wei Huang
0f66366aff
Fix an issue that pod may be added to backoffQ 2022-11-08 10:05:32 -08:00
Wei Huang
0b27f25252
PreEnqueue implementation
- Add PreEnqueuePlugin to Scheduler Framework
- Implement PreEnqueuePlugin in scheduler queue
- Implementation of SchedulingGates plugin
- Metrics
2022-11-07 14:02:58 -08:00
kidddddddddddddddddddddd
121d24cfc7 changes in non-test files 2022-10-12 21:09:55 +08:00
astraw99
ee24513e47 Fix scheduler misc 2022-09-04 00:07:49 +08:00
Yuan Chen
7e05c0a522 Log scheduling queue events
Fix a typo

Address comments

Log one more queue event

Update pkg/scheduler/internal/queue/scheduling_queue.go

Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>

Update pkg/scheduler/internal/queue/scheduling_queue.go

Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>

Address comments

Remove 'source' from scheudling queue events

Update scheduling queue event msg.

Update scheduling queue events
2022-08-24 16:47:14 -07:00
Wei Huang
7df9bfcfef
Expose a pending pods summary in scheudler's dummper output 2022-08-05 22:02:38 -07:00
Davanum Srinivas
a9593d634c
Generate and format files
- Run hack/update-codegen.sh
- Run hack/update-generated-device-plugin.sh
- Run hack/update-generated-protobuf.sh
- Run hack/update-generated-runtime.sh
- Run hack/update-generated-swagger-docs.sh
- Run hack/update-openapi-spec.sh
- Run hack/update-gofmt.sh

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2022-07-26 13:14:05 -04:00
HaoJie Liu
5639a4000f
Detailed printed error message
Signed-off-by: HaoJie Liu <liuhaojie@beyondcent.com>
2022-07-08 16:58:29 +08:00
harry1064
fceb5cd4b1
Use clock package from k8s.io/utils/clock
- Remove unwanted clock.go file.
2022-06-25 00:25:12 +08:00
zzr93
923a99db95 unify wake up variable names to activated 2022-05-08 12:09:13 +08:00
zzr93
94ed4d0761 wake up only when pod being added to activeQ 2022-05-07 11:23:06 +08:00
Aldo Culquicondor
429457e184 Fix: abort nominating a pod that was already scheduled to a node
Change-Id: Iacb8530769e7a93e3bc8384cf51d7a8fd9a192e1
2022-04-04 10:52:59 -04:00
Alex Wang
8a5df1302a rename unschedulableQ to unschedulablePods
Signed-off-by: Alex Wang <wangqingcan1990@gmail.com>
2022-03-24 17:38:49 +08:00
Alex Wang
e772202e95 set PodMaxUnschedulableQDuration as 5 min 2022-03-17 15:37:34 +08:00
Harsh Prateek
840fc3ea7b
Add gauge metric to track unschedulable pod (#108475)
* Add gauge metric to track unschedulable pod

* Add review comments
2022-03-14 13:02:58 -07:00
Abdullah Gharaibeh
8a1c70b48c Graduate PodAffinityNamespaceSelector to GA 2022-02-18 12:07:29 -05:00
Alex Wang
87549203e9 add deprecated flag for flush pods to activeq interval 2022-02-16 11:05:52 +08:00
duc
040f8a4cf0 a flag to indicate whether or not to broadcast
change the returns above to breaks, add a flag to indicate whether or not to broadcast.
2022-01-27 21:52:30 +08:00
duc
c3bfb568f9
fix flushBackoffQCompleted: remove defer
'defer' is called in the 'for' loop, remove it
2022-01-27 17:08:12 +08:00
Wei Huang
2433b083a9
clear pod's .status.nominatedNodeName when necessary 2021-12-16 10:55:13 -08:00
BinacsLee
d484b1aa3d scheduler: cleanup return value 2021-12-15 21:51:42 +08:00
BinacsLee
f277864aa5 Scheduler queue: fix calculateBackoffDuration overflow in extreme data cases 2021-09-21 09:42:52 +08:00
sanposhiho
b7dd0a7660 Clean up: delete NumUnschedulablePods because it's no longer in use 2021-08-24 17:34:05 +09:00