addUnschedulablePodBackToBackoffQ happened to put the pod into the backoff
queue because
- the pod was not popped earlier and thus not in flight
- the PodInfo had UnschedulablePlugins set
- determineSchedulingHintForInFlightPod treats "UnschedulablePlugins set but
pod not in flight" as an internal error and falls back to backoff
Relying on such special-case handling is fragile. A better way to force
backoff is to record a concurrent event: isPodWorthRequeuing then calls the
queueHintReturnQueueAfterBackoff function and the pod goes to the backoff
queue.
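A minimal sketch of that decision, with simplified stand-in types instead of
the real kube-scheduler structures (the event name below is just an example):

    package main

    import "fmt"

    // QueueingHint and the helpers below are simplified stand-ins, not the
    // real kube-scheduler types.
    type QueueingHint int

    const (
        QueueSkip QueueingHint = iota
        QueueAfterBackoff
    )

    type podInfo struct {
        name             string
        concurrentEvents []string // events recorded while the pod was in flight
    }

    // decideRequeue mirrors the intent of isPodWorthRequeuing in this
    // scenario: a concurrent event observed while the pod was in flight
    // forces the backoff queue, without relying on the "internal error"
    // fallback path.
    func decideRequeue(p *podInfo) QueueingHint {
        if len(p.concurrentEvents) > 0 {
            return QueueAfterBackoff
        }
        return QueueSkip
    }

    func main() {
        p := &podInfo{name: "test-pod"}
        p.concurrentEvents = append(p.concurrentEvents, "AssignedPodDelete")
        fmt.Println(decideRequeue(p) == QueueAfterBackoff) // true
    }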
This fixes a test flake:
[sig-node] DRA [Feature:DynamicResourceAllocation] multiple nodes reallocation [It] works
/nvme/gopath/src/k8s.io/kubernetes/test/e2e/dra/dra.go:552
[FAILED] number of deallocations
Expected
<int64>: 2
to equal
<int64>: 1
In [It] at: /nvme/gopath/src/k8s.io/kubernetes/test/e2e/dra/dra.go:651 @ 09/05/23 14:01:54.652
This can be reproduced locally with
stress -p 10 go test ./test/e2e -args -ginkgo.focus=DynamicResourceAllocation.*reallocation.works -ginkgo.no-color -v=4 -ginkgo.v
Log output showed that the sequence of events leading to this was:
- claim gets allocated because of selected node
- a different node has to be used, so PostFilter sets
claim.status.deallocationRequested
- the driver deallocates
- before the scheduler can react and select a different node,
the driver allocates *again* for the original node
- the scheduler asks for deallocation again
- the driver deallocates again (causing the test failure)
- eventually the pod runs
The fix is to disable further allocations first by removing the selected node
and only then to start deallocation.
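Roughly, the new ordering looks like this; the types are simplified stand-ins
for PodSchedulingContext and ResourceClaim, not the real resource.k8s.io
client structs:

    package main

    import "fmt"

    // Simplified stand-ins for the DRA objects involved.
    type podSchedulingContext struct {
        selectedNode string
    }

    type resourceClaim struct {
        allocated             bool
        deallocationRequested bool
    }

    // requestReallocation sketches the ordering fix: first drop the selected
    // node so the driver stops allocating for it, then ask for deallocation.
    // The old order let the driver allocate again for the original node right
    // after it had deallocated, which triggered a second deallocation.
    func requestReallocation(sc *podSchedulingContext, claim *resourceClaim) {
        sc.selectedNode = "" // step 1: disable further allocations
        if claim.allocated {
            claim.deallocationRequested = true // step 2: trigger deallocation
        }
    }

    func main() {
        sc := &podSchedulingContext{selectedNode: "node-a"}
        claim := &resourceClaim{allocated: true}
        requestReallocation(sc, claim)
        fmt.Println(sc.selectedNode == "", claim.deallocationRequested) // true true
    }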
When some plugin was registered as "unschedulable" in a previous scheduling
attempt, the pod kept that attribute forever. When that plugin later failed
with an error that requires backoff, the pod was incorrectly moved to the
"unschedulable" queue, where it got stuck until the periodic flushing because
there was no event that the plugin was waiting for.
Here's an example where that happened:
framework.go:1280: E0831 20:03:47.184243] Reserve/DynamicResources: Plugin failed err="Operation cannot be fulfilled on podschedulingcontexts.resource.k8s.io \"test-dragxd5c\": the object has been modified; please apply your changes to the latest version and try again" node="scheduler-perf-dra-7l2v2" plugin="DynamicResources" pod="test/test-dragxd5c"
schedule_one.go:1001: E0831 20:03:47.184345] Error scheduling pod; retrying err="running Reserve plugin \"DynamicResources\": Operation cannot be fulfilled on podschedulingcontexts.resource.k8s.io \"test-dragxd5c\": the object has been modified; please apply your changes to the latest version and try again" pod="test/test-dragxd5c"
...
scheduling_queue.go:745: I0831 20:03:47.198968] Pod moved to an internal scheduling queue pod="test/test-dragxd5c" event="ScheduleAttemptFailure" queue="Unschedulable" schedulingCycle=9576 hint="QueueSkip"
Pop still needs the information about unschedulable plugins to update the
UnschedulableReason metric. It can reset that information before returning the
PodInfo for the next scheduling attempt.
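A sketch of that Pop behavior, with simplified types rather than the actual
PriorityQueue implementation:

    package main

    import "fmt"

    // Simplified stand-ins, not the real PriorityQueue and QueuedPodInfo.
    type queuedPodInfo struct {
        podName              string
        unschedulablePlugins map[string]struct{}
    }

    type queue struct {
        items []*queuedPodInfo
    }

    // pop hands out the next pod for scheduling. It records which plugins
    // made the pod unschedulable in the previous attempt (for the metric) and
    // then resets that set, so a later error-based retry is not mistaken for
    // "still waiting for an event from those plugins".
    func (q *queue) pop(recordMetric func(map[string]struct{})) *queuedPodInfo {
        if len(q.items) == 0 {
            return nil
        }
        pInfo := q.items[0]
        q.items = q.items[1:]
        recordMetric(pInfo.unschedulablePlugins)
        pInfo.unschedulablePlugins = nil // start the next attempt clean
        return pInfo
    }

    func main() {
        q := &queue{items: []*queuedPodInfo{{
            podName:              "test-dragxd5c",
            unschedulablePlugins: map[string]struct{}{"DynamicResources": {}},
        }}}
        p := q.pop(func(plugins map[string]struct{}) {
            fmt.Println("unschedulable plugins in previous attempt:", len(plugins))
        })
        fmt.Println(p.unschedulablePlugins == nil) // true
    }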
Instead of modifying the PodSchedulingContext and then creating or updating
it, the required changes (selected node, potential nodes) are now tracked,
and the actual input for an API call is constructed at the end, if (and only
if) it is needed.
This makes the code easier to read and change. In particular, replacing the
Update call with Patch or Apply becomes easy.
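A sketch of the tracking pattern, using illustrative types instead of the
real PodSchedulingContext API:

    package main

    import "fmt"

    // Illustrative types; the real code tracks changes to a
    // PodSchedulingContext object from the resource.k8s.io API.
    type schedulingCtx struct {
        selectedNode   string
        potentialNodes []string
    }

    // podSchedulingState records the desired changes during one scheduling
    // cycle. A nil pointer means "no change requested" for that field.
    type podSchedulingState struct {
        selectedNode   *string
        potentialNodes *[]string
    }

    // publish builds the input for an API call only if something changed, so
    // the write (Update today, Patch or Apply later) happens in one place.
    func (s *podSchedulingState) publish(current *schedulingCtx, update func(*schedulingCtx)) {
        if s.selectedNode == nil && s.potentialNodes == nil {
            return // nothing changed, no API call
        }
        desired := *current // copy, then apply the tracked changes
        if s.selectedNode != nil {
            desired.selectedNode = *s.selectedNode
        }
        if s.potentialNodes != nil {
            desired.potentialNodes = *s.potentialNodes
        }
        update(&desired)
    }

    func main() {
        state := &podSchedulingState{}
        node := "node-a"
        state.selectedNode = &node
        state.publish(&schedulingCtx{}, func(sc *schedulingCtx) {
            fmt.Println("update call with selected node:", sc.selectedNode)
        })
    }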
When filtering fails because a ResourceClass is missing, we can treat the pod
as "unschedulable" as long as we then also register a cluster event that wakes
up the pod. This is more efficient than periodically retrying.
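Conceptually, the registration looks like this; the types are simplified
stand-ins for the framework's cluster events, roughly what the plugin's
EventsToRegister returns:

    package main

    import "fmt"

    // Simplified stand-in for a cluster event registration.
    type clusterEvent struct {
        resource string
        action   string
    }

    // eventsToRegister sketches why "unschedulable because the ResourceClass
    // is missing" is safe: creating (or updating) a ResourceClass is an event
    // that requeues the pod, so no periodic retry is needed.
    func eventsToRegister() []clusterEvent {
        return []clusterEvent{
            {resource: "ResourceClass", action: "Add|Update"},
            {resource: "ResourceClaim", action: "Add|Update"},
        }
    }

    func main() {
        for _, ev := range eventsToRegister() {
            fmt.Printf("requeue pods on %s %s\n", ev.resource, ev.action)
        }
    }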
The problematic scenario was having one pod in flight, one event in the list,
and then detecting a concurrent event for a second pod after the first pod is
done. The new test case covers that.
To make it work without assumptions about the implementation, the QueuedPodInfo
returned by Pop must be the one passed to AddUnschedulableIfNotPresent
after (potentially) populating UnschedulablePlugins. This is done via callback
functions which bind to the same shared variable.
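A condensed sketch of that callback pattern; the queue calls are faked and
the names are placeholders:

    package main

    import "fmt"

    // Simplified stand-in for the scheduler's QueuedPodInfo.
    type queuedPodInfo struct {
        podName              string
        unschedulablePlugins map[string]struct{}
    }

    // The test builds callbacks that close over the same variable, so
    // whatever Pop returned is exactly the object handed back to
    // AddUnschedulableIfNotPresent, without assumptions about how the queue
    // constructed it.
    func main() {
        var pInfo *queuedPodInfo

        pop := func() {
            // In the real test this is the queue's Pop(); faked here.
            pInfo = &queuedPodInfo{podName: "pod-a"}
        }
        readd := func() {
            // Populate the plugins and pass the very same object back.
            pInfo.unschedulablePlugins = map[string]struct{}{"fake-plugin": {}}
            fmt.Println("re-adding", pInfo.podName, "plugins:", len(pInfo.unschedulablePlugins))
        }

        pop()
        readd()
    }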
The previous approach was based on the assumption that an in-flight pod can
use the head of the received event list as a marker for identifying all
events that occur while the pod is in flight. That assumption is incorrect:
when that existing element gets removed from the list because all pods that
were in flight when it was received are done, the marker's Next method
returns nil, and code that should have seen several concurrent events (if
there were any) missed all of them.
As a result, a pod with concurrent events could incorrectly get moved to the
unschedulable queue, where it could get stuck until the next periodic purging
after 5 minutes if there was no other event for it.
The approach of maintaining a single list of concurrent events can be fixed
by inserting each in-flight pod into the list and using that element to
identify "more recent" events for the pod.
Because of a misplaced `append` (it should have been inside the if clause,
not after it), a handler from a previous loop iteration was added again. This
was harmless because the resulting slice was only used for waiting for cache
sync, but it should be fixed anyway.
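A generic illustration of the bug shape and the fix (not the actual handler
registration code):

    package main

    import "fmt"

    type handler struct{ name string }

    func main() {
        resources := []string{"claims", "classes", "pods"}
        needsHandler := map[string]bool{"claims": true, "classes": true}

        // Buggy shape: the append sits after the if block, so for "pods" the
        // handler created in the previous iteration gets appended again.
        var buggy []handler
        var h handler
        for _, r := range resources {
            if needsHandler[r] {
                h = handler{name: r}
            }
            buggy = append(buggy, h) // should be inside the if clause
        }

        // Fixed shape: append only handlers created in this iteration.
        var fixed []handler
        for _, r := range resources {
            if needsHandler[r] {
                fixed = append(fixed, handler{name: r})
            }
        }

        fmt.Println(len(buggy), len(fixed)) // 3 2
    }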
PVCs and containers shared the same ResourceRequirements struct to define
their APIs. When resource claims were added, that struct got extended, which
accidentally also changed the PVC API. To avoid such a mistake from happening
again, PVC now uses its own VolumeResourceRequirements struct.
The `Claims` field gets removed because the risk of breaking anyone is low:
theoretically, YAML files which have a claims field for volumes now get
rejected when validated against the OpenAPI schema. Such files have never
made sense and should be fixed.
Code that uses the struct definitions needs to be updated.
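An abbreviated sketch of the resulting split, not the full Kubernetes type
definitions:

    package main

    import "fmt"

    type resourceList map[string]string

    type resourceClaimRef struct{ name string }

    // resourceRequirements stays as the container API, including Claims.
    type resourceRequirements struct {
        limits   resourceList
        requests resourceList
        claims   []resourceClaimRef
    }

    // volumeResourceRequirements is the PVC-only variant: no Claims field,
    // so future container-side extensions no longer leak into the PVC API.
    type volumeResourceRequirements struct {
        limits   resourceList
        requests resourceList
    }

    func main() {
        pvc := volumeResourceRequirements{requests: resourceList{"storage": "1Gi"}}
        ctr := resourceRequirements{claims: []resourceClaimRef{{name: "gpu"}}}
        fmt.Println(len(pvc.requests), len(ctr.claims))
    }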