Without this, the scheduler was crashing in newClaimController() in
pkg/scheduler/framework/plugins/dynamicresources/structuredparameters.go
The code in newClaimController() assumes that the parameters are not nil.
Furthermore, it assumes that at least one DriverRequest is populated in order
to allocate any resources to a claim.
This PR adds logic to define default claim/class parameters that will allow
allocation to proceed even if an end user doesn't provide any class or claim
parameters themselves.
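A minimal sketch of that defaulting, assuming the v1alpha2 API types
(`resourceapi` = `k8s.io/api/resource/v1alpha2`; the exact defaults are
illustrative, not necessarily what landed upstream):

```go
// Fall back to defaults when the user referenced no parameter objects:
// empty class parameters, and claim parameters with a single request for
// the class's driver whose CEL selector matches every instance.
if classParameters == nil {
	classParameters = &resourceapi.ResourceClassParameters{}
}
if claimParameters == nil {
	claimParameters = &resourceapi.ResourceClaimParameters{
		Shareable: true,
		DriverRequests: []resourceapi.DriverRequests{{
			DriverName: class.DriverName,
			Requests: []resourceapi.ResourceRequest{{
				ResourceRequestModel: resourceapi.ResourceRequestModel{
					NamedResources: &resourceapi.NamedResourcesRequest{
						Selector: "true", // CEL: matches any instance
					},
				},
			}},
		}},
	}
}
```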
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Storing a modified claim with allocation and the original resource version in
the assume cache was not reliable: if an update was received, it replaced the
modified claim and the resource that was reserved for the claim might have been
used for some other claim.
To fix this, the in-flight claims are now stored in the map instead of just a
boolean flag, and the status stored there overrides whatever is in the assume
cache.
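A sketch of the idea, with assumed names (the real plugin keeps this state
inside the plugin rather than at package level):

```go
import (
	"sync"

	resourcev1alpha2 "k8s.io/api/resource/v1alpha2"
)

// inFlightAllocations maps claim UID -> the modified *ResourceClaim,
// instead of the previous UID -> bool marker.
var inFlightAllocations sync.Map

// currentClaim returns the authoritative view of a claim: an in-flight
// modification wins over whatever the assume cache returned.
func currentClaim(cached *resourcev1alpha2.ResourceClaim) *resourcev1alpha2.ResourceClaim {
	if obj, ok := inFlightAllocations.Load(cached.UID); ok {
		return obj.(*resourcev1alpha2.ResourceClaim)
	}
	return cached
}
```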
Logging was extended to diagnose this problem better. It started to occur in
E2E tests after splitting the claim update so that the finalizer is set first
and then the status, because setting the finalizer triggered an update.
While currently those objects only get published by the kubelet for node-local
resources, this could change once we also support network-attached
resources. Dropping the "Node" prefix enables such a future extension.
The NodeName in ResourceSlice and StructuredResourceHandle then becomes
optional. The kubelet still needs to provide one and it must match its own node
name, otherwise it doesn't have permission to access ResourceSlice objects.
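Abbreviated, the resulting type looks roughly like this (remaining fields
elided; v1alpha2 shape assumed):

```go
type ResourceSlice struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// NodeName is optional. The kubelet sets it to its own node name for
	// node-local resources; a future control-plane driver publishing
	// network-attached resources could leave it empty.
	NodeName string `json:"nodeName,omitempty"`

	// Driver name and the resource model fields are elided here.
}
```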
Like the current device plugin interface, a DRA driver using this model
announces a list of resource instances. In contrast to device plugins, this
list is made available to the scheduler together with attributes that can be
used to select suitable instances when they are not all alike.
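Sketched in Go (abbreviated, with the typed attribute value fields elided),
the model is a flat list of named instances with attributes:

```go
type NamedResourcesResources struct {
	// Instances is the list of resource instances the driver announces,
	// comparable to what a device plugin would report.
	Instances []NamedResourcesInstance `json:"instances"`
}

type NamedResourcesInstance struct {
	Name string `json:"name"`
	// Attributes let the scheduler distinguish instances that are not
	// all alike, e.g. by model name or capacity.
	Attributes []NamedResourcesAttribute `json:"attributes,omitempty"`
}
```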
Because this is the first structured parameter model, some checks that
previously were not possible, in particular "is one structured parameter field
set", are now enabled. Adding another structured parameter model will be
similar.
The applyconfigs code generator assumes that all types in an API are defined in
a single package. If it weren't for that, it would be possible to place the
"named resources" types in separate packages, which would make their names in
the Go code more natural and provide an indication of their stability level,
because the package name could include a version.
When a claim uses structured parameters, as indicated by the resource class
flag, the scheduler is responsible for allocating it. To do this it needs to
gather information about available node resources by watching
NodeResourceSlices and then match the in-tree claim parameters against those
resources.
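A hypothetical sketch of the gathering step (identifier names are assumptions;
the objects were later renamed from NodeResourceSlice to ResourceSlice):

```go
// Collect the instances published for the candidate node; the lister is
// fed by an informer watching the slice objects.
slices, err := sliceLister.List(labels.Everything())
if err != nil {
	return nil, err
}
var instances []resourceapi.NamedResourcesInstance
for _, slice := range slices {
	if slice.NodeName != nodeName || slice.NamedResources == nil {
		continue
	}
	instances = append(instances, slice.NamedResources.Instances...)
}
// instances can now be matched against the claim parameters' requests.
```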
The assume cache in the volumebinding plugin can be created without a separate
index, but List then failed because it tried to use the empty index name
instead of falling back to the store's List function.
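The fix, sketched under an assumed shape of the cache internals (the store is
a cache.Indexer):

```go
func (c *assumeCache) listObjs(indexName, indexKey string) ([]interface{}, error) {
	if indexName == "" {
		// No index was configured; ByIndex("") would fail, so list
		// the whole store instead.
		return c.store.List(), nil
	}
	return c.store.ByIndex(indexName, indexKey)
}
```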
* Don't evaluate extra nodes if there's no score plugin defined
* Fix existing unit test (add no op scoring plugin)
* Add unit tests for no score plugin scenario
* Address review comments
* Add a test with a non-filter, non-scoring extender
Blocking API calls during a scheduling cycle, as the DRA plugin is doing, slow
down overall scheduling, i.e. they also affect pods which don't use DRA.
It is easy to move the blocking calls into a goroutine while the scheduling
cycle ends with "pod unschedulable". The hard part is handling an error when
those API calls then fail in the background. There is a solution for that
(see https://github.com/kubernetes/kubernetes/pull/120963), but it's complex.
Instead, publishing the modified PodSchedulingContext can also be done
later. In the more common case of a pod which is ready for binding except for
its claims, that'll be in PreBind, which runs in a separate goroutine already.
In the less common case that a pod cannot be scheduled, that'll be in
Unreserve which is still blocking.
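A skeleton of where the publishing now happens (publishPodSchedulingContext is
a hypothetical helper; the PreBind signature comes from the scheduler
framework):

```go
func (pl *dynamicResources) PreBind(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeName string) *framework.Status {
	// PreBind already runs in its own goroutine, outside the scheduling
	// cycle, so a blocking API call is acceptable here.
	if err := pl.publishPodSchedulingContext(ctx, pod); err != nil {
		return framework.AsStatus(err)
	}
	return nil
}
```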
This moves adding a pod to ReservedFor out of the main scheduling cycle into
PreBind. There it is done concurrently in different goroutines. For claims
which were specifically allocated for a pod (the most common case), that
usually makes no difference because the claim is already reserved.
It starts to matter when that pod then cannot be scheduled for other reasons,
because then the claim gets unreserved to allow deallocating it. It also
matters for claims that are created separately and then get used multiple times
by different pods.
Because multiple pods might get added to the same claim rapidly and
independently of each other, it makes sense to do all claim status updates via
patching:
then it is no longer necessary to have an up-to-date copy of the claim because
the patch operation will succeed if (and only if) the patched claim is valid.
Server-side-apply cannot be used for this because a client always has to send
the full list of all entries that it wants to be set, i.e. it cannot add one
entry unless it knows the full list.
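A sketch of such a patch using a strategic merge patch (details assumed):
including metadata.uid makes the patch fail if it hits a different object, and
because reservedFor merges by uid, the entry is added without overwriting the
existing list:

```go
// Add one ReservedFor entry for the pod without sending the full list.
patch := fmt.Sprintf(
	`{"metadata": {"uid": %q}, "status": {"reservedFor": [{"resource": "pods", "name": %q, "uid": %q}]}}`,
	claim.UID, pod.Name, pod.UID)
updatedClaim, err := clientset.ResourceV1alpha2().ResourceClaims(claim.Namespace).Patch(
	ctx, claim.Name, types.StrategicMergePatchType, []byte(patch),
	metav1.PatchOptions{}, "status")
```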
When dealing with unschedulable pods, the intent was to deallocate only claims
which are allocated and use delayed allocation. That if check wasn't
implemented correctly, which caused claims with immediate allocation to also be
considered as candidates.
Found during code reading; this has probably never occurred in practice yet.
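Sketch of the corrected condition (names assumed):

```go
// Only a claim that is currently allocated *and* uses delayed allocation
// ("wait for first consumer") is a candidate for deallocation.
if claim.Status.Allocation != nil &&
	claim.Spec.AllocationMode == resourcev1alpha2.AllocationModeWaitForFirstConsumer {
	// consider deallocating this claim
}
```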