Commit Graph

119 Commits

Author SHA1 Message Date
Kensei Nakada
8392f7fbb0 remove unused NextPod() 2024-06-11 07:22:09 +00:00
kerthcet
f5b6f79410 Avoid to use deprecated wait.Poll in scheduler tests
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-05-10 14:17:09 +08:00
Dr. Stefan Schimanski
e37917fea7
pkg/controlplane: split up config into generic controlplane and kube-related part
Signed-off-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com>
2024-04-26 14:14:06 +02:00
Kensei Nakada
2b56de43e5 register Node/UpdateNodeTaint event to plugins which has Node/Add only, doesn't have Node/UpdateNodeTaint 2024-03-16 14:13:06 +00:00
Patrick Ohly
1d653e6185 test: use cancelation from ktesting
The return type of ktesting.NewTestContext is now a TContext. Code
which combined it WithCancel often didn't compile anymore (cannot overwrite
ktesting.TContext with context.Context). This is a good thing because all of
that code can be simplified to let ktesting handle the cancelation.
2024-03-01 07:51:22 +01:00
Mengjiao Liu
b584b87a94 kube-controller-manager: readjust log verbosity
- Increase the global level for broadcaster's logging to 3 so that users can ignore event messages by lowering the logging level. It reduces information noise.
- Making sure the context is properly injected into the broadcaster, this will allow the -v flag value to be used also in that broadcaster, rather than the above global value.
- test: use cancellation from ktesting
- golangci-hints: checked error return value
2024-02-26 14:51:56 +08:00
Patrick Ohly
c46ae1b26a scheduler_perf: use ktesting.TContext + staging StartTestServer
ktesting.TContext combines several different interfaces. This makes the code
simpler because less parameters need to be passed around.

An intentional side effect is that the apiextensions client interface becomes
available, which makes it possible to use CRDs. This will be needed for future
DRA tests.

Support for CRDs depends on starting the apiserver via
k8s.io/kubernetes/cmd/kube-apiserver/app/testing because only that enables the
CRD extensions. As discussed on Slack, the long-term goal is to replace the
in-tree StartTestServer with the one in staging, so this is going in the right
direction.
2024-02-11 10:51:38 +01:00
Kensei Nakada
5310abe14a make scheduler_perf usable from other repositories 2023-12-01 12:43:08 +00:00
kerthcet
50f092c136 Add kubernetes.io/hostname to faked nodes in tests
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-11-03 11:47:36 +08:00
Mengjiao Liu
b0a73213d6 kube-scheduler: convert the remaining part to use contextual logging 2023-10-24 17:56:48 +08:00
Kubernetes Prow Robot
5a4e792e06
Merge pull request #120534 from pohly/dra-scheduler-ssa-as-fallback
dra scheduler: fall back to SSA for PodSchedulingContext updates
2023-10-23 21:06:58 +02:00
Kevin Hannon
1a41ed394d convert pointer to ptr for sig-apps integration tests 2023-10-19 10:35:38 -04:00
Patrick Ohly
7cac1dcf67 dra scheduler: fall back to SSA for PodSchedulingContext updates
During scheduler_perf testing, roughly 10% of the PodSchedulingContext update
operations failed with a conflict error. Using SSA would avoid that, but
performance measurements showed that this causes a considerable
slowdown (primarily because of the slower encoding with JSON instead of
protobuf, but also because server-side processing is more expensive).

Therefore a normal update is tried first and SSA only gets used when there has
been a conflict. Using SSA in that case instead of giving up outright is better
because it avoids another scheduling attempt.
2023-09-15 15:05:38 +02:00
Kubernetes Prow Robot
10c622e99a
Merge pull request #119994 from SataQiu/remove-scheduler-v1beta3
scheduler: remove deprecated v1beta3 KubeSchedulerConfiguration component config
2023-08-24 15:31:17 -07:00
SataQiu
ef7d404702 using wait.PollUntilContextTimeout instead of deprecated wait.Poll for pkg/scheduler
using wait.PollUntilContextTimeout instead of deprecated wait.Poll for test/integration/scheduler

using wait.PollUntilContextTimeout instead of deprecated wait.Poll for test/e2e/scheduling

using wait.ConditionWithContextFunc for PodScheduled/PodIsGettingEvicted/PodScheduledIn/PodUnschedulable/PodSchedulingError
2023-08-17 17:25:09 +08:00
SataQiu
427b703c37 scheduler: remove deprecated v1beta2 KubeSchedulerConfiguration 2023-08-17 13:27:21 +08:00
Kensei Nakada
c7e7eee554
feature(scheduling_queue): track events per Pods (#118438)
* feature(sscheduling_queue): track events per Pods

* fix typos

* record events in one slice and make each in-flight Pod to refer it

* fix: use Pop() in test before AddUnschedulableIfNotPresent to register in-flight Pods

* eliminate MakeNextPodFuncs

* call Done inside the scheduling queue

* fix comment

* implement done() not to require lock in it

* fix UTs

* improve the receivedEvents implementation based on suggestions

* call DonePod when we don't call AddUnschedulableIfNotPresent

* fix UT

* use queuehint to filter out events for in-flight Pods

* fix based on suggestion from aldo

* fix based on suggestion from Wei

* rename lastEventBefore → previousEvent

* fix based on suggestion

* address comments from aldo

* fix based on the suggestion from Abdullah

* gate in-flight Pods logic by the SchedulingQueueHints feature gate
2023-07-17 15:53:07 -07:00
Kubernetes Prow Robot
bea27f82d3
Merge pull request #118209 from pohly/dra-pre-scheduled-pods
dra: pre-scheduled pods
2023-07-13 14:43:37 -07:00
Patrick Ohly
80ab8f0542 dra: handle scheduled pods in kube-controller-manager
When someone decides that a Pod should definitely run on a specific node, they
can create the Pod with spec.nodeName already set. Some custom scheduler might
do that. Then kubelet starts to check the pod and (if DRA is enabled) will
refuse to run it, either because the claims are still waiting for the first
consumer or the pod wasn't added to reservedFor. Both are things the scheduler
normally does.

Also, if a pod got scheduled while the DRA feature was off in the
kube-scheduler, a pod can reach the same state.

The resource claim controller can handle these two cases by taking over for the
kube-scheduler when nodeName is set. Triggering an allocation is simpler than
in the scheduler because all it takes is creating the right
PodSchedulingContext with spec.selectedNode set. There's no need to list nodes
because that choice was already made, permanently. Adding the pod to
reservedFor also isn't hard.

What's currently missing is triggering de-allocation of claims to re-allocate
them for the desired node. This is not important for claims that get created
for the pod from a template and then only get used once, but it might be
worthwhile to add de-allocation in the future.
2023-07-13 21:27:11 +02:00
Mengjiao Liu
19869478c1 Migrate /pkg/controller/disruption to structured and contextual logging 2023-07-12 11:30:45 +08:00
Kubernetes Prow Robot
6f9d1d38d8
Merge pull request #118817 from pohly/dra-delete-claims
DRA: improve handling of completed pods
2023-07-06 10:15:15 -07:00
Patrick Ohly
7f5a02fc7e dra resourceclaim controller: enhance logging
Adding logging to event handlers makes it more obvious why (or why not) claims
and pods need to be processed.
2023-07-05 16:10:20 +02:00
Kubernetes Prow Robot
c78204dc06
Merge pull request #118202 from pohly/scheduler-perf-unit-test
scheduler-perf: run as integration tests
2023-06-28 06:24:31 -07:00
Kubernetes Prow Robot
ddbf3575a7
Merge pull request #116729 from AxeZhan/handlers_sync
[Scheduler] Make sure handlers have synced before scheduling
2023-06-28 01:26:31 -07:00
Patrick Ohly
dfd646e0a8 scheduler_perf: fix namespace deletion
Merely deleting the namespace is not enough:
- Workloads might rely on the garbage collector to get rid of obsolete objects,
  so we should run it to be on the safe side.
- Pods must be force-deleted because kubelet is not running.
- Finally, the namespace controller is needed to get rid of
  deleted namespaces.
2023-06-28 09:22:25 +02:00
Patrick Ohly
2e7f37353c test/integration: avoid errors in fake PC controller during shutdown
Once the context is canceled, the controller can stop processing
events. Without this change it prints errors when the apiserver is already
down.
2023-06-28 08:14:34 +02:00
Aldo Culquicondor
a4519665fe
Skip terminal Pods with a deletion timestamp from the Daemonset sync (#118716)
* Skip terminal Pods with a deletion timestamp from the Daemonset sync

Change-Id: I64a347a87c02ee2bd48be10e6fff380c8c81f742

* Review comments and fix integration test

Change-Id: I3eb5ec62bce8b4b150726a1e9b2b517c4e993713

* Include deleted terminal pods in history

Change-Id: I8b921157e6be1c809dd59f8035ec259ea4d96301
2023-06-27 08:56:33 -07:00
kidddddddddddddddddddddd
9c7166ff63 wait for eventhandlers to sync before run scheduler 2023-06-27 23:19:34 +08:00
Mengjiao Liu
1c05cf1d51 kube-scheduler: NewFramework function to pass the context parameter
Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
2023-05-23 10:17:34 +08:00
Kubernetes Prow Robot
8b33eaa0a7
Merge pull request #116207 from pohly/dra-scheduler-perf
scheduler_perf: dynamic resource allocation test cases
2023-05-10 10:58:59 -07:00
Patrick Ohly
034528a9f0 scheduler perf: add DynamicResourceAllocation test cases
The default scheduler configuration must be based on the v1 API where the
plugin is enabled by default. Then if (and only if) the
DynamicResourceAllocation feature gate for a test is set, the corresponding
API group also gets enabled.

The normal dynamic resource claim controller is started if needed to create
ResourceClaims from ResourceClaimTemplates.

Without the upcoming optimizations in the scheduler, scheduling with dynamic
resources is fairly slow. The new test cases take around 15 minutes wall clock
time on my desktop.
2023-05-04 13:08:06 +02:00
Kante Yin
859359ad6a Fix strict linting
Signed-off-by: Kante Yin <kerthcet@gmail.com>
2023-05-04 10:25:10 +08:00
Kante Yin
a7035f5459 Pass Context to StartTestServer
Signed-off-by: Kante Yin <kerthcet@gmail.com>
2023-05-04 10:25:09 +08:00
Kante Yin
2d866ec2fc Teardown only scheduler in integration tests
Signed-off-by: Kante Yin <kerthcet@gmail.com>
2023-05-04 10:09:24 +08:00
Patrick Ohly
b3e0bc8864 scheduler_perf: let the test decide which informers are needed
This will change when adding dynamic resource allocation test cases. Instead of
changing mustSetupScheduler and StartScheduler for that, let's return the
informer factory and create informers as needed in the test.
2023-04-27 15:31:40 +02:00
Kubernetes Prow Robot
b8b18ecd85
Merge pull request #114051 from chrishenzie/rwop-preemption
[scheduler] Support preemption of pods using ReadWriteOncePod PVCs
2023-02-13 11:45:30 -08:00
Patrick Ohly
a7f658e442 test/integration: fix Broadcaster leak
When starting a scheduler, the event broadcaster for it wasn't stopped.
2023-02-01 12:42:50 +01:00
Kante Yin
3d0894fabf
Fix failure(context canceled) in scheduler_perf benchmark (#114843)
* Fix failure in scheduler_perf benchmark

Signed-off-by: Kante Yin <kerthcet@gmail.com>

* Fatal when error in cleaning up nodes in scheduler perf tests

Signed-off-by: Kante Yin <kerthcet@gmail.com>

* Use derived context to better organize the codes

Signed-off-by: Kante Yin <kerthcet@gmail.com>

* Change log level to 2 in scheduler perf-test

Signed-off-by: Kante Yin <kerthcet@gmail.com>

---------

Signed-off-by: Kante Yin <kerthcet@gmail.com>
2023-01-30 16:21:00 -08:00
Chris Henzie
dbc7d8ded0 feat: support preemption for pods using ReadWriteOncePod PVCs
PVCs using the ReadWriteOncePod access mode can only be referenced by a
single pod. When a pod is scheduled that uses a ReadWriteOncePod PVC,
return "Unschedulable" if the PVC is already in-use in the cluster.

To support preemption, the "VolumeRestrictions" scheduler plugin
computes cycle state during the PreFilter phase. This cycle state
contains the number of references to the ReadWriteOncePod PVCs used by
the pod-to-be-scheduled.

During scheduler simulation (AddPod and RemovePod), we add and remove
reference counts from the cycle state if they use any of these
ReadWriteOncePod PVCs.

In the Filter phase, the scheduler checks if there are any PVC reference
conflicts, and returns "Unschedulable" if there is a conflict.

This is a required feature for the ReadWriteOncePod beta. See for more context:
https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2485-read-write-once-pod-pv-access-mode#beta
2023-01-30 10:59:22 -08:00
TommyStarK
9e885bce35 test/integration: Replace deprecated pointer function
Signed-off-by: TommyStarK <thomasmilox@gmail.com>
2023-01-05 18:38:40 +01:00
Wei Huang
ae5d430c76
Integration tests for KEP Pod Scheduling Readiness
- test generic integration in plugins_test.go
- test integration with SchedulingGates plugin in queue_test.go
2022-11-08 10:06:44 -08:00
Wojciech Tyczyński
71d87272de Clean shutdown of apply integration tests 2022-11-07 09:14:15 +01:00
Chris Henzie
2d0afbc054 scheduler: integration test for ReadWriteOncePod alpha
Tests scheduler enforcement of the ReadWriteOncePod PVC access mode.

- Creates a pod using a PVC with ReadWriteOncePod
- Creates a second pod using the same PVC
- Observes the second pod fails to schedule because PVC is in-use
- Deletes the first pod
- Observes the second pod successfully schedules
2022-11-01 15:08:01 -07:00
Wojciech Tyczyński
5b042f0bf4 Remove RunAnAPIServer from integration tests 2022-07-25 17:52:31 +02:00
Kubernetes Prow Robot
cab41bd04d
Merge pull request #111324 from wojtek-t/cleanup_testing_namespace
Cleanup no longer used Create/Delete TestingNamespace
2022-07-22 00:05:49 -07:00
Wojciech Tyczyński
aca03a4090 Cleanup no longer used Create/Delete TestingNamespace 2022-07-21 19:44:14 +02:00
Michal Wozniak
2f61b6105c Add integration tests for podgc 2022-07-20 15:17:14 +02:00
Wojciech Tyczyński
8a959396b8 Clean shutdown of volumescheduling integration tests 2022-05-28 21:14:09 +02:00
Wojciech Tyczyński
c802118e81 Update scheduler tests 2022-05-27 14:57:21 +02:00
Wojciech Tyczyński
deef9e40de Simplify Create/Delete-TestingNamespace functions 2022-05-15 23:06:26 +02:00