963 Commits

Author SHA1 Message Date
Patrick Ohly
6f9140e421 DRA scheduler: stop allocating before deallocation
This fixes a test flake:

    [sig-node] DRA [Feature:DynamicResourceAllocation] multiple nodes reallocation [It] works
    /nvme/gopath/src/k8s.io/kubernetes/test/e2e/dra/dra.go:552

      [FAILED] number of deallocations
      Expected
          <int64>: 2
      to equal
          <int64>: 1
      In [It] at: /nvme/gopath/src/k8s.io/kubernetes/test/e2e/dra/dra.go:651 @ 09/05/23 14:01:54.652

This can be reproduced locally with

    stress -p 10 go test ./test/e2e -args -ginkgo.focus=DynamicResourceAllocation.*reallocation.works  -ginkgo.no-color -v=4 -ginkgo.v

Log output showed that the sequence of events leading to this was:
- claim gets allocated because of selected node
- a different node has to be used, so PostFilter sets
  claim.status.deallocationRequested
- the driver deallocates
- before the scheduler can react and select a different node,
  the driver allocates *again* for the original node
- the scheduler asks for deallocation again
- the driver deallocates again (causing the test failure)
- eventually the pod runs

The fix is to disable allocations first by removing the selected node and then
starting to deallocate.
2023-09-11 10:56:17 +02:00
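The ordering problem described in 6f9140e421 can be illustrated with a small, deterministic toy model. This is only a sketch: `claimState`, `driverSync`, and `reallocate` are hypothetical names invented for illustration and are not part of the Kubernetes API or the scheduler plugin. The model assumes a driver that allocates whenever a selected node is set and the claim is unallocated.

```go
package main

import "fmt"

// claimState is a hypothetical, simplified stand-in for the state shared
// between the scheduler and a DRA driver. It is not the real API type.
type claimState struct {
	selectedNode          string // node the driver should allocate for, "" if none
	allocatedFor          string // node the claim is allocated for, "" if none
	deallocationRequested bool
	deallocations         int
}

// driverSync mimics one reconcile pass of a driver: deallocate on request,
// then allocate for the selected node if the claim is unallocated.
func driverSync(c *claimState) {
	if c.deallocationRequested && c.allocatedFor != "" {
		c.allocatedFor = ""
		c.deallocationRequested = false
		c.deallocations++
	}
	if c.allocatedFor == "" && c.selectedNode != "" {
		c.allocatedFor = c.selectedNode
	}
}

// reallocate moves a claim from node-a to node-b and returns how many
// deallocations that took. With clearSelectedNodeFirst, the selected node is
// removed before deallocation is requested (the order used by the fix).
func reallocate(clearSelectedNodeFirst bool) int {
	c := &claimState{selectedNode: "node-a"}
	driverSync(c) // initial allocation for node-a

	if clearSelectedNodeFirst {
		c.selectedNode = ""
	}
	c.deallocationRequested = true
	driverSync(c) // the driver reacts before the scheduler picks another node

	// The scheduler now selects node-b and must ask for deallocation again
	// if the driver re-allocated for the original node in the meantime.
	c.selectedNode = "node-b"
	if c.allocatedFor != "" && c.allocatedFor != "node-b" {
		c.deallocationRequested = true
	}
	driverSync(c)
	return c.deallocations
}

func main() {
	fmt.Println("deallocate first:", reallocate(false)) // prints 2
	fmt.Println("clear node first:", reallocate(true))  // prints 1
}
```

In this model, clearing the selected node first leaves the driver nothing to allocate for after it deallocates, so only one deallocation is observed.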
Kubernetes Prow Robot
a64a3e16ec Merge pull request #120253 from pohly/dra-scheduler-podschedulingcontext-updates
dra scheduler: refactor PodSchedulingContext updates
2023-09-08 02:48:14 -07:00
Patrick Ohly
5c7dac2d77 dra scheduler: refactor PodSchedulingContext updates
Instead of modifying the PodSchedulingContext and then creating or updating it,
now the required changes (selected node, potential nodes) are tracked and the
actual input for an API call is created if (and only if) needed at the end.

This makes the code easier to read and change. In particular, replacing the
Update call with Patch or Apply is easy.
2023-09-08 08:06:06 +02:00
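A minimal sketch of the "track changes, build the API input only at the end" idea from 5c7dac2d77 follows. The types `schedulingContext` and `pendingUpdate` and the method `build` are hypothetical simplifications chosen for illustration; the real plugin works with the PodSchedulingContext API type and a client call.

```go
package main

import "fmt"

// schedulingContext is a hypothetical, trimmed-down stand-in for the object
// that would be sent to the API server.
type schedulingContext struct {
	SelectedNode   string
	PotentialNodes []string
}

// pendingUpdate records desired changes without touching the existing object.
// A nil field means "no change requested".
type pendingUpdate struct {
	selectedNode   *string
	potentialNodes *[]string
}

// build returns the object to send and true if any change was recorded.
// If nothing changed, it returns nil and false so the caller can skip the call.
func (u *pendingUpdate) build(existing *schedulingContext) (*schedulingContext, bool) {
	if u.selectedNode == nil && u.potentialNodes == nil {
		return nil, false
	}
	out := &schedulingContext{}
	if existing != nil {
		out.SelectedNode = existing.SelectedNode
		out.PotentialNodes = append([]string(nil), existing.PotentialNodes...)
	}
	if u.selectedNode != nil {
		out.SelectedNode = *u.selectedNode
	}
	if u.potentialNodes != nil {
		out.PotentialNodes = append([]string(nil), (*u.potentialNodes)...)
	}
	return out, true
}

func main() {
	existing := &schedulingContext{SelectedNode: "node-a", PotentialNodes: []string{"node-a"}}

	var noChange pendingUpdate
	_, needed := noChange.build(existing)
	fmt.Println("API call needed:", needed)

	node := "node-b"
	change := pendingUpdate{selectedNode: &node}
	obj, needed := change.build(existing)
	fmt.Println("API call needed:", needed, "selected node:", obj.SelectedNode)
}
```

Because the input object is only assembled in one place, swapping the final call (Update, Patch, or Apply) would touch just that spot in a design like this.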
Kubernetes Prow Robot
2d5b6f16f5 Merge pull request #120213 from pohly/dra-scheduler-resourceclass-missing
dra: resourceclass missing
2023-09-06 23:47:09 -07:00
Patrick Ohly
c682d2b8c5 scheduler: add ResourceClass events
When filtering fails because a ResourceClass is missing, we can treat the pod
as "unschedulable" as long as we then also register a cluster event that wakes
up the pod. This is more efficient than periodically retrying.
2023-09-06 11:14:08 +02:00
Kubernetes Prow Robot
cd91351dff Merge pull request #117720 from kerthcet/feat/remove-selector-spread
Remove deprecated selectorSpread
2023-08-29 00:25:22 -07:00
Kubernetes Prow Robot
029d518970 Merge pull request #117588 from kerthcet/cleanup/use-genericset
Avoid duplicated dots in pod status when preempting
2023-08-28 08:39:44 -07:00
kerthcet
855b445d28 Remove deprecated selectorSpread
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-08-28 22:11:33 +08:00
Kubernetes Prow Robot
faf1b5d655 Merge pull request #114685 from AxeZhan/dynamicresources
dynamic resource allocation: optimize class.SuitableNodes usage
2023-08-28 04:43:43 -07:00
kerthcet
3d583398fe Avoid building the error msg twice
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-08-28 17:13:39 +08:00
Patrick Ohly
2472291790 api: introduce separate VolumeResourceRequirements struct
PVC and containers shared the same ResourceRequirements struct to define their
API. When resource claims were added, that struct got extended, which
accidentally also changed the PVC API. To prevent such a mistake from happening
again, PVC now uses its own VolumeResourceRequirements struct.

The `Claims` field gets removed because the risk of breaking someone is low:
theoretically, YAML files which have a claims field for volumes now
get rejected when validating against the OpenAPI. Such files
have never made sense and should be fixed.

Code that uses the struct definitions needs to be updated.
2023-08-21 15:31:28 +02:00
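The struct split described in 2472291790 can be sketched as follows. The types here are deliberately simplified and are assumptions for illustration only: the real API uses quantity types rather than plain strings, and rejection of a `claims` field happens through OpenAPI validation, which this sketch only imitates with strict JSON decoding via the hypothetical helper `decodeVolumeResources`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// ResourceList is simplified to string values for this sketch.
type ResourceList map[string]string

// ResourceClaim names a claim used by a container.
type ResourceClaim struct {
	Name string `json:"name"`
}

// ResourceRequirements is the container-facing struct, including Claims.
type ResourceRequirements struct {
	Limits   ResourceList    `json:"limits,omitempty"`
	Requests ResourceList    `json:"requests,omitempty"`
	Claims   []ResourceClaim `json:"claims,omitempty"`
}

// VolumeResourceRequirements is the separate PVC-facing struct without Claims,
// so extending the container struct no longer changes the PVC API.
type VolumeResourceRequirements struct {
	Limits   ResourceList `json:"limits,omitempty"`
	Requests ResourceList `json:"requests,omitempty"`
}

// decodeVolumeResources decodes strictly, rejecting unknown fields such as
// "claims". This stands in for schema validation in this sketch.
func decodeVolumeResources(data []byte) (VolumeResourceRequirements, error) {
	var out VolumeResourceRequirements
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	err := dec.Decode(&out)
	return out, err
}

func main() {
	_, err := decodeVolumeResources([]byte(`{"requests":{"storage":"1Gi"}}`))
	fmt.Println("without claims, error:", err)

	_, err = decodeVolumeResources([]byte(`{"requests":{"storage":"1Gi"},"claims":[{"name":"gpu"}]}`))
	fmt.Println("with claims, rejected:", err != nil)
}
```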
Kubernetes Prow Robot
312dc127a9 Merge pull request #118923 from AxeZhan/volume_zone_csi
[Scheduler]Translate beta label to ga in volume_zone
2023-08-17 20:20:28 -07:00
AxeZhan
af26ebd0fa translate beta label to ga in volume_zone
2023-08-18 00:31:09 +08:00
SataQiu
ef7d404702 using wait.PollUntilContextTimeout instead of deprecated wait.Poll for pkg/scheduler
using wait.PollUntilContextTimeout instead of deprecated wait.Poll for test/integration/scheduler

using wait.PollUntilContextTimeout instead of deprecated wait.Poll for test/e2e/scheduling

using wait.ConditionWithContextFunc for PodScheduled/PodIsGettingEvicted/PodScheduledIn/PodUnschedulable/PodSchedulingError
2023-08-17 17:25:09 +08:00
AxeZhan
47fec59a31 parse node selector in prefilter
2023-08-14 16:39:46 +08:00
wackxu
a9d26ac7c7 Optimize the code of NodeUnschedulable to reduce TolerationsTolerateTaint function calls
Signed-off-by: wackxu <xushiwei5@huawei.com>
2023-07-18 21:00:05 +08:00
carlory
0599b3caa0 change the QueueingHintFn to pass a logger
2023-07-13 00:56:41 +08:00
Patrick Ohly
6f1a29520f scheduler/dra: reduce pod scheduling latency
This is a combination of two related enhancements:
- By implementing a PreEnqueue check, the initial pod scheduling
  attempt for a pod with a claim template gets avoided when the claim
  does not exist yet.
- By implementing cluster event checks, only those pods get
  scheduled for which something changed, and they get scheduled
  immediately without delay.
2023-07-12 11:17:04 +02:00
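The two enhancements in 6f1a29520f can be pictured with a toy queue model. This is a sketch under stated assumptions: `pod`, `readyToEnqueue`, and `podsAffectedBy` are hypothetical names, and the real scheduler expresses these checks through its plugin interfaces rather than plain functions.

```go
package main

import "fmt"

// pod is a hypothetical, minimal view of a pod and the claims it needs.
type pod struct {
	name   string
	claims []string
}

// readyToEnqueue imitates a PreEnqueue-style check: a pod is held back until
// every claim it references exists, which avoids a doomed first attempt.
func readyToEnqueue(p pod, existingClaims map[string]bool) bool {
	for _, c := range p.claims {
		if !existingClaims[c] {
			return false
		}
	}
	return true
}

// podsAffectedBy imitates a cluster event check: when a claim changes, only
// the waiting pods that reference it are scheduled again.
func podsAffectedBy(claim string, waiting []pod) []string {
	var names []string
	for _, p := range waiting {
		for _, c := range p.claims {
			if c == claim {
				names = append(names, p.name)
				break
			}
		}
	}
	return names
}

func main() {
	waiting := []pod{
		{name: "pod-a", claims: []string{"claim-1"}},
		{name: "pod-b", claims: []string{"claim-2"}},
	}
	existing := map[string]bool{}
	fmt.Println("pod-a ready before claim exists:", readyToEnqueue(waiting[0], existing))

	existing["claim-1"] = true
	fmt.Println("woken by claim-1:", podsAffectedBy("claim-1", waiting))
	fmt.Println("pod-a ready after claim exists:", readyToEnqueue(waiting[0], existing))
}
```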
Patrick Ohly
ef48efc736 scheduler dynamicresources: minor logging improvements
This makes some complex values a bit more readable.
2023-07-12 11:07:59 +02:00
Kubernetes Prow Robot
e0dafe57a3 Merge pull request #117351 from pohly/dra-generated-resource-claim-names
DRA: generated resource claim names
2023-07-11 10:33:11 -07:00
Patrick Ohly
444d23bd2f dra: generated name for ResourceClaim from template
Generating the name avoids all potential name collisions. It's not clear how
much of a problem that was because users can avoid them and the deterministic
names for generic ephemeral volumes have not led to reports from users. But
using generated names is not too hard either.

What makes it relatively easy is that the new pod.status.resourceClaimStatus
map stores the generated name for kubelet and node authorizer, i.e. the
information in the pod is sufficient to determine the name of the
ResourceClaim.

The resource claim controller becomes a bit more complex and now needs
permission to modify the pod status. The new failure scenario of "ResourceClaim
created, updating pod status fails" is handled with the help of a new special
"resource.kubernetes.io/pod-claim-name" annotation that together with the owner
reference identifies exactly for what a ResourceClaim was generated, so
updating the pod status can be retried for existing ResourceClaims.

The transition from deterministic names is handled with a special case for that
recovery code path: a ResourceClaim with no annotation and a name that follows
the Kubernetes <= 1.27 naming pattern is assumed to be generated for that pod
claim and gets added to the pod status.

There's no immediate need for it, but in case it becomes relevant,
the name of the generated ResourceClaim may also be left unset to record that
no claim was needed. Components processing such a pod can skip whatever they
normally would do for the claim. To ensure that they do and also cover other
cases properly ("no known field is set", "must check ownership"),
resourceclaim.Name gets extended.
2023-07-11 14:23:48 +02:00
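The recovery logic described in 444d23bd2f (annotation plus owner reference, with a fallback for older deterministic names) might look roughly like the sketch below. The `resourceClaim` type and the `belongsTo` helper are hypothetical simplifications. The legacy pattern `<pod name>-<pod claim name>` is an assumption made for this example; the commit message only refers to "the Kubernetes <= 1.27 naming pattern".

```go
package main

import "fmt"

// Annotation key as named in the commit message.
const podClaimNameAnnotation = "resource.kubernetes.io/pod-claim-name"

// resourceClaim is a hypothetical, reduced view of a ResourceClaim.
type resourceClaim struct {
	Name        string
	OwnerPodUID string
	Annotations map[string]string
}

// belongsTo reports whether an existing claim was generated for the given
// pod claim. The owner must match; then the annotation decides, and a claim
// without the annotation is matched by the assumed legacy name pattern.
func belongsTo(c resourceClaim, podName, podUID, podClaimName string) bool {
	if c.OwnerPodUID != podUID {
		return false
	}
	if v, ok := c.Annotations[podClaimNameAnnotation]; ok {
		return v == podClaimName
	}
	// Assumed legacy pattern: "<pod name>-<pod claim name>".
	return c.Name == podName+"-"+podClaimName
}

func main() {
	generated := resourceClaim{
		Name:        "my-pod-gpu-x7k2p",
		OwnerPodUID: "uid-1",
		Annotations: map[string]string{podClaimNameAnnotation: "gpu"},
	}
	legacy := resourceClaim{Name: "my-pod-gpu", OwnerPodUID: "uid-1"}

	fmt.Println(belongsTo(generated, "my-pod", "uid-1", "gpu"))
	fmt.Println(belongsTo(legacy, "my-pod", "uid-1", "gpu"))
	fmt.Println(belongsTo(legacy, "my-pod", "uid-2", "gpu"))
}
```

With a match established this way, a controller could retry the pod status update for a claim that already exists instead of creating a second one.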
Kubernetes Prow Robot
c95b16b280 Merge pull request #118608 from utam0k/podtopologyspread-prescore-skip
Return Skip in PodTopologySpread#PreScore under specific conditions
2023-07-10 09:27:07 -07:00
Kubernetes Prow Robot
0ae9aaacfa Merge pull request #118271 from tangwz/add_nodeports_prefilter_skip_status
feat(NodePorts): return Skip status in PreFilter
2023-07-09 20:49:04 -07:00
Gunju Kim
7286d122fb Mark pods with restartable init containers as UnschedulableAndUnresolvable
This marks the pods with restartable init containers as
`UnschedulableAndUnresolvable` if the feature gate is disabled to avoid
the inconsistency in resource calculation between the scheduler and the
older kubelet.
2023-07-08 07:26:13 +09:00
tangwz
1bf2f6c9c0 feat(NodePorts): return Skip status in PreFilter
2023-07-06 08:42:08 +08:00
utam0k
ef26510164 Return Skip in PodTopologySpread#PreScore under specific conditions
Signed-off-by: utam0k <k0ma@utam0k.jp>
2023-06-28 12:08:10 +00:00
Kubernetes Prow Robot
52457842d1 Merge pull request #117055 from cyclinder/csi_migration
remove CSI-migration gate
2023-06-28 04:28:31 -07:00
Kubernetes Prow Robot
d9714078f8 Merge pull request #118551 from sanposhiho/event-to-register
feature(scheduler): implement ClusterEventWithHint to filter out useless events
2023-06-26 06:41:45 -07:00
Kensei Nakada
6f8d38406a feature(scheduler): implement ClusterEventWithHint to filter out useless events
2023-06-22 13:36:19 +00:00
Kubernetes Prow Robot
bc8e312857 Merge pull request #117903 from sourcelliu/dynamic
feature(DynamicResources): return Skip in PreFilter
2023-06-20 17:48:20 -07:00
Kubernetes Prow Robot
4483bf66fe Merge pull request #116635 from mengjiao-liu/contextual-logging-plugin-interpodaffinity
Migrated `pkg/scheduler/framework/plugins/interpodaffinity` to contextual logging
2023-06-09 08:14:13 -07:00
SataQiu
410b6023d6 scheduler: fix code style issues for pkg/scheduler
2023-06-05 17:29:49 +08:00
cyclinder
8e4228a8c1 remove CSI-migration gate
2023-06-04 18:40:17 +08:00
Mengjiao Liu
6d23da045f Migrated pkg/scheduler/framework/plugins/interpodaffinity to use contextual logging
2023-06-01 18:24:54 +08:00
Mengjiao Liu
074900e81b scheduler: update the scheduler interface and cache methods to use contextual logging
2023-05-29 13:26:32 +08:00
Kubernetes Prow Robot
f7cfb5f02f Merge pull request #118257 from pohly/dra-scheduler-plugin-loopvar-fix
dra scheduler plugin test: fix loopvar bug and "reserve" expected data
2023-05-26 06:06:53 -07:00
Patrick Ohly
7a6b4a9215 dra scheduler plugin test: fix loopvar bug and "reserve" expected data
The `listAll` function returned a slice where all pointers referred to the same
instance. That instance had the value of the last list entry. As a result, unit
tests only compared that element.

During the reserve phase, the first claim gets reserved in two test
cases. Those two tests must expect that change. That hadn't been noticed before
because that first claim didn't get compared.
2023-05-25 15:10:05 +02:00
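The loop variable pitfall fixed in 7a6b4a9215 is a common Go issue. The sketch below shows one safe way to build a slice of pointers; the `claim` type and this version of `listAll` are hypothetical and only mirror the shape of the problem, not the actual test helper.

```go
package main

import "fmt"

// claim is a hypothetical element type for this sketch.
type claim struct {
	Name string
}

// listAll returns one pointer per element of items.
//
// The buggy variant of this pattern is:
//
//	for _, item := range items {
//		result = append(result, &item)
//	}
//
// In Go versions before 1.22, "item" is a single variable reused by every
// iteration, so all pointers refer to the same instance, which ends up
// holding the last element. Indexing into the slice avoids that in every
// Go version because each pointer targets a distinct element.
func listAll(items []claim) []*claim {
	result := make([]*claim, 0, len(items))
	for i := range items {
		result = append(result, &items[i])
	}
	return result
}

func main() {
	ptrs := listAll([]claim{{Name: "a"}, {Name: "b"}, {Name: "c"}})
	for _, p := range ptrs {
		fmt.Println(p.Name)
	}
}
```

A test that compares such a slice element by element then really checks every entry, which is how the missing "reserve" expectation in the two test cases came to light.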
Mengjiao Liu
1c05cf1d51 kube-scheduler: NewFramework function to pass the context parameter
Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
2023-05-23 10:17:34 +08:00
Kubernetes Prow Robot
c7c41d27b4 Merge pull request #117834 from NoicFank/cleanup-scheduler-node-must-not-nil-in-snapshot
cleanup useless null pointer check about nodeInfo.Node() from snapshot for in-tree plugins
2023-05-20 15:16:18 -07:00
dingzhu lurong
ed26fcf5b8 cleanup useless null pointer check about nodeInfo.Node() from snapshot for in-tree plugins
2023-05-20 22:53:43 +08:00
Kubernetes Prow Robot
da1b9df26c Merge pull request #118032 from kerthcet/cleanup/interpodaffinity2
Chore: cleanup in interPodAffinity
2023-05-17 14:00:33 -07:00
Kubernetes Prow Robot
53772982be Merge pull request #116829 from mengjiao-liu/contextual-logging-scheduler-plugin-volumezone
Migrated the volumezone scheduler plugin to use contextual logging
2023-05-16 09:53:35 -07:00
kerthcet
3ac7497361 Chore: cleanup in interpodaffinity
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-05-16 14:46:15 +08:00
mantuliu
6e2ea32fc8 feature(DynamicResources): return Skip in PreFilter
2023-05-15 00:06:08 +08:00
Kubernetes Prow Robot
58e13496d6 Merge pull request #116842 from mengjiao-liu/contextual-logging-scheduler-runtime
Migrated `pkg/scheduler/framework/runtime` to use contextual logging
2023-05-11 10:59:02 -07:00
Mengjiao Liu
fe728996ca scheduler test: call frameworkruntime.WithLogger function for contextual logging
2023-05-11 15:46:08 +08:00
utam0k
c0611b6bb3 Return Skip in InterPodAffinity#PreScore under specific conditions
This commit updates the InterPodAffinity PreScore to return a Skip status when the following conditions are met:
1. There are no nodes to score.
2. The incoming pod has no inter-pod affinities && the `IgnorePreferredTermsOfExistingPods` option is enabled.

Signed-off-by: utam0k <k0ma@utam0k.jp>
2023-05-10 13:02:23 +00:00
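The two Skip conditions listed in c0611b6bb3 reduce to a small predicate. The function `shouldSkipPreScore` and its parameters are hypothetical names for this sketch; the real plugin returns a framework status rather than a boolean.

```go
package main

import "fmt"

// shouldSkipPreScore mirrors the two conditions from the commit message:
//  1. there are no nodes to score, or
//  2. the incoming pod has no inter-pod affinities and the
//     IgnorePreferredTermsOfExistingPods option is enabled.
func shouldSkipPreScore(nodeCount int, podHasAffinities, ignorePreferredTermsOfExistingPods bool) bool {
	if nodeCount == 0 {
		return true
	}
	return !podHasAffinities && ignorePreferredTermsOfExistingPods
}

func main() {
	fmt.Println(shouldSkipPreScore(0, true, false))  // true: nothing to score
	fmt.Println(shouldSkipPreScore(3, false, true))  // true: no terms can contribute
	fmt.Println(shouldSkipPreScore(3, false, false)) // false: existing pods' terms still count
	fmt.Println(shouldSkipPreScore(3, true, true))   // false: the pod's own terms count
}
```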
Kubernetes Prow Robot
47f1bd9f80 Merge pull request #117649 from SataQiu/scheduler-remove-v1beta2-20230427
scheduler: remove deprecated v1beta2 KubeSchedulerConfiguration component config
2023-05-03 09:54:41 -07:00
Kubernetes Prow Robot
0d67dd689b Merge pull request #117683 from utam0k/skip-topologyspread-empty
Add check to skip PodTopologySpread PreFilter if no constraints are specified
2023-05-03 06:48:24 -07:00
SataQiu
1f7c07f355 scheduler: remove deprecated v1beta2 KubeSchedulerConfiguration
2023-05-03 21:43:19 +08:00