Commit Graph

646 Commits

Author SHA1 Message Date
Wei Huang
b7d90ca991
sched: adjust events to register for VolumeBinding plugin 2021-10-07 08:51:04 -07:00
Kubernetes Prow Robot
6c292ce270
Merge pull request #105245 from yibozhuang/lost-pvc-prefilter-optimization
Scheduler volumebinding plugin - handle Lost PVC as UnschedulableAndUnresolvable
2021-10-05 08:23:09 -07:00
Kubernetes Prow Robot
04f747d09f
Merge pull request #104782 from kerthcet/cleanup/remove-cc-v1beta1
remove scheduler component config v1beta1
2021-10-04 08:53:08 -07:00
Kubernetes Prow Robot
e414cf7641
Merge pull request #100482 from pohly/generic-ephemeral-volume-checks
generic ephemeral volume checks
2021-10-01 10:47:22 -07:00
Patrick Ohly
1d181ad84f scheduler: fail the volume attach limit when PVC is missing
Previously, the situation was ignored, which might have had the effect that Pod
scheduling continued (?) even though the Pod+PVC weren't known to be in an
acceptable state.
2021-10-01 17:03:44 +02:00
Patrick Ohly
1e26115df5 consider ephemeral volumes for host path and node limits check
When adding the ephemeral volume feature, the special case for
PersistentVolumeClaim volume sources in kubelet's host path and node
limits checks was overlooked. An ephemeral volume source is another
way of referencing a claim and has to be treated the same way.
2021-10-01 17:03:44 +02:00
Yibo Zhuang
b8fe514232 Scheduler volumebinding plugin - handle Lost PVC as
UnschedulableAndUnresolvable

This change adds an additional check in the volumebinding scheduler
plugin to handle PVC with phase ClaimLost which will allow the
scheduler to return UnschedulableAndUnresolvable during the PreFilter
stage and skip the rest of the node evaluation since the PVC is
bound to a PV that does not exist.

Without this change, the FailedScheduling error message would look like:

0/10 nodes are available: 2 node(s) had taint {node/test: true},
that the pod didn't tolerate, 6 node(s) had taint {node/unhealthy: true},
that the pod didn't tolerate, 2 pvc(s) bound to non-existent pv(s)

Which is still evaluating every single node to determine that the pod
cannot be scheduled because the PVC is bound to a non-existent PV

With this change, the FailedScheduling error message would look like:

0/10 nodes are available: 1 persistentvolumeclaim "foo" bound
to non-existent persistentvolume "bar"

Signed-off By: Yibo Zhuang <yibzhuang@gmail.com>
2021-09-30 21:49:46 -07:00
kerthcet
75a255d2ed remove scheduler component config v1beta1
Signed-off-by: kerthcet <kerthcet@gmail.com>
2021-09-28 13:13:17 +08:00
Yibo Zhuang
603a4e1931 Enhance ErrReasonPVNotExist in volumebinding scheduler plugin
This change will make the message more clear when there
is a case of PVC(s) bound to PV(s) that no longer exists
and scheduler does not select the node due to this issue.

Previous error message would look like:
0/2 nodes are available: 2 pvc(s) bound to non-existent pv(s)

Updated message looks like:
0/2 nodes are available: 2 node(s) unavailable due to one or more
pvc(s) bound to non-existent pv(s)

For larger clusters with many different reasons of nodes that
are not available, the current message can be very misleading for
users to think that there are many PVCs lost due to PVs deleted but
in fact it could be just a single PVC case but many nodes not selected
by the scheduler due to this case.

Signed-off By: Yibo Zhuang <yibzhuang@gmail.com>
2021-09-27 15:02:35 -07:00
Kubernetes Prow Robot
24408ef7f5
Merge pull request #104892 from zzchun/fix-typo-in-node_affinity_test
fix typo in node_affinity_test
2021-09-23 13:22:55 -07:00
Wei Huang
3b64c1b01d
sched: de-duplicate plugin registration logic by using FactoryAdapter 2021-09-20 10:12:34 -07:00
Kubernetes Prow Robot
fb70ca9b7b
Merge pull request #105046 from alculquicondor/system-spreading
Skip check for all topology labels when using system default spreading
2021-09-16 11:36:14 -07:00
Aldo Culquicondor
609306dd5b Skip check for all topology labels when using system default spreading
Checking for all topology labels is not backwards compatible. Clusters were nodes don't have zone labels effectively have default spreading disabled.

Change only applies to system defaults.
2021-09-16 09:37:56 -04:00
Patrick Ohly
1d656d46a2 scheduler: avoid repeated boilerplate code when registering plugins
Some plugins expect the new feature gate struct. We can inject that additional
parameter via a helper function instead of having to repeat the same anonymous
function for each plugin.
2021-09-16 11:23:57 +02:00
Yecheng Fu
82b50dcb7b scheduler/volumebinding: migrate to use pkg/scheduler/framework/plugins/feature 2021-09-11 10:17:28 +08:00
zzchun
7c17672ae7 fix typo in node_affinity_test
Signed-off-by: zzchun <zzchun@zju.edu.cn>
2021-09-10 15:10:24 +08:00
Dave Chen
6e1835b83b Fix couple of incorrect description
Signed-off-by: Dave Chen <dave.chen@arm.com>
2021-09-08 10:08:23 +08:00
Kubernetes Prow Robot
b12379ef1c
Merge pull request #104605 from pohly/ephemeral-volume-events
scheduler: more informative generic ephemeral volume events
2021-09-03 17:51:19 -07:00
zc
b33897f36d modify non-uniform aliases 2021-08-31 09:07:51 +08:00
Patrick Ohly
89cb4d0ee9 scheduler: better reason for delay with generic ephemeral volumes
These events are currently emitted for a pod using a generic ephemeral volume:

  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  3s    default-scheduler  0/1 nodes are available: 1 persistentvolumeclaim "my-csi-app-inline-volume-my-csi-volume" not found.
  Warning  FailedScheduling  2s    default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.

The one about "persistentvolumeclaim not found" is potentially confusing. It
occurs because the scheduler typically checks the pod before the ephemeral
volume controller had a chance to create the PVC.

This is a bit easier to understand:

  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  4s    default-scheduler  0/1 nodes are available: 1 waiting for ephemeral volume controller to create the persistentvolumeclaim "my-csi-app-inline-volume-my-csi-volume".
  Warning  FailedScheduling  2s    default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
2021-08-30 10:06:59 +02:00
dntosas
cd795fa2eb
[scheduler] Remove deprecated volumeSchedulingLatency metric
As part of https://github.com/kubernetes/kubernetes/pull/100720 we
backported fix on existing releases and in this commit we completely
remove the deprecated metric from master branch.

Signed-off-by: dntosas <ntosas@gmail.com>
2021-08-23 15:18:16 +03:00
dntosas
7cbac6bde0 [volumeScheduling/metrics] Fix buckets initialization
This metrics is measured in seconds so it makes no sense starting from
1000 as init value. This breaks also the scheduler e2e metric thus make
users unable to compute, for example, their SLO for the scheduler.
Even if this metric is deprecated, it should behave correctly until it is
completely removed to avoid user confusion.

For example, for each volume created, the minimum value exposed
as a metric is 16.6min (1000sec/60) which is obviously wrong as logic.

In this commit, we migrate bucket creation to start from reasonable
numbers, copying the incrementation from the conventions that the
scheduler follows itself.

Signed-off-by: dntosas <ntosas@gmail.com>
2021-08-17 12:49:40 +03:00
Konstantin Misyutin
29bd66d018 Remove "pkg/controller/volume/scheduling" dependency from "pkg/scheduler/framework/plugins"
All dependencies of VolumeBinding plugin from
"k8s.io/kubernetes/pkg/controller/volume/scheduling" package moved to
"k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding" package:

- whole file pkg/controller/volume/scheduling/scheduler_assume_cache.go
- whole file pkg/controller/volume/scheduling/scheduler_assume_cache_test.go
- whole file pkg/controller/volume/scheduling/scheduler_binder.go
- whole file pkg/controller/volume/scheduling/scheduler_binder_fake.go
- whole file pkg/controller/volume/scheduling/scheduler_binder_test.go

Package "k8s.io/kubernetes/pkg/controller/volume/scheduling/metrics" moved
to "k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding/metrics"
because it only used in VolumeBinding plugin and (e2e) tests.

More described in issue #89930 and PR #102953.

Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
2021-08-13 19:08:45 +08:00
Kubernetes Prow Robot
27b02a3e37
Merge pull request #104030 from chendave/refactoring_new
Refactor defaultpreemption for out-of-tree plugins
2021-08-11 18:38:00 -07:00
Dave Chen
3af26bae2c Refactor defaultpreemption for out-of-tree plugins
Signed-off-by: Dave Chen <dave.chen@arm.com>
2021-08-11 09:29:17 +08:00
houjun
8dc091ef2e Fix wrong log 2021-07-26 11:38:02 +08:00
Kubernetes Prow Robot
ea3bcbc205
Merge pull request #101946 from chendave/balance_allocation
Support extended resource in NodeResourcesBalancedAllocation plugin
2021-07-06 10:42:19 -07:00
Yecheng Fu
83ee392ed4 implement EnqueueExtensions interface in volumebinding 2021-07-03 08:25:06 +08:00
Kubernetes Prow Robot
25bbe2ebc5
Merge pull request #99594 from cofyc/kep1845-api
Prioritizing nodes based on volume capacity: API changes
2021-07-01 15:35:51 -07:00
Yecheng Fu
b522e95aae Prioritizing nodes based on volume capacity: API changes 2021-07-01 10:00:59 +08:00
Kubernetes Prow Robot
385402d506
Merge pull request #103082 from chrishenzie/read-write-once-pod-access-mode-scheduler
Enforce ReadWriteOncePod during scheduling
2021-06-30 16:11:36 -07:00
Chris Henzie
7ad44d04fc Enforce ReadWriteOncePod access mode during scheduling
Check the PVC ref count on the node info cache to determine if a pod's
PVCs are in use. If they are and it is using ReadWriteOncePod, fail the
request.
2021-06-30 10:40:14 -07:00
Dave Chen
1fa673c15c Extent the NodeResourcesBalancedAllocation plugin to cover more resources
Signed-off-by: Dave Chen <dave.chen@arm.com>
2021-06-30 11:15:12 +08:00
Kubernetes Prow Robot
e0f66be1aa
Merge pull request #101822 from yuzhiquan/NodeResourcesFit-score
Add score func for NodeResourcesFit plugin
2021-06-29 13:42:20 -07:00
yuzhiquan
deb14b995a Add score plugin for NodeResourcesFit 2021-06-29 13:16:55 -04:00
Wei Huang
20f84b12a1
Optimize scheduler res scorer on non-requested extended res 2021-06-25 11:41:36 -07:00
Kubernetes Prow Robot
72fc6d9ea0
Merge pull request #103089 from chendave/ratio_cleanup
Simplify the formula used in the `RequestedToCapacityRatio` plugin
2021-06-22 12:00:23 -07:00
ravisantoshgudimetla
b6c75bee15 Remove balanced attached node volumes
kubernetes#60525 introduced
Balanced attached node volumes feature gate to include volume
count for prioritizing nodes. The reason for introducing this
flag was its usefulness in Red Hat OpenShift Online environment
which is not being used any more. So, removing the flag
as it helps in maintainability of the scheduler code base
as mentioned at kubernetes#101489 (comment)
2021-06-22 11:19:30 -04:00
Dave Chen
0f922b200f Simplify the formula used in the RequestedToCapacityRatio plugin
Signed-off-by: Dave Chen <dave.chen@arm.com>
2021-06-22 17:11:26 +08:00
Aldo Culquicondor
63d1237102 Fix Node Resources plugins score when there are pods with no requests
Given that we give a default CPU/memory requests for containers that don't provide any, the calculated usage can exceed the allocatable.

Change-Id: I72e249652acacfbe8cea0dd6f895dabe43ff6376
2021-06-16 20:36:07 +00:00
Wei Huang
36eaa11d50
cleanup usage of NewPodNominator
- replace NewPodNominator() with NewSafePodNominator()
- rename nominatedPodMap to nominator
2021-06-10 14:01:07 -07:00
Abdullah Gharaibeh
46f3e4dfdd Define in-tree scheduler plugin names in separate pkg to break a cyclic depednecy when moving plugin defaulting to CC 2021-06-09 15:36:09 -04:00
Kubernetes Prow Robot
6cb421487a
Merge pull request #99597 from adtac/v1b2
scheduler CC: add v1beta2 API, deprecate plugins
2021-06-08 12:26:08 -07:00
Adhityaa Chandrasekar
3c8e56bef9 scheduler: graduate CC to v1beta2, deprecate plugins
Signed-off-by: Adhityaa Chandrasekar <adtac@google.com>
2021-06-07 12:42:55 +00:00
Kubernetes Prow Robot
2e6e8857d1
Merge pull request #102518 from chendave/cleanup_constat
Cleanup redundant failure reason in InterPodAffinity plugin
2021-06-02 09:46:34 -07:00
Dave Chen
b049e1b9ab Cleanup redundant failure reason in InterPodAffinity plugin
Both `ErrReasonAffinityRulesNotMatch` and `ErrReasonAntiAffinityRulesNotMatch` are
more precise than `ErrReasonAffinityNotMatch`.

Signed-off-by: Dave Chen <dave.chen@arm.com>
2021-06-02 11:52:55 +08:00
Abdullah Gharaibeh
4567a43101 Return UnschedulableAndUnresolvable when looking up volume-related resources returns NotFound error 2021-06-01 09:19:04 -04:00
Kubernetes Prow Robot
ae1f28d7b0
Merge pull request #102306 from ahg-g/ahg-vol-restrictions
Return UnschedulableAndUnresolvable instead of Error when failing to lookup volume-related resources
2021-05-27 20:48:25 -07:00
Kubernetes Prow Robot
2da8d1c18f
Merge pull request #102234 from sanposhiho/scheduler/add/interface-check-on-nodeaffinity
scheduler/add: interface check on nodeaffinity
2021-05-26 03:17:32 -07:00
Kubernetes Prow Robot
aa0017ad13
Merge pull request #102236 from ahg-g/ahg-spread
Use ownerReference to build default spreading constraints
2021-05-25 19:39:28 -07:00