Commit Graph

880 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
6c292ce270
Merge pull request #105245 from yibozhuang/lost-pvc-prefilter-optimization
Scheduler volumebinding plugin - handle Lost PVC as UnschedulableAndUnresolvable
2021-10-05 08:23:09 -07:00
Kubernetes Prow Robot
04f747d09f
Merge pull request #104782 from kerthcet/cleanup/remove-cc-v1beta1
remove scheduler component config v1beta1
2021-10-04 08:53:08 -07:00
Kubernetes Prow Robot
e414cf7641
Merge pull request #100482 from pohly/generic-ephemeral-volume-checks
generic ephemeral volume checks
2021-10-01 10:47:22 -07:00
Patrick Ohly
1d181ad84f scheduler: fail the volume attach limit when PVC is missing
Previously, the situation was ignored, which might have had the effect that Pod
scheduling continued (?) even though the Pod+PVC weren't known to be in an
acceptable state.
2021-10-01 17:03:44 +02:00
Patrick Ohly
1e26115df5 consider ephemeral volumes for host path and node limits check
When adding the ephemeral volume feature, the special case for
PersistentVolumeClaim volume sources in kubelet's host path and node
limits checks was overlooked. An ephemeral volume source is another
way of referencing a claim and has to be treated the same way.
2021-10-01 17:03:44 +02:00
Yibo Zhuang
b8fe514232 Scheduler volumebinding plugin - handle Lost PVC as
UnschedulableAndUnresolvable

This change adds an additional check in the volumebinding scheduler
plugin to handle PVC with phase ClaimLost which will allow the
scheduler to return UnschedulableAndUnresolvable during the PreFilter
stage and skip the rest of the node evaluation since the PVC is
bound to a PV that does not exist.

Without this change, the FailedScheduling error message would look like:

0/10 nodes are available: 2 node(s) had taint {node/test: true},
that the pod didn't tolerate, 6 node(s) had taint {node/unhealthy: true},
that the pod didn't tolerate, 2 pvc(s) bound to non-existent pv(s)

Which is still evaluating every single node to determine that the pod
cannot be scheduled because the PVC is bound to a non-existent PV

With this change, the FailedScheduling error message would look like:

0/10 nodes are available: 1 persistentvolumeclaim "foo" bound
to non-existent persistentvolume "bar"

Signed-off By: Yibo Zhuang <yibzhuang@gmail.com>
2021-09-30 21:49:46 -07:00
kerthcet
75a255d2ed remove scheduler component config v1beta1
Signed-off-by: kerthcet <kerthcet@gmail.com>
2021-09-28 13:13:17 +08:00
Kubernetes Prow Robot
16fdb2f391
Merge pull request #105196 from yibozhuang/improve-scheduler-pv-not-exist-err
Enhance ErrReasonPVNotExist in volumebinding scheduler plugin
2021-09-27 19:04:42 -07:00
Yibo Zhuang
603a4e1931 Enhance ErrReasonPVNotExist in volumebinding scheduler plugin
This change will make the message more clear when there
is a case of PVC(s) bound to PV(s) that no longer exists
and scheduler does not select the node due to this issue.

Previous error message would look like:
0/2 nodes are available: 2 pvc(s) bound to non-existent pv(s)

Updated message looks like:
0/2 nodes are available: 2 node(s) unavailable due to one or more
pvc(s) bound to non-existent pv(s)

For larger clusters with many different reasons of nodes that
are not available, the current message can be very misleading for
users to think that there are many PVCs lost due to PVs deleted but
in fact it could be just a single PVC case but many nodes not selected
by the scheduler due to this case.

Signed-off By: Yibo Zhuang <yibzhuang@gmail.com>
2021-09-27 15:02:35 -07:00
Kubernetes Prow Robot
dca007bac8
Merge pull request #105212 from BinacsLee/binacs-scheduler-do-not-reference-range-loop-vars
scheduler: do not reference range-loop variable
2021-09-27 13:59:45 -07:00
Kubernetes Prow Robot
24408ef7f5
Merge pull request #104892 from zzchun/fix-typo-in-node_affinity_test
fix typo in node_affinity_test
2021-09-23 13:22:55 -07:00
BinacsLee
f6864316ef scheduler: do not reference range-loop variable 2021-09-23 21:59:30 +08:00
Wei Huang
3b64c1b01d
sched: de-duplicate plugin registration logic by using FactoryAdapter 2021-09-20 10:12:34 -07:00
Kubernetes Prow Robot
fb70ca9b7b
Merge pull request #105046 from alculquicondor/system-spreading
Skip check for all topology labels when using system default spreading
2021-09-16 11:36:14 -07:00
Aldo Culquicondor
609306dd5b Skip check for all topology labels when using system default spreading
Checking for all topology labels is not backwards compatible. Clusters were nodes don't have zone labels effectively have default spreading disabled.

Change only applies to system defaults.
2021-09-16 09:37:56 -04:00
Patrick Ohly
1d656d46a2 scheduler: avoid repeated boilerplate code when registering plugins
Some plugins expect the new feature gate struct. We can inject that additional
parameter via a helper function instead of having to repeat the same anonymous
function for each plugin.
2021-09-16 11:23:57 +02:00
Kubernetes Prow Robot
c6dfe7343e
Merge pull request #103493 from cofyc/fix103431
scheduler/volumebinding: migrate to use pkg/scheduler/framework/plugins/feature
2021-09-13 13:27:50 -07:00
Yecheng Fu
82b50dcb7b scheduler/volumebinding: migrate to use pkg/scheduler/framework/plugins/feature 2021-09-11 10:17:28 +08:00
Kubernetes Prow Robot
cf535b0339
Merge pull request #104866 from zzchun/fix-typo-in-framework-interface
fix typo in framework interface
2021-09-10 06:46:01 -07:00
zzchun
7c17672ae7 fix typo in node_affinity_test
Signed-off-by: zzchun <zzchun@zju.edu.cn>
2021-09-10 15:10:24 +08:00
zzchun
3ad158d62c fix typo in framework interface
Signed-off-by: zzchun <zzchun@zju.edu.cn>
2021-09-10 10:14:21 +08:00
Dave Chen
6e1835b83b Fix couple of incorrect description
Signed-off-by: Dave Chen <dave.chen@arm.com>
2021-09-08 10:08:23 +08:00
Kubernetes Prow Robot
b12379ef1c
Merge pull request #104605 from pohly/ephemeral-volume-events
scheduler: more informative generic ephemeral volume events
2021-09-03 17:51:19 -07:00
zc
b33897f36d modify non-uniform aliases 2021-08-31 09:07:51 +08:00
Kubernetes Prow Robot
7282c2002e
Merge pull request #99273 from yangjunmyfm192085/run-test20
Structured Logging migration:modify Scheduler part logs.
2021-08-30 05:56:54 -07:00
Patrick Ohly
89cb4d0ee9 scheduler: better reason for delay with generic ephemeral volumes
These events are currently emitted for a pod using a generic ephemeral volume:

  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  3s    default-scheduler  0/1 nodes are available: 1 persistentvolumeclaim "my-csi-app-inline-volume-my-csi-volume" not found.
  Warning  FailedScheduling  2s    default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.

The one about "persistentvolumeclaim not found" is potentially confusing. It
occurs because the scheduler typically checks the pod before the ephemeral
volume controller had a chance to create the PVC.

This is a bit easier to understand:

  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  4s    default-scheduler  0/1 nodes are available: 1 waiting for ephemeral volume controller to create the persistentvolumeclaim "my-csi-app-inline-volume-my-csi-volume".
  Warning  FailedScheduling  2s    default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
2021-08-30 10:06:59 +02:00
JunYang
93047824f7 Structured Logging migration:modify Scheduler part logs.
Signed-off-by: JunYang <yang.jun22@zte.com.cn>
2021-08-29 20:17:08 +08:00
dntosas
cd795fa2eb
[scheduler] Remove deprecated volumeSchedulingLatency metric
As part of https://github.com/kubernetes/kubernetes/pull/100720 we
backported fix on existing releases and in this commit we completely
remove the deprecated metric from master branch.

Signed-off-by: dntosas <ntosas@gmail.com>
2021-08-23 15:18:16 +03:00
dntosas
7cbac6bde0 [volumeScheduling/metrics] Fix buckets initialization
This metrics is measured in seconds so it makes no sense starting from
1000 as init value. This breaks also the scheduler e2e metric thus make
users unable to compute, for example, their SLO for the scheduler.
Even if this metric is deprecated, it should behave correctly until it is
completely removed to avoid user confusion.

For example, for each volume created, the minimum value exposed
as a metric is 16.6min (1000sec/60) which is obviously wrong as logic.

In this commit, we migrate bucket creation to start from reasonable
numbers, copying the incrementation from the conventions that the
scheduler follows itself.

Signed-off-by: dntosas <ntosas@gmail.com>
2021-08-17 12:49:40 +03:00
Konstantin Misyutin
29bd66d018 Remove "pkg/controller/volume/scheduling" dependency from "pkg/scheduler/framework/plugins"
All dependencies of VolumeBinding plugin from
"k8s.io/kubernetes/pkg/controller/volume/scheduling" package moved to
"k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding" package:

- whole file pkg/controller/volume/scheduling/scheduler_assume_cache.go
- whole file pkg/controller/volume/scheduling/scheduler_assume_cache_test.go
- whole file pkg/controller/volume/scheduling/scheduler_binder.go
- whole file pkg/controller/volume/scheduling/scheduler_binder_fake.go
- whole file pkg/controller/volume/scheduling/scheduler_binder_test.go

Package "k8s.io/kubernetes/pkg/controller/volume/scheduling/metrics" moved
to "k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding/metrics"
because it only used in VolumeBinding plugin and (e2e) tests.

More described in issue #89930 and PR #102953.

Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
2021-08-13 19:08:45 +08:00
Kubernetes Prow Robot
27b02a3e37
Merge pull request #104030 from chendave/refactoring_new
Refactor defaultpreemption for out-of-tree plugins
2021-08-11 18:38:00 -07:00
Dave Chen
3af26bae2c Refactor defaultpreemption for out-of-tree plugins
Signed-off-by: Dave Chen <dave.chen@arm.com>
2021-08-11 09:29:17 +08:00
Wei Huang
dc079acc2b
sched: retry unschedule pods immediately after a waiting pod's deletion 2021-08-06 19:08:37 -07:00
Kubernetes Prow Robot
3ee63b09b7
Merge pull request #103775 from jyz0309/optimze
Optimize the for range code in types.go
2021-08-04 22:12:53 -07:00
Kubernetes Prow Robot
33778cb2ba
Merge pull request #103757 from sanposhiho/fix/scheduler/framework/add-doc-on-status-reason
Add: specify that reason is a field to record the reason why failed
2021-08-04 22:12:46 -07:00
Kubernetes Prow Robot
b16f7e841d
Merge pull request #103686 from kerthcet/document/add_comment_for_enqueueExtensions
update comment with EnqueueExtensions
2021-08-04 22:11:48 -07:00
houjun
8dc091ef2e Fix wrong log 2021-07-26 11:38:02 +08:00
sanposhiho
6680368958 Add: specify that reason is a field to record the reason why failed 2021-07-20 20:33:40 +09:00
jyz0309
d05b232afc optimize the code
Signed-off-by: jyz0309 <45495947@qq.com>
2021-07-20 09:16:14 +08:00
kerthcet
d1e9da9f8a update comment with EnqueueExtensions
Signed-off-by: kerthcet <kerthcet@gmail.com>

update comment with EnqueueExtensions

Signed-off-by: kerthcet <kerthcet@gmail.com>

update comment with EnqueueExtensions

Signed-off-by: kerthcet <kerthcet@gmail.com>

update comment with EnqueueExtensions

Signed-off-by: kerthcet <kerthcet@gmail.com>
2021-07-16 09:56:26 +08:00
Wei Huang
fb9cafc99b
sched: provide an option for plugin developers to move pods to activeQ 2021-07-07 12:50:12 -07:00
Kubernetes Prow Robot
ea3bcbc205
Merge pull request #101946 from chendave/balance_allocation
Support extended resource in NodeResourcesBalancedAllocation plugin
2021-07-06 10:42:19 -07:00
Yecheng Fu
83ee392ed4 implement EnqueueExtensions interface in volumebinding 2021-07-03 08:25:06 +08:00
Kubernetes Prow Robot
25bbe2ebc5
Merge pull request #99594 from cofyc/kep1845-api
Prioritizing nodes based on volume capacity: API changes
2021-07-01 15:35:51 -07:00
Yecheng Fu
b522e95aae Prioritizing nodes based on volume capacity: API changes 2021-07-01 10:00:59 +08:00
Kubernetes Prow Robot
385402d506
Merge pull request #103082 from chrishenzie/read-write-once-pod-access-mode-scheduler
Enforce ReadWriteOncePod during scheduling
2021-06-30 16:11:36 -07:00
Chris Henzie
7ad44d04fc Enforce ReadWriteOncePod access mode during scheduling
Check the PVC ref count on the node info cache to determine if a pod's
PVCs are in use. If they are and it is using ReadWriteOncePod, fail the
request.
2021-06-30 10:40:14 -07:00
Dave Chen
1fa673c15c Extent the NodeResourcesBalancedAllocation plugin to cover more resources
Signed-off-by: Dave Chen <dave.chen@arm.com>
2021-06-30 11:15:12 +08:00
Kubernetes Prow Robot
e0f66be1aa
Merge pull request #101822 from yuzhiquan/NodeResourcesFit-score
Add score func for NodeResourcesFit plugin
2021-06-29 13:42:20 -07:00
yuzhiquan
deb14b995a Add score plugin for NodeResourcesFit 2021-06-29 13:16:55 -04:00