230 Commits

Author SHA1 Message Date
Kubernetes Submit Queue
b8c9ee8abb Merge pull request #46456 from jingxu97/May/allocatable
Automatic merge from submit-queue

Add local storage (scratch space) allocatable support

This PR adds support for allocatable local storage (scratch space).
This feature applies only to the root file system, which is shared by Kubernetes
components, users' containers, and/or images. Users can use the
--kube-reserved flag to reserve storage for kube system components.
If the allocatable storage for users' pods is used up, some pods will be
evicted to free the storage resource.

This feature is part of local storage capacity isolation and described in the proposal https://github.com/kubernetes/community/pull/306

**Release note**:

```release-note
This feature exposes local storage capacity for the primary partitions, and supports & enforces storage reservation in Node Allocatable 
```
2017-06-03 00:24:29 -07:00
Kubernetes Submit Queue
348bf1e032 Merge pull request #46627 from deads2k/api-12-labels
Automatic merge from submit-queue (batch tested with PRs 46239, 46627, 46346, 46388, 46524)

move labels to components which own the APIs

During the apimachinery split in 1.6, we accidentally moved several label APIs into apimachinery.  They don't belong there, since the individual APIs are not general machinery concerns, but instead are the concern of particular components: most commonly the kubelet.  This pull moves the labels into their owning components and out of API machinery.

@kubernetes/sig-api-machinery-misc @kubernetes/api-reviewers @kubernetes/api-approvers 
@derekwaynecarr  since most of these are related to the kubelet
2017-06-02 23:37:38 -07:00
Jing Xu
943fc53bf7 Add predicates check for local storage request
This PR adds the check for local storage request when admitting pods. If
the local storage request exceeds the available resource, pod will be
rejected.
2017-06-01 15:57:50 -07:00
deads2k
954eb3ceb9 move labels to components which own the APIs 2017-05-31 10:32:06 -04:00
Klaus Ma
fd2575e43e Added unit test for node operation in schedulercache. 2017-05-31 21:26:04 +08:00
Guangya Liu
9ae3107aab Highlight nodeSelector when checking nodeSelector for Pod. 2017-05-30 20:30:40 +08:00
Michelle Au
61de4870de Scheduler predicate for already bound PVs with node affinity 2017-05-22 14:46:03 -07:00
Klaus Ma
83b7f77ee2 Moved qos to api.helpers. 2017-05-20 07:17:57 -04:00
Wojciech Tyczynski
15c492bb2e Fixes and minor cleanups to pod (anti)affinity predicate 2017-04-28 13:22:07 +02:00
Chao Xu
d4850b6c2b move pkg/api/v1/helpers.go to subpackage 2017-04-14 14:25:11 -07:00
Kubernetes Submit Queue
fed535e199 Merge pull request #42524 from k82cn/used_ports_per_node
Automatic merge from submit-queue (batch tested with PRs 41775, 39678, 42629, 42524, 43028)

Aggregated used ports at the NodeInfo level.

fixes #42523

```release-note
Aggregated used ports at the NodeInfo level for `PodFitsHostPorts` predicate.
```
2017-04-07 17:44:19 -07:00
Xiaoyu Zhang
e3d534b2c4 Fix a type
2017-03-31 10:17:19 +08:00
Connor Doyle
364dbc0ca5 Revert "Revert "Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available.""
- This reverts commit 60758f3fff.
- Disabled opaque integer resource end-to-end tests.
2017-03-06 17:48:09 -08:00
Dawn Chen
60758f3fff Revert "Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available." 2017-03-06 14:27:17 -08:00
Klaus Ma
1c5292bc2c Aggregated used ports at the NodeInfo level. 2017-03-05 11:09:42 +08:00
Connor Doyle
8a42189690 Fix unbounded growth of cached OIRs in sched cache
- Added schedulercache.Resource.SetOpaque helper.
- Amend kubelet allocatable sync so that when OIRs are removed from capacity
  they are also removed from allocatable.
- Fixes #41861.
2017-03-04 09:26:22 -08:00
Janet Kuo
4c882477e9 Make DaemonSet respect critical pods annotation when scheduling 2017-02-27 09:59:45 -08:00
Kubernetes Submit Queue
1359ffc502 Merge pull request #41818 from aveshagarwal/master-taints-tolerations-api-fields-pod-spec-updates
Automatic merge from submit-queue (batch tested with PRs 41701, 41818, 41897, 41119, 41562)

Allow updates to pod tolerations.

Opening this PR to continue discussion for pod spec tolerations updates when a pod has been scheduled already. This PR is built on top of https://github.com/kubernetes/kubernetes/pull/38957.

@kubernetes/sig-scheduling-pr-reviews @liggitt @davidopp @derekwaynecarr @kubernetes/rh-cluster-infra
2017-02-26 14:02:51 -08:00
Avesh Agarwal
b9d95b4426 Allow toleration updates via pod spec. 2017-02-23 11:06:13 -05:00
Andy Goldstein
9d8d6ad16c Switch scheduler to use generated listers/informers
Where possible, switch the scheduler to use generated listers and
informers. There are still some places where it probably makes more
sense to use one-off reflectors/informers (listing/watching just a
single node, listing/watching scheduled & unscheduled pods using a field
selector).
2017-02-23 09:57:12 -05:00
Avesh Agarwal
9b640838a5 Change taint/toleration annotations to api fields. 2017-02-22 09:27:42 -05:00
Kubernetes Submit Queue
f2e234e47f Merge pull request #41398 from codablock/azure_max_pd
Automatic merge from submit-queue

Add scheduler predicate to filter for max Azure disks attached

**What this PR does / why we need it**: This PR adds a scheduler predicate for the maximum Azure disk count. This allows the environment variable KUBE_MAX_PD_VOLS to be used on the scheduler, as is already possible with GCE and AWS.

This is needed because we need a way to specify the maximum number of attachable disks on Azure, to avoid permanently failing disk attachments when k8s schedules too many pods with AzureDisk volumes onto the same node.

I've chosen 16 as the default value for DefaultMaxAzureDiskVolumes, even though it may be too high for many smaller VM types and too low for the larger VM types. This means the default behavior may change for clusters with large VM types. For smaller VM types, the behavior will not change (disk attachment will keep failing).
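
As a rough illustration of the override described above, the following Go sketch resolves a volume limit from the raw value of KUBE_MAX_PD_VOLS and falls back to a default of 16. The function name `maxVolumes` and the fallback rules (empty, non-numeric, or non-positive values use the default) are assumptions made for this example, not the code from this PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultMaxAzureDiskVolumes mirrors the default of 16 mentioned in the PR text.
const defaultMaxAzureDiskVolumes = 16

// maxVolumes returns the limit parsed from raw (the value of KUBE_MAX_PD_VOLS),
// or def when raw is empty, not an integer, or not positive.
// The name and the fallback rules are assumptions made for this sketch.
func maxVolumes(raw string, def int) int {
	if raw == "" {
		return def
	}
	parsed, err := strconv.Atoi(raw)
	if err != nil || parsed <= 0 {
		return def
	}
	return parsed
}

func main() {
	// In a real process the raw value would come from os.Getenv("KUBE_MAX_PD_VOLS").
	fmt.Println(maxVolumes("", defaultMaxAzureDiskVolumes))
	fmt.Println(maxVolumes("32", defaultMaxAzureDiskVolumes))
	fmt.Println(maxVolumes("abc", defaultMaxAzureDiskVolumes))
}
```

Taking the raw string as a parameter rather than reading the environment inside the function keeps the sketch easy to test.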

In the future, the value should be determined at run time on a per node basis, depending on the VM size. I know that this is already implemented in the ongoing Azure Managed Disks work, but I don't remember where to find this anymore and also forgot who was working on this. Maybe @colemickens can help here.

**Release note**:

```release-note
Support KUBE_MAX_PD_VOLS on Azure
```

CC @colemickens @brendandburns
2017-02-21 06:11:09 -08:00
Kubernetes Submit Queue
ba6dca94bc Merge pull request #41458 from humblec/iscsi-nodisk-conflict
Automatic merge from submit-queue

Adjust nodiskconflict support based on iscsi multipath.

With multipath support in place, we need to depend only on the IQN to determine whether two iSCSI disks are the same.
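
A minimal sketch of that idea in Go, assuming a simplified volume type: two iSCSI volumes are treated as the same disk when their IQNs match, regardless of which portal they are reached through. The type `ISCSIVolume`, the function `iscsiConflict`, and the rule that two read-only mounts do not conflict are illustrative assumptions, not the predicate code itself.

```go
package main

import "fmt"

// ISCSIVolume is a hypothetical, trimmed-down description of an iSCSI volume.
type ISCSIVolume struct {
	TargetPortal string // may differ between paths to the same disk under multipath
	IQN          string // iSCSI Qualified Name identifying the target
	ReadOnly     bool
}

// iscsiConflict reports whether mounting both volumes on one node would conflict.
// Only the IQN decides whether the disks are the same; the portal is ignored.
// Treating two read-only mounts as compatible is an assumption of this sketch.
func iscsiConflict(a, b ISCSIVolume) bool {
	if a.IQN != b.IQN {
		return false
	}
	return !(a.ReadOnly && b.ReadOnly)
}

func main() {
	viaPortalA := ISCSIVolume{TargetPortal: "10.0.0.1:3260", IQN: "iqn.2016-12.example:disk1"}
	viaPortalB := ISCSIVolume{TargetPortal: "10.0.0.2:3260", IQN: "iqn.2016-12.example:disk1"}
	fmt.Println(iscsiConflict(viaPortalA, viaPortalB))
}
```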

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2017-02-20 03:54:41 -08:00
Alexander Block
73a0083a84 Add scheduler predicate to filter for max Azure disks attached 2017-02-20 09:00:18 +01:00
Timothy St. Clair
2bcd63c524 Cleanup work to enable feature gating annotations 2017-02-18 09:25:57 -06:00
Robert Rati
32c4683242 Feature-Gate affinity in annotations 2017-02-18 09:08:38 -06:00
Humble Chirammal
7a1ac6c6db Adjust nodiskconflict support based on iscsi multipath feature.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2017-02-16 16:24:53 +05:30
Kubernetes Submit Queue
4ed86f5d46 Merge pull request #41076 from gyliu513/port-forward
Automatic merge from submit-queue

Removed a space in portforward.go.
2017-02-08 07:59:10 -08:00
Guangya Liu
9607edc556 Clean up some typos.
1) Removed a space in portforward.go.
2) Renamed `lockAquisitionFunc` to `lockAcquisitionFunc` in
controller.go.
3) Fixed typo in predicates.go.
2017-02-08 09:39:03 +08:00
gmarek
37585b06e0 Scheduler doesn't schedule Pods not tolerating NoExecute Taints 2017-02-07 13:56:48 +01:00
Kevin
36dcb57407 forgiveness library changes 2017-01-31 21:39:17 +08:00
Kubernetes Submit Queue
3dbbd0bdf4 Merge pull request #40606 from deads2k/client-17-sync
Automatic merge from submit-queue (batch tested with PRs 34543, 40606)

sync client-go and move util/workqueue

The vision of client-go is that it provides enough utilities to build a reasonable controller.  It has been copying `util/workqueue`.  This makes it authoritative.

@liggitt I'm getting really close to making client-go authoritative ptal.

approved based on https://github.com/kubernetes/kubernetes/issues/40363
2017-01-30 08:19:10 -08:00
Kubernetes Submit Queue
83791b0ee4 Merge pull request #34543 from ivan4th/dont-require-failure-domains-for-pod-affinity-checker
Automatic merge from submit-queue

Don't require failureDomains in PodAffinityChecker

`failureDomains` are only used for `PreferredDuringScheduling` pod
anti-affinity, which is ignored by `PodAffinityChecker`.
This unnecessary requirement was making it hard to move
`PodAffinityChecker` to `GeneralPredicates` because that would require
passing `--failure-domains` to both `kubelet` and `kube-controller-manager`.
2017-01-30 08:18:32 -08:00
deads2k
2c1c0f3f72 move workqueue to client-go 2017-01-30 09:08:21 -05:00
deads2k
1ce0637b27 move listers out of cache to reduce import tree 2017-01-20 15:01:38 -05:00
Kubernetes Submit Queue
b2e134a724 Merge pull request #36693 from ConnorDoyle/oir-cleanup
Automatic merge from submit-queue (batch tested with PRs 36693, 40154, 40170, 39033)

Minor hygiene in scheduler.

**What this PR does / why we need it**:

Minor cleanups in scheduler, related to PR #31652.

- Unified lazy opaque resource caching.
- Deleted a commented-out line of code.

**Release note**:
```release-note
N/A
```
2017-01-20 09:18:49 -08:00
Clayton Coleman
9a2a50cda7 refactor: use metav1.ObjectMeta in other types 2017-01-17 16:17:19 -05:00
Connor Doyle
94b9c0e20c Minor hygiene in scheduler.
- Unified lazy opaque resource caching.
- Deleted a commented-out line of code.
2017-01-17 07:00:07 -08:00
Kubernetes Submit Queue
b1506004cc Merge pull request #39601 from mqliang/upstream-tolerates-taints-bugfix
Automatic merge from submit-queue (batch tested with PRs 39945, 39601)

bugfix for PodToleratesNodeTaints

The `PodToleratesNodeTaints` predicate func should return true if the pod has no toleration annotations and the node's taint effect is `PreferNoSchedule`.
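
The intended behavior can be sketched in Go as follows. The types `Taint` and `Toleration`, the key-only matching, and the function `toleratesTaints` are simplifications invented for this example. The point illustrated is that `PreferNoSchedule` taints are skipped, so a pod with no tolerations still passes.

```go
package main

import "fmt"

// Taint and Toleration are hypothetical, simplified stand-ins for the API types.
type Taint struct {
	Key    string
	Effect string
}

type Toleration struct {
	Key string
}

const (
	NoSchedule       = "NoSchedule"
	PreferNoSchedule = "PreferNoSchedule"
)

// toleratesTaints reports whether a pod with the given tolerations may be
// scheduled onto a node with the given taints. PreferNoSchedule is only a
// preference, so such taints never cause a rejection here.
func toleratesTaints(tolerations []Toleration, taints []Taint) bool {
	for _, taint := range taints {
		if taint.Effect == PreferNoSchedule {
			continue
		}
		if !tolerated(taint, tolerations) {
			return false
		}
	}
	return true
}

// tolerated matches on key only, which is a simplification for this sketch.
func tolerated(taint Taint, tolerations []Toleration) bool {
	for _, t := range tolerations {
		if t.Key == taint.Key {
			return true
		}
	}
	return false
}

func main() {
	soft := []Taint{{Key: "dedicated", Effect: PreferNoSchedule}}
	hard := []Taint{{Key: "dedicated", Effect: NoSchedule}}
	fmt.Println(toleratesTaints(nil, soft))
	fmt.Println(toleratesTaints(nil, hard))
}
```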
2017-01-17 04:08:47 -08:00
Klaus Ma
c184fef6e6 Fixed pod anti-affinity bugs. 2017-01-17 13:28:54 +08:00
Robert Rati
6a3ad93d6c [scheduling] Moved pod affinity and anti-affinity from annotations to api fields. #25319
2017-01-12 14:54:29 -05:00
deads2k
6a4d5cd7cc start the apimachinery repo 2017-01-11 09:09:48 -05:00
Seth Jennings
4c30459e49 switch from local qos types to api types 2017-01-10 10:54:30 -06:00
mqliang
d473646855 bugfix for PodToleratesNodeTaints 2017-01-09 18:16:43 +08:00
Kubernetes Submit Queue
69ddd8eb27 Merge pull request #39247 from wojtek-t/optimize_controller_manager_memory
Automatic merge from submit-queue

Avoid unnecessary memory allocations

Low-hanging fruit in saving memory allocations. During our 5000-node kubemark runs I've seen this:

ControllerManager:
- 40.17% k8s.io/kubernetes/pkg/util/system.IsMasterNode
- 19.04% k8s.io/kubernetes/pkg/controller.(*PodControllerRefManager).Classify

Scheduler:
- 42.74% k8s.io/kubernetes/plugin/pkg/scheduler/algorithm/predicates.(*MaxPDVolumeCountChecker).filterVolumes

This PR is eliminating all of those.
2016-12-28 00:02:59 -08:00
Wojciech Tyczynski
ba07a36651 Avoid copying volumes in scheduler 2016-12-27 16:11:11 +01:00
Kubernetes Submit Queue
7b134995e5 Merge pull request #37513 from xiaolou86/podAffinity
Automatic merge from submit-queue

Optimize pod affinity when predicate

Optimize by returning as early as possible to avoid invoking priorityutil.PodMatchesTermsNamespaceAndSelector.
2016-12-27 06:46:54 -08:00
Robert Rati
91931c138e [scheduling] Moved node affinity from annotations to api fields. #35518 2016-12-16 11:42:43 -05:00
Kubernetes Submit Queue
59ad9a30ca Merge pull request #36060 from resouer/fix-service-affinity
Automatic merge from submit-queue

Add use case to service affinity

This is also part of the nits found while refactoring predicates. I found the explanation of `serviceaffinity` in its comment very hard to understand, so I added an example here instead to help users and developers digest it.
2016-12-15 04:10:08 -08:00
Harry Zhang
a0e836a378 Add use case to service affinity 2016-12-14 16:59:35 +08:00
Humble Chirammal
28088159c3 Make iscsi pv claim aware of nodiskconflict feature.
Because iSCSI is a RWO/ROX volume, it should inherit the nodiskconflict feature.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2016-12-13 10:07:09 +05:30
LouZhengwei
5c65088c18 optimize pod affinity when predicate 2016-12-11 23:49:45 +08:00
Kubernetes Submit Queue
f2f107124b Merge pull request #37691 from dshulyak/term_selector
Automatic merge from submit-queue (batch tested with PRs 38377, 36365, 36648, 37691, 38339)

Do not create selector and namespaces in a loop where possible

With 1000 nodes and 5000 pods (5 pods per node) with anti-affinity, a lot of CPU is wasted on creating LabelSelector and sets.String (map).

With this change we are able to deploy that number of pods in ~25 minutes. Without it, it takes 30 minutes to deploy 500 pods with anti-affinity configured.
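
The general technique, hoisting loop-invariant construction out of the hot loop, can be sketched in Go like this. The `Pod` type and the helpers `countMatches` and `matches` are hypothetical and only stand in for the scheduler's selector and namespace-set handling.

```go
package main

import "fmt"

// Pod is a hypothetical, minimal pod representation for this sketch.
type Pod struct {
	Namespace string
	Labels    map[string]string
}

// countMatches counts pods that are in one of the namespaces and carry all
// selector labels. The namespace set is built once, before the loop, instead
// of being rebuilt for every pod.
func countMatches(pods []Pod, namespaces []string, selector map[string]string) int {
	nsSet := make(map[string]struct{}, len(namespaces))
	for _, ns := range namespaces {
		nsSet[ns] = struct{}{}
	}

	count := 0
	for _, p := range pods {
		if _, ok := nsSet[p.Namespace]; !ok {
			continue
		}
		if matches(selector, p.Labels) {
			count++
		}
	}
	return count
}

// matches reports whether labels contains every key/value pair in selector.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	pods := []Pod{
		{Namespace: "default", Labels: map[string]string{"app": "web"}},
		{Namespace: "default", Labels: map[string]string{"app": "db"}},
		{Namespace: "other", Labels: map[string]string{"app": "web"}},
	}
	fmt.Println(countMatches(pods, []string{"default"}, map[string]string{"app": "web"}))
}
```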
2016-12-08 10:52:01 -08:00
Kubernetes Submit Queue
6484efbc2f Merge pull request #37369 from yarntime/fix_typo_in_predicates
Automatic merge from submit-queue (batch tested with PRs 35884, 37305, 37369, 37429, 35679)

fix typo in predicates
2016-12-08 03:55:15 -08:00
Ivan Shvedunov
d40a8f3279 Don't require failureDomains in PodAffinityChecker
failureDomains are only used for PreferredDuringScheduling pod
anti-affinity, which is ignored by PodAffinityChecker.
This unnecessary requirement was making it hard to move
PodAffinityChecker to GeneralPredicates because that would require
passing --failure-domains to both kubelet and kube-controller-manager.
2016-12-08 14:08:28 +03:00
Dmitry Shulyak
55b413f504 Do not create selector and namespaces in a loop where possible
Change-Id: Ib8e62df92a3ea6b8ee6b90cb0b73af71332481d7
2016-12-08 13:04:38 +02:00
Kubernetes Submit Queue
1b5666fc35 Merge pull request #35275 from wojtek-t/cache_conditions
Automatic merge from submit-queue

Cache additional information in schedulercache.NodeInfo to speedup scheduler

Ref #35117
2016-12-07 02:23:19 -08:00
Kubernetes Submit Queue
f299a0010a Merge pull request #37558 from jayunit100/scheduler_log_spam
Automatic merge from submit-queue (batch tested with PRs 38076, 38137, 36882, 37634, 37558)

[scheduler] Use V(10) for anything which may be O(N*P) logging

Fixes #37014

This PR makes sure that logging statements which are capable of being called on a perNode / perPod basis (i.e. non essential ones that will just clog up logs at large scale) are at V(10) level.

I dreamt of a Levenshtein filter that built a weak map of word frequencies and alerted once log throughput increased w/o varying information content....  but then I woke up and realized this is probably all we really need for now :)
2016-12-05 19:25:57 -08:00
Clayton Coleman
3454a8d52c refactor: update bazel, codec, and gofmt 2016-12-03 19:10:53 -05:00
Clayton Coleman
5df8cc39c9 refactor: generated 2016-12-03 19:10:46 -05:00
gmarek
cd2cceb364 Minor scheduler cleanup 2016-11-30 09:35:25 +01:00
jayunit100
7c94c51860 [scheduler] Use V(10) for anything which may be O(N*P) logging 2016-11-28 10:31:49 -05:00
Wojciech Tyczynski
7387bc0572 Cache node taints in scheduler NodeInfo 2016-11-24 16:54:06 +01:00
Wojciech Tyczynski
be2bb39964 Cache node conditions in scheduler NodeInfo 2016-11-24 16:54:05 +01:00
Chao Xu
f782aba56e plugin/scheduler 2016-11-23 15:53:09 -08:00
yarntime@163.com
22e0bdcfaa fix typo in predicates 2016-11-23 18:15:17 +08:00
Harry Zhang
5554dbf907 Fix invalid predicates describe 2016-11-19 22:30:15 +08:00
David Ashpole
9aca40dee6 revert #33218. dont need #36180. We only use diskpressure 2016-11-04 08:29:27 -07:00
Connor Doyle
c93646e8da Support opaque integer resource accounting.
- Prevents kubelet from overwriting capacity during sync.
- Handles opaque integer resources in the scheduler.
  - Adds scheduler predicate tests for opaque resources.
- Validates opaque int resources:
  - Ensures supplied opaque int quantities in node capacity,
    node allocatable, pod request and pod limit are integers.
  - Adds tests for new validation logic (node update and pod spec).
- Added e2e tests for opaque integer resources.
2016-10-28 10:15:13 -07:00
Kubernetes Submit Queue
b0a4216182 Merge pull request #33763 from jayunit100/sched-checkservice-predicateCache
Automatic merge from submit-queue

Predicate caching and cleanup

Fix to #31795 

First pass @ cleanup and caching of the CheckServiceAffinity function.  

The cleanup IMO is necessary because of the logic around the pod listing and the use of the "implicit selector" (which is reverse engineered to enable the homogeneous pod groups).

Should still pass the E2Es.

@timothysc @wojtek-t
2016-10-20 07:39:41 -07:00
jayunit100
08cff0157d PredicateMetadata factory and optimization, Cleaned up some comments,
Comments addressed, Make emptyMetadataProducer a func to avoid casting,
FakeSvcLister: remove error return for len(svc)=0.  New test for
predicatePrecomp to make method semantics explicitly enforced when meta
is missing. Precompute wrapper.
2016-10-20 08:27:11 -04:00
derekwaynecarr
555231fad7 PVC informer lister supports listing 2016-10-18 14:36:33 -04:00
Harry Zhang
50eaeaa7bd Update ecache and add scheduler method 2016-10-17 11:42:16 -04:00
jayunit100
182e89b84e Part 1 of pr #33763: cleanup CheckServiceAffinity in preparation for
predicate injection support, Update metadata struct
2016-10-13 10:38:45 -04:00
David Oppenheimer
cd4e08e7ec Revert "Add kubelet awareness to taint tolerant match calculator." 2016-10-07 12:10:55 -07:00
Kubernetes Submit Queue
21188cadeb Merge pull request #26501 from resouer/scheduler
Automatic merge from submit-queue

Add kubelet awareness to taint tolerant match calculator.

Ref: #25320

This is required by `TaintEffectNoScheduleNoAdmit` & `TaintEffectNoScheduleNoAdmitNoExecute`, so that the node will know if it should expect the taint&tolerant
2016-10-07 12:05:35 -07:00
David Ashpole
0c8a664e50 addressed comments 2016-10-03 11:42:56 -07:00
David Ashpole
fed3f37eef Split NodeDiskPressure into NodeInodePressure and NodeDiskPressure 2016-10-03 11:42:56 -07:00
Harry Zhang
c2cf5bbaf6 Setup e2e test for no admit 2016-10-01 01:07:18 -04:00
Harry Zhang
c735921b6f Add no admit on node side
Update generated code

Refactored predicates & restore helper
2016-09-22 10:12:44 -04:00
deads2k
483af28944 fix up service lister 2016-09-22 09:12:37 -04:00
Antoine Pelisse
938872582e Revert "simplify RC and SVC listers" 2016-09-21 15:49:38 -07:00
Kubernetes Submit Queue
2d9d84dc64 Merge pull request #32888 from deads2k/client-10-fixup-remaining-listers
Automatic merge from submit-queue

simplify RC and SVC listers

Make the RC and SVC listers use the common list functions that more closely match client APIs, are consistent with other listers, and avoid unnecessary copies.
2016-09-21 04:13:56 -07:00
Ivan Shvedunov
f758cb418d Fix possible panic in PodAffinityChecker 2016-09-20 15:53:13 +03:00
deads2k
16fbb47189 fix up service lister 2016-09-20 08:24:33 -04:00
Wojciech Tyczynski
33c710adf0 MapReduce-like scheduler priority functions 2016-08-31 15:16:10 +02:00
mksalawa
2749ec7555 Create PredicateFailureReason, modify scheduler predicate interface. 2016-08-09 14:01:46 +02:00
Kubernetes Submit Queue
faffbe4e18 Merge pull request #29622 from rootfs/rbd-ro
Automatic merge from submit-queue

allow a read-only rbd image mounted by multiple pods

allow pod to run read-only rbd volume 
fix #27725
2016-08-07 17:03:39 -07:00
Huamin Chen
730db45eab allow a read-only rbd image mounted by multiple pods
Signed-off-by: Huamin Chen <hchen@redhat.com>
2016-08-07 10:32:26 -04:00
Wojciech Tyczynski
022719b323 Enable PodAffinity by default in scheduler 2016-08-02 15:06:45 +02:00
Wojciech Tyczynski
4bc410e47a Speedup pod affinity predicate function 2016-08-02 08:01:04 +02:00
k8s-merge-robot
821ff657f9 Merge pull request #27199 from derekwaynecarr/disk_eviction
Automatic merge from submit-queue

Initial support for pod eviction based on disk

This PR adds the following:

1. node reports disk pressure condition based on configured thresholds
1. scheduler does not place pods on nodes reporting disk pressure
1. kubelet will not admit any pod when it reports disk pressure
1. kubelet ranks pods for eviction when low on disk
1. kubelet evicts greediest pod

Follow-on PRs will need to handle:

1. integrate with new image gc PR (https://github.com/kubernetes/kubernetes/pull/27199)
1. container gc policy should always run (will not be launched from eviction, tbd who does that)
  1. this means kill pod is fine for all eviction code paths since container gc will remove dead container
1. min reclaim support will just poll summary provider (derek will do follow-on)
1. need to know if imagefs is same device as rootfs from summary (derek follow-on)

/cc @vishh @kubernetes/sig-node
2016-07-28 20:18:54 -07:00
derekwaynecarr
9604b47c13 Scheduler does not place pods on nodes that have disk pressure 2016-07-28 16:01:38 -04:00
Wojciech Tyczynski
898a6444e3 Return pointer for Affinity in api helper 2016-07-28 16:57:28 +02:00
Wojciech Tyczynski
fad876b6f9 PodAffinity code refinements 2016-07-22 08:49:28 +02:00
Wojciech Tyczynski
dcb5a6d1a6 Reuse existing Resource struct instead of new resourceRequest 2016-07-19 12:21:09 +02:00
k8s-merge-robot
a049a97820 Merge pull request #28803 from lukaszo/ds
Automatic merge from submit-queue

Make Daemonset use GeneralPredicates

fixes: #21454 #22205
2016-07-18 22:12:14 -07:00
Wojciech Tyczynski
a538045d7b Cleanup and prepare for optimizing PodAffinity priority function. 2016-07-15 10:06:36 +02:00
Łukasz Oleś
528bf7af3a Make Daemonset use GeneralPredicates
fixes #21454, fixes #22205
2016-07-13 14:50:29 +02:00
Wojciech Tyczynski
c929d95884 Cache Allocatable Resources 2016-07-13 12:57:18 +02:00