Commit Graph

963 Commits

Author SHA1 Message Date
Kubernetes Submit Queue
4ba2b625c5 Merge pull request #50805 from bsalamat/preemption_metacompute
Automatic merge from submit-queue

Add support to modify precomputed predicate metadata upon adding/removal of a pod

**What this PR does / why we need it**: This PR adds capability to change precomputed predicate metadata and let's us add/remove pods to the precomputed metadata efficiently without the need ot recomputing everything upon addition/removal of pods. This PR is needed as a part of adding preemption logic to the scheduler.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:
To make the review process a bit easier, there are three commits. The cleanup commit is only moving code and renaming some functions, without logic changes.

**Release note**:

```release-note
NONE
```
ref/ #47604
ref/ #48646

/assign @wojtek-t 

@kubernetes/sig-scheduling-pr-reviews @davidopp
2017-08-28 05:11:19 -07:00
Kubernetes Submit Queue
daf591c193 Merge pull request #51117 from k82cn/k8s_50360_2
Automatic merge from submit-queue

Moved node condition filter into a predicates.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #50360

**Release note**:

```release-note
A new predicates, named 'CheckNodeCondition', was added to replace node condition filter. 'NetworkUnavailable', 'OutOfDisk' and 'NotReady' maybe reported as a reason when failed to schedule pods.
```
2017-08-28 01:22:27 -07:00
Bobby (Babak) Salamat
87d406569d bazel update 2017-08-28 00:12:46 -07:00
Bobby (Babak) Salamat
264ca7d158 Add support to recompute partial predicate metadata upon adding/removing pods 2017-08-28 00:12:46 -07:00
Klaus Ma
717cee04df Refres equal cache if node condition changed. 2017-08-26 11:03:57 +08:00
Kubernetes Submit Queue
e923f2ba5c Merge pull request #50819 from NickrenREN/remove-overlay-scheduler
Automatic merge from submit-queue (batch tested with PRs 51235, 50819, 51274, 50972, 50504)

Changing scheduling part to manage one single local storage resource

**What this PR does / why we need it**:
 Finally decided to manage a single local storage resource

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:  part of #50818

**Special notes for your reviewer**:
Since finally decided to manage a single local storage resource, remove overlay related code in scheduling part and change the name scratch to ephemeral storage.

**Release note**:
```release-note
Changing scheduling part of the alpha feature 'LocalStorageCapacityIsolation' to manage one single local ephemeral storage resource
```

/assign @jingxu97 
cc @ddysher
2017-08-25 19:40:29 -07:00
Klaus Ma
18dc690c7c Moved node condition filter into a predicates. 2017-08-26 09:08:07 +08:00
Kubernetes Submit Queue
2e516943dc Merge pull request #50669 from jiulongzaitian/myfeature
Automatic merge from submit-queue (batch tested with PRs 51224, 51191, 51158, 50669, 51222)

Modify the initialization of results in generic_scheduler.go

Signed-off-by: zhangjie <zhangjie0619@yeah.net>



**What this PR does / why we need it**:

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-08-24 22:59:35 -07:00
NickrenREN
c95ecbc7e8 Local storage does not manage overlay any more 2017-08-25 09:56:36 +08:00
Kubernetes Submit Queue
178a5ff314 Merge pull request #50665 from xiangpengzhao/hardcode-to-const
Automatic merge from submit-queue (batch tested with PRs 50257, 50247, 50665, 50554, 51077)

Replace hard-code "cpu" and "memory" to consts

**What this PR does / why we need it**:
There are many places using hard coded "cpu" and "memory" as resource name. This PR replace them to consts.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:
/kind cleanup

**Release note**:

```release-note
NONE
```
2017-08-23 02:35:09 -07:00
Kubernetes Submit Queue
cb8ade18c6 Merge pull request #50950 from k82cn/revert_50360
Automatic merge from submit-queue

Revert #50362.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #50884

**Release note**:

```release-note
None
```
2017-08-21 16:50:53 -07:00
Kubernetes Submit Queue
0f8eaa45dd Merge pull request #49976 from aveshagarwal/master-pod-affinities-topology-key
Automatic merge from submit-queue (batch tested with PRs 50531, 50853, 49976, 50939, 50607)

Do not allow empty topology key for pod affinities.

**What this PR does / why we need it**:
This PR do not allow empty topology key for all 4 pod affinities.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
Affinity in annotations alpha feature is no longer supported in 1.8. Anyone upgrading from 1.7 with AffinityInAnnotation feature enabled must ensure pods (specifically with pod anti-affinity PreferredDuringSchedulingIgnoredDuringExecution) with empty TopologyKey fields must be removed before upgrading to 1.8.
```
@kubernetes/sig-scheduling-bugs  @bsalamat @davidopp
2017-08-21 15:46:20 -07:00
Klaus Ma
df3a699069 Revert #50362. 2017-08-19 10:24:50 +08:00
Kubernetes Submit Queue
acd5f22398 Merge pull request #50581 from k82cn/k8s_50360_1
Automatic merge from submit-queue (batch tested with PRs 49342, 50581, 50777)

Update RegisterMandatoryFitPredicate to avoid double register.

**What this PR does / why we need it**:
In https://github.com/kubernetes/kubernetes/pull/50362 , we introduced `RegisterMandatoryFitPredicate` to make some predicates always included by scheduler. This PRs is to improve it by avoiding double register: `RegisterFitPredicate` and `RegisterMandatoryFitPredicate` 

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #50360 

**Release note**:

```release-note
None
```
2017-08-16 23:08:15 -07:00
Klaus Ma
051dfb1ba2 address review comments. 2017-08-17 08:22:11 +08:00
Connor Doyle
630af5422b OIR predicate includes namespaced resources. 2017-08-16 15:29:24 -07:00
xiangpengzhao
1c4dbcf5ca Replace hard-code "cpu" and "memory" to consts 2017-08-16 16:37:50 +08:00
Klaus Ma
2da96fc458 Replaced bool map to string set. 2017-08-16 14:57:12 +08:00
Klaus Ma
4a32bde4a5 Update RegisterMandatoryFitPredicate to avoid double register. 2017-08-15 21:03:14 +08:00
zhangjie
c87d42763d Modify the initialization of results in generic_scheduler.go
Signed-off-by: zhangjie <zhangjie0619@yeah.net>
2017-08-15 16:14:44 +08:00
Kevin
f76ca1fb16 update clientset.Core() to clientset.CoreV1() in test 2017-08-14 16:53:55 +08:00
Kubernetes Submit Queue
2820b45caa Merge pull request #50362 from k82cn/k8s_50360
Automatic merge from submit-queue

Moved node condition filter into a predicates.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #50360 

**Release note**:
```release-note
A new predicates, named 'CheckNodeCondition', was added to replace node condition filter. 'NetworkUnavailable', 'OutOfDisk' and 'NotReady' maybe reported as a reason when failed to schedule pods.
```
2017-08-12 15:10:31 -07:00
Klaus Ma
78e078390f Renamed to RegisterMandatoryFitPredicate. 2017-08-12 07:28:40 +08:00
Jeff Grafton
a7f49c906d Use buildozer to delete licenses() rules except under third_party/ 2017-08-11 09:32:39 -07:00
Jeff Grafton
33276f06be Use buildozer to remove deprecated automanaged tags 2017-08-11 09:31:50 -07:00
Klaus Ma
e9738c0ce6 Moved node condition filter into a predicates. 2017-08-11 16:43:33 +08:00
Kubernetes Submit Queue
bc7ccfe93b Merge pull request #50106 from julia-stripe/improve-scheduler-error-handling
Automatic merge from submit-queue

Retry scheduling pods after errors more consistently in scheduler

**What this PR does / why we need it**:

This fixes 2 places in the scheduler where pods can get stuck in Pending forever.  In both these places, errors happen and `sched.config.Error` is not called afterwards. This is a problem because `sched.config.Error` is responsible for requeuing pods to retry scheduling when there are issues (see [here](2540b333b2/plugin/pkg/scheduler/factory/factory.go (L958))), so if we don't call `sched.config.Error` then the pod will never get scheduled (unless the scheduler is restarted).

One of these (where it returns when `ForgetPod` fails instead of continuing and reporting an error) is a regression from [this refactor](https://github.com/kubernetes/kubernetes/commit/ecb962e6585#diff-67f2b61521299ca8d8687b0933bbfb19L234), and with the [old behavior](80f26fa8a8/plugin/pkg/scheduler/scheduler.go (L233-L237)) the error was reported correctly. As far as I can tell changing the error handling in that refactor wasn't intentional.

When AssumePod fails there's never been an error reported but I think adding this will help the scheduler recover when something goes wrong instead of letting pods possibly never get scheduled.

This will help prevent issues like https://github.com/kubernetes/kubernetes/issues/49314 in the future.

**Release note**:

```release-note
Fix incorrect retry logic in scheduler
```
2017-08-07 01:35:17 -07:00
Kubernetes Submit Queue
fa5877de18 Merge pull request #47408 from shiywang/follow-go-code-style
Automatic merge from submit-queue (batch tested with PRs 47416, 47408, 49697, 49860, 50162)

follow our go code style: error->err

Fixes https://github.com/kubernetes/kubernetes/issues/50189
```release-note
NONE
```
2017-08-05 03:22:54 -07:00
Kubernetes Submit Queue
90a45b2df3 Merge pull request #49547 from k82cn/k8s_42001_0
Automatic merge from submit-queue (batch tested with PRs 50119, 48366, 47181, 41611, 49547)

Task 0: Added node taints labels and feature flags

**What this PR does / why we need it**:
Added node taint const for node condition.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of #42001

**Release note**:
```release-note
None
```
2017-08-04 14:29:42 -07:00
Julia Evans
2d9c6dfae8 Handle errors more consistently in scheduler 2017-08-04 12:00:22 -07:00
Kubernetes Submit Queue
898b1b3330 Merge pull request #50028 from julia-stripe/fix-incorrect-scheduler-bind-call
Automatic merge from submit-queue

Fix incorrect call to 'bind' in scheduler

I previously submitted https://github.com/kubernetes/kubernetes/pull/49661 -- I'm not sure if that PR is too big or what, but this is an attempt at a smaller PR that makes progress on the same issue and is easier to review.

**What this PR does / why we need it**:

In this refactor (https://github.com/kubernetes/kubernetes/commit/ecb962e6585#diff-67f2b61521299ca8d8687b0933bbfb19R223) the scheduler code was refactored into separate `bind` and `assume` functions. When that happened, `bind` was called with `pod` as an argument. The argument to `bind` should be the assumed pod, not the original pod. Evidence that `assumedPod` is the correct argument bind and not `pod`: 80f26fa8a8/plugin/pkg/scheduler/scheduler.go (L229-L234). (and it says `assumed` in the function signature for `bind`, even though it's not called with the assumed pod as an argument).

This is an issue (and causes #49314, where pods that fail to bind to a node get stuck indefinitely) in the following scenario:

1. The pod fails to bind to the node
2. `bind` calls `ForgetPod` with the `pod` argument
3. since `ForgetPod` is expecting the assumed pod as an argument (because that's what's in the scheduler cache), it fails with an error like `scheduler cache ForgetPod failed: pod test-677550-rc-edit-namespace/nginx-jvn09 state was assumed on a different node`
4. The pod gets lost forever because of some incomplete error handling (which I haven't addressed here in the interest of making a simpler PR)

In this PR I've fixed the call to `bind` and modified the tests to make sure that `ForgetPod` gets called with the correct argument (the assumed pod) when binding fails.

**Which issue this PR fixes**: fixes #49314

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-08-04 10:33:10 -07:00
Julia Evans
d584bf4d50 Fix incorrect call to 'bind' in scheduler 2017-08-03 13:55:00 -07:00
Harry Zhang
f8309d7598 Update generated files 2017-08-03 23:03:52 +08:00
Harry Zhang
a0787358b5 Cover get equivalence cache in core
Fix testing method
2017-08-03 23:03:52 +08:00
Klaus Ma
c8ecd92269 Moved node condition check into Predicats. 2017-08-03 15:39:11 +08:00
Avesh Agarwal
0dad8dd459 Do not allow empty topology key for pod affinities. 2017-08-02 09:41:29 -04:00
supereagle
a1c880ece3 update generated deepcopy code 2017-07-31 22:33:00 +08:00
Klaus Ma
ec4aa192cc Added taints node by condition feature flag. 2017-07-31 19:30:34 +08:00
Kubernetes Submit Queue
9350afd772 Merge pull request #48976 from supereagle/cleanup-api-package
Automatic merge from submit-queue (batch tested with PRs 48976, 49474, 40050, 49426, 49430)

Remove duplicated import and wrong alias name of api package

**What this PR does / why we need it**:

**Which issue this PR fixes**: fixes #48975

**Special notes for your reviewer**:
/assign @caesarxuchao

**Release note**:
```release-note
NONE
```
2017-07-25 12:14:38 -07:00
vikaschoudhary16
df4f4d341b Enhance scheduler cache unit tests to cover OIR in pod spec
Signed-off-by: vikaschoudhary16 <choudharyvikas16@gmail.com>
2017-07-25 06:35:23 -04:00
Klaus Ma
c85e4dc1de Added node taints labels. 2017-07-25 15:21:51 +08:00
Kubernetes Submit Queue
e623fed778 Merge pull request #48636 from jingxu97/July/allocatable
Automatic merge from submit-queue (batch tested with PRs 48636, 49088, 49251, 49417, 49494)

Fix issues for local storage allocatable feature

This PR fixes the following issues:
1. Use ResourceStorageScratch instead of ResourceStorage API to represent
local storage capacity
2. In eviction manager, use container manager instead of node provider
(kubelet) to retrieve the node capacity and reserved resources. Node
provider (kubelet) has a feature gate so that storagescratch information
may not be exposed if feature gate is not set. On the other hand,
container manager has all the capacity and allocatable resource
information.

This PR fixes issue #47809
2017-07-24 19:30:33 -07:00
supereagle
adc0eef43e remove duplicated import and wrong alias name of api package 2017-07-25 10:04:25 +08:00
Kubernetes Submit Queue
2faf7ff2bc Merge pull request #36238 from resouer/eclass-2-dev
Automatic merge from submit-queue (batch tested with PRs 48043, 48200, 49139, 36238, 49130)

Implement equivalence cache by caching and re-using predicate result

The last part of #30844, I opened a new PR instead of overwrite the old one because we changed some basic assumption by allowing invalidating equivalence cache item by individual predicate.

The idea of this PR is based on discussion in https://github.com/kubernetes/kubernetes/issues/32024

- [x]  Pods belong to same controllerRef considered to be equivalent
- [x] ` podFitsOnNode` will use cached predicate result if it's available
- [x] Equivalence cache will be updated when if a fresh new predicate is done
- [x] `factory.go` will invalid specific predicate cache(s) based on the object change
- [x] Since `schedule` and `bind` are async, we need to optimistically invalid affected cache(s) before `bind`
- [x] Fully unit test of affected files
- [x] e2e test to verify cache update/invalid workflow
- [x] performance test results

- [x] Some nits fixes related but expected to result in `needs-rebase` so they are split to: #36060 #35968 #37512

cc @wojtek-t @davidopp
2017-07-19 01:57:32 -07:00
Kubernetes Submit Queue
2492477f0d Merge pull request #49110 from xiangpengzhao/remove-annotation-affinity
Automatic merge from submit-queue (batch tested with PRs 49055, 49128, 49132, 49134, 49110)

Remove affinity annotations leftover

**What this PR does / why we need it**:
This is a further cleanup for affinity annotations, following #47869.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
ref: #47869

**Special notes for your reviewer**:
- I remove the commented test cases and just leave TODOs instead. I think converting these untestable test cases for now is not necessary. We can add new test cases in future.
- I remove the e2e test case `validates that embedding the JSON PodAffinity and PodAntiAffinity setting as a string in the annotation value work` because we have a test case `validates that InterPod Affinity and AntiAffinity is respected if matching` to test the same thing.

/cc @aveshagarwal @bsalamat  @gyliu513 @k82cn @timothysc 

**Release note**:

```release-note
NONE
```
2017-07-18 21:54:25 -07:00
Kubernetes Submit Queue
5bbdfc6661 Merge pull request #48544 from sttts/sttts-typed-deepcopy-1.8
Automatic merge from submit-queue (batch tested with PRs 46094, 48544, 48807, 49102, 44174)

Static deepcopy – phase 1

This PR is the follow-up of https://github.com/kubernetes/kubernetes/pull/36412, replacing the
dynamic reflection based deepcopy with static DeepCopy+DeepCopyInto methods on API types.

This PR **does not yet** include the code dropping the cloner from the scheme and all the
porting of the calls to scheme.Copy. This will be part of a follow-up "Phase 2" PR.

A couple of the commits will go in first:
- [x] audit: fix deepcopy registration  https://github.com/kubernetes/kubernetes/pull/48599
- [x] apimachinery+apiserver: separate test types in their own packages #48601 
- [x] client-go: remove TPR example #48604
- [x] apimachinery: remove unneeded GetObjectKind() impls #48608 
- [x] sanity check against origin, that OpenShift's types are fine for static deepcopy https://github.com/deads2k/origin/pull/34

TODO **after** review here:
- [x] merge https://github.com/kubernetes/gengo/pull/32 and update vendoring commit
2017-07-18 11:20:51 -07:00
Harry Zhang
f817b8a6f6 Update generated bazel 2017-07-18 23:58:32 +08:00
Harry Zhang
0e8517875e Update factory.go informers to update equivalence cache
Fix tombstone

Add e2e to verify equivalence cache

Addressing nits in factory,go and e2e

Update build files
2017-07-18 23:55:01 +08:00
Kubernetes Submit Queue
686e93bbf1 Merge pull request #48333 from sakeven/master
Automatic merge from submit-queue (batch tested with PRs 48333, 48806, 49046)

use v1.ResourcePods instead of hard coding "pods"

Signed-off-by: sakeven <jc5930@sina.cn>



**What this PR does / why we need it**:

use v1.ResourcePods instead of hard coding 'pods'


**Special notes for your reviewer**:

**Release note**:

```
NONE
```
2017-07-18 06:24:58 -07:00
xiangpengzhao
d9d3396566 Remove affinity annotations leftover 2017-07-18 19:42:52 +08:00