Commit Graph

247 Commits

Author SHA1 Message Date
Kubernetes Submit Queue
49e7d640d9 Merge pull request #35235 from foxish/node-controller-no-force-deletion
Automatic merge from submit-queue

Node controller to not force delete pods

Fixes https://github.com/kubernetes/kubernetes/issues/35145

- [x] e2e tests to test Petset, RC, Job.
- [x] Remove and cover other locations where we force-delete pods within the NodeController.

**Release note**:

<!--  Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access) 
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`. 
-->

``` release-note
Node controller no longer force-deletes pods from the api-server.

* For StatefulSet (previously PetSet), this change means creation of replacement pods is blocked until old pods are definitely not running (indicated either by the kubelet returning from partitioned state, or deletion of the Node object, or deletion of the instance in the cloud provider, or force deletion of the pod from the api-server). This has the desirable outcome of "fencing" to prevent "split brain" scenarios.
* For all other existing controllers except StatefulSet , this has no effect on the ability of the controller to replace pods because the controllers do not reuse pod names (they use generate-name).
* User-written controllers that reuse names of pod objects should evaluate this change.
```
2016-11-01 20:08:57 -07:00
Anirudh
71941016c1 Fix old e2e tests, refactor and add new e2e tests. 2016-11-01 11:46:13 -07:00
Anirudh
5ccd7a325e Removing force deletion of pods from the node-controller 2016-11-01 11:44:34 -07:00
gmarek
8d766462e7 Initialize CIDR allocator before registering handle functions 2016-10-31 16:21:37 +01:00
Chao Xu
850729bfaf include multiple versions in clientset
update client-gen to use the term "internalversion" rather than "unversioned";
leave internal one unqualified;
cleanup client-gen
2016-10-29 13:30:47 -07:00
Anirudh
05365d7cb2 Moving deletion behavior from the NC into PodGC
This should be a NOP because we're just moving functionality
around and thanks to #35476, the podGC controller should always
run anyway.
2016-10-27 11:56:15 -07:00
Mike Danese
3b6a067afc autogenerated 2016-10-21 17:32:32 -07:00
Jan Chaloupka
6079053407 Update clientset generator to use RESTClient interface instead of the RESTClient data type 2016-10-21 10:13:51 +02:00
Kubernetes Submit Queue
4dbd657e67 Merge pull request #34342 from justinsb/fix_typo_initializing
Automatic merge from submit-queue

Fix typo: initilizing -> initializing
2016-10-15 09:31:53 -07:00
Andy Goldstein
e7befa2a14 Only wait for cache syncs once in NodeController 2016-10-14 15:50:05 -04:00
gmarek
e2b78ddadc NodeController waits for informer sync before doing anything 2016-10-14 15:52:57 +02:00
David Oppenheimer
cd5779a2b1 Make NodeController recognize deletion tombstones. 2016-10-13 23:36:22 -07:00
Kubernetes Submit Queue
06c1f2ba2c Merge pull request #34707 from gmarek/master
Automatic merge from submit-queue

Bump log level in case of Node eviction
2016-10-13 05:37:10 -07:00
gmarek
41278b4c6b Bump log level in case of Node eviction 2016-10-13 13:26:16 +02:00
gmarek
8b7e9d303c Handle DeletedFinalStateUnknown in NodeController 2016-10-13 11:31:04 +02:00
Justin Santa Barbara
dc2a9d6f5c Fix typo: initilizing -> initializing 2016-10-07 13:29:22 -04:00
deads2k
0961784a9b switch node controller to shared informers 2016-09-29 09:16:41 -04:00
gmarek
cb0a13c1e5 Move orphaned Pod deletion logic to PodGC 2016-09-28 13:58:31 +02:00
Justin Santa Barbara
54195d590f Use strongly-typed types.NodeName for a node name
We had another bug where we confused the hostname with the NodeName.

To avoid this happening again, and to make the code more
self-documenting, we use types.NodeName (a typedef alias for string)
whenever we are referring to the Node.Name.

A tedious but mechanical commit therefore, to change all uses of the
node name to use types.NodeName

Also clean up some of the (many) places where the NodeName is referred
to as a hostname (not true on AWS), or an instanceID (not true on GCE),
etc.
2016-09-27 10:47:31 -04:00
Kubernetes Submit Queue
6e25117891 Merge pull request #32655 from dshulyak/fix_node_fake_update
Automatic merge from submit-queue

Fix FakeNodeHandler Update behaviour

Two problems:
1. Get is always using Existing nodes slice, and you will for sure miss any updated data
2. Each Update adds a duplicate node entry to UpdatedNodes slice

For the 1st, we will try to find a node in UpdatedNodes slice (same as for the List).
2nd - append only if there is no node with same name as updated, if there is we will replace object in UpdatedNodes slice.
2016-09-22 07:43:18 -07:00
Mike Danese
a765d59932 move informer and controller to pkg/client/cache
Signed-off-by: Mike Danese <mikedanese@google.com>
2016-09-15 12:50:08 -07:00
Dmitry Shulyak
0ddaa20bf1 Fix FakeNodeHandler Update behaviour
Two problems:
1. Get is always using Existing nodes slice, and you will for sure miss any
   updated data
2. Each Update duplicates node entry in UpdatedNodes slice

For the 1st, try to find a node in UpdatedNodes slice (same as for the List).
2nd - append only if there is no node with same name as updated, if there is
just replace object.

Change-Id: I9ef1cca2788ba946eee37fa1b037c124ad76074c
2016-09-14 12:34:37 +03:00
gmarek
c40a36cab0 Change the eviction metric type and fix rate-limited-timed-queue 2016-09-08 12:20:51 +02:00
Kubernetes Submit Queue
9dfe8df7cd Merge pull request #32107 from xingzhou/km_bug
Automatic merge from submit-queue

Used goroutine to launch node controller's internalPodInformer.

Fixes #32103
2016-09-06 11:50:28 -07:00
Kubernetes Submit Queue
afef4b6938 Merge pull request #32070 from gmarek/nodecontroller
Automatic merge from submit-queue

Sleep between NodeStatus update retries

Just a thing I found when looking into other problems.

This is pretty much no-risk change fixing wrong behavior. Do you think it should go in 1.4? @pwittrock
2016-09-06 03:51:20 -07:00
Xing Zhou
46c302b4c2 Used goroutine to launch node controller's internalPodInformer.
Node controller's internalPodInformer will block main thread
if it is not started as a go routine. This patch fixed this
by runing internalPodInformer as a go routine.
2016-09-06 15:39:13 +08:00
Wojciech Tyczynski
b69d516763 NodeController listing nodes from apiserver cache 2016-09-05 16:52:57 +02:00
gmarek
bac603afd6 Sleep between NodeStatus update retries 2016-09-05 12:29:28 +02:00
gmarek
ea2d19f5d7 Remove unused argument to NodeController.Run 2016-08-30 14:24:56 +02:00
Kubernetes Submit Queue
6a1c63fd37 Merge pull request #30857 from better0332/master
Automatic merge from submit-queue

fix FakeNodeHandler List()
2016-08-22 17:40:34 -07:00
Kubernetes Submit Queue
9ebaf29295 Merge pull request #30624 from derekwaynecarr/node-controller-fix
Automatic merge from submit-queue

Node controller deletePod return true if there are pods pending deletion

Fixes https://github.com/kubernetes/kubernetes/issues/30536

If a node had a single pod in terminating state, and that node no longer reported healthy, the pod was never deleted by the node controller because it believed there were no pods remaining.

@smarterclayton @ncdc
2016-08-19 14:33:54 -07:00
gmarek
5d8cb17efa Add cluster health metrics to NodeController 2016-08-18 15:11:10 +02:00
better0332
2f837e7096 fix FakeNodeHandler List() 2016-08-18 17:30:26 +08:00
Kubernetes Submit Queue
f9190ed61a Merge pull request #30138 from gmarek/flags
Automatic merge from submit-queue

Expose flags for new NodeEviction logic in NodeController

Fix #28832
Last PR from the NodeController NodeEviction logic series. 

cc @davidopp @lavalamp @mml
2016-08-18 00:41:28 -07:00
bprashanth
15c9826061 Nodecontroller doesn't flip readiness on pods if kubeletVersion < 1.2.0 2016-08-17 15:33:35 -07:00
gmarek
4cf698ef04 Expose flags for new NodeEviction logic in NodeController 2016-08-17 10:43:24 +02:00
derekwaynecarr
14a2b261a8 Node controller deletePod return true if there are pods pending deletion 2016-08-16 13:05:38 -04:00
AdoHe
b2ab4c6d9b fix node controller event uid issue 2016-08-14 09:41:20 +08:00
Kubernetes Submit Queue
2537f66f0e Merge pull request #29230 from luxas/goimport
Automatic merge from submit-queue

Run goimport for the whole repo

While removing GOMAXPROC and running goimports, I noticed quite a lot of other files also needed a goimport format. Didn't commit `*.generated.go`, `*.deepcopy.go` or files in `vendor`

This is more for testing if it builds.
The only strange thing here is the gopkg.in/gcfg.v1 => github.com/scalingdata/gcfg replace.
cc @jfrazelle @thockin
2016-08-05 16:22:01 -07:00
Dominika Hodovska
816f6d32ca Collapse duplicate informer creation paths 2016-08-04 09:02:13 +02:00
gmarek
66224ce0bd Change eviction logic in NodeController and make it Zone-aware 2016-08-02 14:21:52 +02:00
Lucas Käldström
c88a07ce1a Run goimports 2016-08-02 15:12:39 +03:00
k8s-merge-robot
59836d6dbd Merge pull request #24841 from sjenning/shared-informer
Automatic merge from submit-queue

update node controller to use shared pod informer

continuing work from #24470 and #23575
2016-08-02 03:45:01 -07:00
Matt T. Proud
76aab29ede pkg/controller/node/nodecontroller: simplify mutex
Similar to #29598, we can rely on the zero-value construction behavior
to embed `sync.Mutex` into parent structs.
2016-07-26 07:06:16 +02:00
Seth Jennings
db6026c82a node controller use shared pod informer 2016-07-20 15:26:19 -05:00
Seth Jennings
6d77f53af4 refactor maybeDeleteTerminatingPod 2016-07-20 15:26:19 -05:00
k8s-merge-robot
8c6336b12a Merge pull request #29101 from gmarek/allocator2
Automatic merge from submit-queue

Retry assigning CIDRs

Fix #28879, ref #29064

cc @bgrant0607 @bprashanth @alex-mohr
2016-07-19 04:35:23 -07:00
gmarek
56006fac43 Retry assigning CIDRs 2016-07-18 17:06:04 +02:00
k8s-merge-robot
fa174bcdaf Merge pull request #29042 from dims/fixup-imports
Automatic merge from submit-queue

Use Go canonical import paths

Add canonical imports only in existing doc.go files.
https://golang.org/doc/go1.4#canonicalimports

Fixes #29014
2016-07-18 07:23:38 -07:00
gmarek
579912d9d2 Re-check assigned CIDR during update 2016-07-18 10:57:58 +02:00
Prashanth Balasubramanian
2f9516db30 List all nodes and occupy cidr map before starting allocations 2016-07-16 13:54:01 -07:00
Davanum Srinivas
2b0ed014b7 Use Go canonical import paths
Add canonical imports only in existing doc.go files.
https://golang.org/doc/go1.4#canonicalimports

Fixes #29014
2016-07-16 13:48:21 -04:00
gmarek
f6b1c316e9 Allow switching rate limiter inside RateLimitedQueue 2016-07-14 15:38:14 +02:00
k8s-merge-robot
6b6141f812 Merge pull request #28820 from caesarxuchao/patch-subresource
Automatic merge from submit-queue

[client-gen] Allow passing subresources in Patch method

Expand the Patch() method from:
```
Patch(name string, pt api.PatchType, data []byte)
```
to
```
Patch(name string, pt api.PatchType, data []byte, subresources ...string)
```

Continue on #27293. Fixes #26580.

cc @Random-Liu @lavalamp
2016-07-13 16:09:01 -07:00
gmarek
5677a9845e Split NodeController rate limiters between zones 2016-07-13 14:09:19 +02:00
Chao Xu
dc2e12d2f8 manual changes to patch subresource 2016-07-12 11:09:27 -07:00
gmarek
fd600ab65c Add hooks for cluster health detection 2016-07-12 15:10:58 +02:00
gmarek
7524da877e Reduce tightness of coupling in NodeController 2016-07-12 11:00:41 +02:00
gmarek
7f5f9d3a6f Move CIDR allocation logic away from nodecontroller.go 2016-07-12 09:40:43 +02:00
David McMahon
ef0c9f0c5b Remove "All rights reserved" from all the headers. 2016-06-29 17:47:36 -07:00
Oleg Shaldybin
a58b4cf59d Don't panic in NodeController if pod update fails
Previously it was trying to use a nil pod variable if error was returned
from the pod update call.
2016-06-28 11:54:13 -07:00
Oleg Shaldybin
3b15d5be19 Use correct namespace in unit tests that use fake clientset
Fake clientset no longer needs to be prepopulated with records: keeping
them in leads to the name conflict on creates. Also, since fake
clientset now respects namespaces, we need to correctly populate them.
2016-06-28 11:26:34 -07:00
k8s-merge-robot
93037844c1 Merge pull request #27293 from caesarxuchao/add-patch-to-clientset
Automatic merge from submit-queue

[client-gen]Add Patch to clientset

* add the Patch() method to the clientset. 
* I have to rename the existing Patch() method of `Event` to PatchWithEventNamespace() to avoid overriding.
* some minor changes to the fake Patch action.

cc @Random-Liu since he asked for the method
@kubernetes/sig-api-machinery 

ref #26580 

```release-note
Add the Patch method to the generated clientset.
```
2016-06-25 19:15:11 -07:00
saadali
926bb4cca0 Add patch status to Node internalclientset 2016-06-19 23:54:02 -07:00
goltermann
218645b346 Fix several spelling errors in comments. 2016-06-17 10:41:18 -07:00
Chao Xu
a29f6aa8ae add Patch to clientsets 2016-06-17 10:30:58 -07:00
gmarek
7cac170214 AllocateOrOccupyCIDR returs quickly 2016-05-31 09:11:42 +02:00
k8s-merge-robot
577cdf937d Merge pull request #26415 from wojtek-t/network_not_ready
Automatic merge from submit-queue

Add a NodeCondition "NetworkUnavaiable" to prevent scheduling onto a node until the routes have been created 

This is new version of #26267 (based on top of that one).

The new workflow is:
- we have an "NetworkNotReady" condition
- Kubelet when it creates a node, it sets it to "true"
- RouteController will set it to "false" when the route is created
- Scheduler is scheduling only on nodes that doesn't have "NetworkNotReady ==true" condition

@gmarek @bgrant0607 @zmerlynn @cjcullen @derekwaynecarr @danwinship @dcbw @lavalamp @vishh
2016-05-29 03:06:59 -07:00
Alex Robinson
d577550dd0 Merge pull request #26054 from gmarek/flags
Make service-range flag in controller-manager optional
2016-05-27 14:26:15 -07:00
gmarek
7bdf480340 Node is NotReady until the Route is created 2016-05-27 19:29:51 +02:00
Alex Robinson
7522389d8d Merge pull request #26207 from zmerlynn/fix-unneeded-updated
nodecontroller: Fix log message on successful update
2016-05-27 09:56:28 -07:00
Zach Loafman
cb69960742 nodecontroller: Fix log message on successful update 2016-05-25 14:44:15 -07:00
k8s-merge-robot
4e8e4a574c Merge pull request #25636 from zhouhaibing089/delnode-fix
Automatic merge from submit-queue

use monotonic now in TestDelNode

Fixes https://github.com/kubernetes/kubernetes/issues/24971.

Briefly, the rate_limited_queue uses a `container/heap` to store values, and use this data structure to ensure we can always fetch the value with the minimum `processAt`. However, in some extreme condition, the continuous call to `time.Now()` would get the same value, which causes some unpredictable order in the queue, this fix uses a monotonic `now()` to avoid that.

@smarterclayton please take a look.
2016-05-25 13:33:31 -07:00
gmarek
08385b2c5f Make service-range flag in controller-manager optional 2016-05-23 09:37:53 +02:00
gmarek
1d89d2f2d2 Add few log lines to NodeController 2016-05-23 08:49:11 +02:00
mqliang
17d5a302bb make podcidr mask size configurable 2016-05-20 20:44:40 +08:00
mqliang
cf7a3475f3 Don't allow node controller to allocate into service CIDR range 2016-05-20 20:44:40 +08:00
mqliang
69b8453fa0 cidr allocator 2016-05-20 20:44:40 +08:00
gmarek
6d27009db1 NodeController doesn't evict Pods if no Nodes are Ready 2016-05-17 23:03:21 +02:00
zhouhaibing089
af3e2357ea use monotonic now in TestDelNode 2016-05-15 10:58:29 +08:00
mqliang
c10f43a2e5 implement AddIndexers for SharedIndexInformer 2016-05-06 21:23:18 +08:00
mqliang
9011207f18 add namespace index to rc and pod 2016-05-06 17:12:36 +08:00
k8s-merge-robot
4a0e0826e5 Merge pull request #24220 from gmarek/metrics
Automatic merge from submit-queue

Generated clients can return their RESTClients, RESTClient can return its RateLimiter

cc @lavalamp @krousey @wojtek-t @smarterclayton @timothysc 

Ref. #22421
2016-04-27 19:25:38 -07:00
gmarek
3171aac57c Generated clients can return their RESTClients, RESTClient can return its RateLimiter 2016-04-27 22:15:10 +02:00
Clayton Coleman
08f136b8d9
RateLimitedQueue TestTryOrdering could fail under load
Remove the possibility of contention in the test by providing a
synthetic Now() function.
2016-04-24 20:03:04 -04:00
k8s-merge-robot
72e51dacfe Merge pull request #24034 from AdoHe/log_spam
Automatic merge from submit-queue

remove log spam from nodecontroller

@thockin @quinton-hoole ptal.
2016-04-21 12:11:05 -07:00
Chao Xu
8537095415 use fully qualified resource in fake clients actions 2016-04-20 19:44:40 -07:00
AdoHe
e52f71f78d remove log spam from nodecontroller 2016-04-10 22:51:29 -04:00
k8s-merge-robot
7d7ca5ab72 Merge pull request #23608 from caesarxuchao/mv-typed-clients
Automatic merge from submit-queue

Move typed clients into clientset folder

Move typed clients from `pkg/client/typed/` to `pkg/client/clientset_generated/${clientset_name}/typed`.

The first commit changes the client-gen, the last commit updates the doc, other commits are just moving things around.

@lavalamp @krousey
2016-04-02 19:31:40 -07:00
Chao Xu
49559a3332 Generate the typed clients under the clientset folder 2016-03-31 15:28:45 -07:00
André Cruz
34a273f1eb Fixed typo. 2016-03-29 16:10:11 +01:00
k8s-merge-robot
e44ad7a083 Merge pull request #22735 from resouer/throttle-dev
Auto commit by PR queue bot
2016-03-26 06:44:48 -07:00
goltermann
32d569d6c7 Fixing all the "composite literal uses unkeyed fields" Vet errors. 2016-03-25 15:25:09 -07:00
harry
8472cfa214 Refactor throttle into util pkg
Fix missing throttle.go
2016-03-25 08:32:23 +08:00
harry
b0900bf0d4 Refactor diff into sub pkg 2016-03-21 20:21:39 +08:00
Mike Danese
e0431d8409 Revert "Revert "continuously delete pods on nodes that don't exist""
This reverts commit da0a72f2c2.
2016-03-08 11:00:35 -08:00
Marek Grabowski
da0a72f2c2 Revert "continuously delete pods on nodes that don't exist" 2016-03-08 09:38:53 +01:00
Mike Danese
c404e7c6d1 continuously delete pods on nodes that don't exist 2016-03-07 16:01:33 -08:00
k8s-merge-robot
dc46ae031d Merge pull request #22336 from cjcullen/evict
Auto commit by PR queue bot
2016-03-07 08:42:40 -08:00
CJ Cullen
e7fc608df7 Immediately evict pods and delete node when cloud says node is gone. 2016-03-07 06:07:51 -08:00