Commit Graph

1193 Commits

Author SHA1 Message Date
Ke Zhang
c8471f2c3e EndpointController syncService log error 2016-06-17 17:05:50 +08:00
k8s-merge-robot
646a872f15 Merge pull request #27415 from caesarxuchao/fix-oldrc
Automatic merge from submit-queue

fix updatePod() of RS and RC controllers

Fix updatePod of replication controller manager and replica set controller to handle pod label updates that match no RC or RS.

Fix #27405
2016-06-16 17:09:53 -07:00
Chao Xu
63fb075f0a fix updatePod of replication controller manager and replica set controller to
handle pod label updates that match no rc or rs
2016-06-15 10:34:26 -07:00
saadali
542f2dc708 Introduce new kubelet volume manager
This commit adds a new volume manager in kubelet that synchronizes
volume mount/unmount (and attach/detach, if attach/detach controller
is not enabled).

This eliminates the race conditions between the pod creation loop
and the orphaned volumes loops. It also removes the unmount/detach
from the `syncPod()` path so volume clean up never blocks the
`syncPod` loop.
2016-06-15 09:34:08 -07:00
saadali
9b6a505f8a Rename UniqueDeviceName to UniqueVolumeName
Rename UniqueDeviceName to UniqueVolumeName and move helper functions
from attacherdetacher to volumehelper package.
Introduce UniquePodName alias
2016-06-15 09:32:12 -07:00
Jan Safranek
449e9f49d3 Fill PV.Status.Message with deleter/recycler errors. 2016-06-15 14:56:31 +02:00
k8s-merge-robot
2b9670b77b Merge pull request #27190 from caesarxuchao/remove-debugging-log
Automatic merge from submit-queue

Fix a debugging line

A trivial update. @k8s-oncall can we manually merge it?
2016-06-14 16:53:09 -07:00
Wojciech Tyczynski
5d702a32c1 Fix race in informer 2016-06-14 16:40:12 +02:00
k8s-merge-robot
f97bca37a5 Merge pull request #27127 from jsafrane/refactor-binder-operations
Automatic merge from submit-queue

Rework PV controller to use util/goroutinemap


@kubernetes/sig-storage
2016-06-12 23:44:28 -07:00
k8s-merge-robot
628af356b8 Merge pull request #26980 from hongchaodeng/fix
Automatic merge from submit-queue

processor listener: fix locking in pop()

Currently the lock in processorListener is used to guard pendingNotifications. But in pop, it also locks around on select chan. This will block the goroutine with lock acquired.

This PR changes the lock to guard the correct section only.
2016-06-12 17:59:09 -07:00
Chao Xu
c15c10f312 fix a log line 2016-06-10 09:58:27 -07:00
Janet Kuo
764df2e096 Listing pods only once when getting pods for RS in deployment 2016-06-10 09:55:28 -07:00
Jan Safranek
6081bd61f0 Rework PV controller to use util/goroutinemap 2016-06-09 13:49:04 +02:00
Hongchao Deng
d4eb48c0bb add TestPopReleaseLock 2016-06-08 11:34:35 -07:00
Hongchao Deng
308201acb0 processor listener: fix locking in pop() 2016-06-08 11:34:35 -07:00
Chao Xu
91de14cf13 rename the gc for terminated pods to "podgc" 2016-06-07 22:10:34 -07:00
Michail Kargakis
886014b1a3 kubectl: fix sort logic for logs
Use a separate sorting algorithm for kubectl logs that sorts from older
to newer instead of the other way that ActivePods is doing.
2016-06-07 10:52:42 +02:00
Xiang Li
9a1779110c daemon/controller.go: refactor worker 2016-06-05 23:29:57 -07:00
k8s-merge-robot
707cc2bbb8 Merge pull request #26493 from caesarxuchao/fix-gc-flake
Automatic merge from submit-queue

Fixes 25890 flake. Let GC convert ListOptions to v1 before passing it to the dynamic client

GC's ListWatcher directly passed the api.ListOptions to the dynamic client, but the parameter codec of dynamic client converts the options to queries based on the tags in the struct, which are not present in api.ListOptions, so the queries are not sent to the server. As a result, the Watch request was sent without a resourceVersion, causing missed events. Flake #25890 is caused by the missed deletion events.

This PR converts the api.ListOptions to v1.ListOptions before the GC passes it to the dynamic codec. The flaky test has successfully passed 79 times ([log](https://00e9e64bacd064560a027fbee9c5a373a1614f3a56e652ae40-apidata.googleusercontent.com/download/storage/v1_internal/b/kubernetes-jenkins/o/pr-logs%2Fpull%2F25923%2Fkubernetes-pull-test-unit-integration%2F28364%2Fbuild-log.txt?qk=AD5uMEv72OjSUqDyk5i-ZLurcmM4i7gket1c7WaqR7yuIYz7WhPYT7ewVBafijV0ymnPTYqxRYt1kp6S9YQv7chPwC-3UtrKetKfhYnvAFrPGXAIBxHytTmpFohRAYgsARN1B6j1f9vyK5lM-8jyzRGhCK3sCRsAPnbDBWIWFlbH4b1n3vUET3P71QamHrF5itYyaqRU5pMZV3Cwwr81X8q7h5hCzm3Ip78RpMzfjEqTG0RcM2TLGccUrlkWVBLh4hn0NFpUIkzVFugFA5ooJffo-0AdJnO3mGWEOnXNVFWftJbK8cKnTns0DISrYFOyH_PlOe_YHCxgIXIT-dW8G-nbqoUjn5SBqunr36rcpaYCIwe2va4W_AcLCT43xiEAezRER_U9AuIqi_22KMd6SuHTyljhmWFPvPk8-gpjthLWXhcE7LPO5dV41hnZHnbI4n_9eI1nSVm7q9XdSvX1sWKV1GCwn8oj017AnxVvl9bScultko_0dTC747UqJ6UTFakLuFcHFe-F5Tz7ItDWlBVPoXeC7gTpyuicFKLsdqGlW9F5X6kIwNrBRj9uRsS-QuzSER-fVkQCn4dUTcokttRH_0bYvyfr9oqiDXmywMgOp-L0sKayk8JOVynh2q0Tju9sdkvFr0PxoAjhofomfIC1SZ_JkOzwAT1TUW8dLjPHluMct34xW_-qna1AmkoxM4bZQLhllap96NTC-0IdtzeKDrTul8p7u3WXSJjjEMSijibTNMlnkB0AluT1_RNO94OnzuFv4YlcV24FPhJzchhbyKREkOb_wzgcnSbRwGHjIcfRgkX-IzoXHVBcMYFUrPmsXrnRcfad4XwjkUOgvivkURW2_EwnzgrLDh-IKek51_0FpT1MnFCSG0gQbVSs_iMVPr6UXNAw62LGbKVtl3ZMXyapEpcO8azNbn6Wvd550R704JXxYlU)).

@lavalamp @krousey @smarterclayton
2016-06-04 01:52:31 -07:00
k8s-merge-robot
bd2bc25308 Merge pull request #25865 from jsafrane/devel/pv-convert-from-12
Automatic merge from submit-queue

volume controller: Convert PersistentVolumes from Kubernetes 1.2

In Kubernetes 1.2 we used template PersistentVolume for provisioning. When a claim for dynamic volume was detected, Kubernetes did:

- create template PV for the claim with dummy pointer to storage asset
- allocate storage asset such as AWS EBS
- fill real pointer to the created storage asset to the template PV

In refactored volume provisioner, Kubernetes allocates the storage asset first and then creates a Kubernetes PV instance already with the correct pointer to the storage asset.

To support seamles upgrade from 1.2 to 1.3 we need to remove these unprovisioned template PVs. The new controller does not use them, it will see PVC for dynamic provisioning and create real PV instead.

See https://github.com/pmorie/pv-haxxz/pull/3 for pseudocode.
2016-06-03 23:27:13 -07:00
k8s-merge-robot
4877153727 Merge pull request #26772 from jsafrane/flake-controller-cache-empty
Automatic merge from submit-queue

Wait for all volumes/claims to get synced in unit test.

Controller.HasSynced() returns true when all initial claims/volumes were sent
to appropriate goroutines, not when the goroutine has actually processed them.

Fixes #26712
2016-06-03 17:05:22 -07:00
k8s-merge-robot
a00dbea133 Merge pull request #26758 from mqliang/lookupcache-threadsafe
Automatic merge from submit-queue

bugfix:lookupcache's Get method can not be called concurrently

ref https://github.com/kubernetes/kubernetes/issues/26376

@lavalamp @therc @mikedanese
2016-06-03 12:46:13 -07:00
Chao Xu
06f49f7ca7 Let the dynamic client take a customized parameter codec for List, Watch, and DeleteCollection.
Let the gc's ListWatcher use api.ParameterCodec. Fixes 25890.
2016-06-03 11:22:51 -07:00
mqliang
9a0ff5a9e8 bugfix:lookupcache's Get method can not be called concurrently 2016-06-04 02:21:25 +08:00
Jan Safranek
27b11c5342 Convert PersistentVolumes from Kubernetes 1.2
In Kubernetes 1.2 we used template PersistentVolume for provisioning. When a
claim for dynamic volume was detected, Kubernetes did:
- create template PV for the claim with dummy pointer to storage asset
- allocate storage asset such as AWS EBS
- fill real pointer to the created storage asset to the template PV

In refactored volume provisioner, Kubernetes allocates the storage asset first
and then creates a Kubernetes PV instance already with the correct pointer
to the storage asset.

To support seamles upgrade from 1.2 to 1.3 we need to remove these
unprovisioned template PVs. The new controller does not use them, it will see
PVC for dynamic provisioning and create real PV instead.
2016-06-03 14:26:06 +02:00
k8s-merge-robot
3157e87cb2 Merge pull request #26768 from wojtek-t/routecontroller_logs
Automatic merge from submit-queue

Improve logging in routecontroller

@zmerlynn
2016-06-03 04:51:12 -07:00
k8s-merge-robot
59e008dbcb Merge pull request #26733 from pmorie/pv-controller-typos
Automatic merge from submit-queue

Fix typo and linewrap comments in PV controller

Fix some typos and linewrap long comments that I found while going over this code investigating something.
2016-06-03 04:51:08 -07:00
Wojciech Tyczynski
de1d35a66d Improve logging in routecontroller 2016-06-03 12:05:12 +02:00
Jan Safranek
962505ad01 Wait for all volumes/claims to get synced in unit test.
Controller.HasSynced() returns true when all initial claims/volumes were sent
to appropriate goroutines, not when the goroutine has actually processed them.
2016-06-03 10:53:56 +02:00
k8s-merge-robot
75ef1ca270 Merge pull request #26351 from saad-ali/attachDetachControllerKubeletChanges
Automatic merge from submit-queue

Attach/Detach Controller Kubelet Changes

This PR contains changes to enable attach/detach controller proposed in #20262.

Specifically it:
* Introduces a new `enable-controller-attach-detach` kubelet flag to enable control by attach/detach controller. Default enabled.
* Removes all references `SafeToDetach` annotation from controller.
* Adds the new `VolumesInUse` field to the Node Status API object.
* Modifies the controller to use `VolumesInUse` instead of `SafeToDetach` annotation to gate detachment.
* Modifies kubelet to set `VolumesInUse` before Mount and after Unmount.
  * There is a bug in the `node-problem-detector` binary that causes `VolumesInUse` to get reset to nil every 30 seconds. Issue https://github.com/kubernetes/node-problem-detector/issues/9#issuecomment-221770924 opened to fix that.
  * There is a bug here in the mount/unmount code that prevents resetting `VolumeInUse in some cases, this will be fixed by mount/unmount refactor.
* Have controller process detaches before attaches so that volumes referenced by pods that are rescheduled to a different node are detached first.
* Fix misc bugs in controller.
* Modify GCE attacher to: remove retries, remove mutex, and not fail if volume is already attached or already detached.

Fixes #14642, #19953

```release-note
Kubernetes v1.3 introduces a new Attach/Detach Controller. This controller manages attaching and detaching volumes on-behalf of nodes that have the "volumes.kubernetes.io/controller-managed-attach-detach" annotation.

A kubelet flag, "enable-controller-attach-detach" (default true), controls whether a node sets the "controller-managed-attach-detach" or not.
```
2016-06-02 23:30:32 -07:00
k8s-merge-robot
a41d84408c Merge pull request #26518 from jsafrane/initial-sync
Automatic merge from submit-queue

Fill controller caches on startup

The controller needs to fill its caches before it starts binding/recycling/ deleting or provisioning volumes and claims. This was done using blocking initial 'xxx added' from going through syncClaim/syncVolume. However, when the caches were full, the controller waited for the next sync period to do actual binding/recycling etc.

In this patch, the controller fills its caches directly from etcd and then processes initial 'xxx added' events to reconcile the world and bind/recycle/ delete/provision stuff, resulting in faster binding after startup.

Fixes #25967 (properly)
2016-06-02 21:44:56 -07:00
Saad Ali
9dbe943491 Attach/Detach Controller Kubelet Changes
This PR contains Kubelet changes to enable attach/detach controller control.
* It introduces a new "enable-controller-attach-detach" kubelet flag to
  enable control by controller. Default enabled.
* It removes all references "SafeToDetach" annoation from controller.
* It adds the new VolumesInUse field to the Node Status API object.
* It modifies the controller to use VolumesInUse instead of SafeToDetach
  annotation to gate detachment.
* There is a bug in node-problem-detector that causes VolumesInUse to
  get reset every 30 seconds. Issue https://github.com/kubernetes/node-problem-detector/issues/9
  opened to fix that.
2016-06-02 16:47:11 -07:00
Paul Morie
277c0a4e90 Fix typo and linewrap comments in PV controller 2016-06-02 15:50:07 -04:00
Janet Kuo
36f704c975 List RSes only once when getting old+new RSes in deployment controller 2016-06-02 11:24:43 -07:00
k8s-merge-robot
335da9b125 Merge pull request #26410 from jsafrane/fix-test-race
Automatic merge from submit-queue

Fix data race in volume controller unit test.

Reactor must be locked when fiddling with reactor.volumes and reactor.claims. Therefore add new functions to add/delete volume/claim with sending an event.

Fixes #26345
2016-06-02 04:25:08 -07:00
k8s-merge-robot
745eb08e83 Merge pull request #26595 from janetkuo/log-test-e2e-deployment
Automatic merge from submit-queue

Adding logs in deployment for debugging



Ref #26509
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/.github/PULL_REQUEST_TEMPLATE.md?pixel)]()
2016-06-01 20:35:42 -07:00
Jan Safranek
ee74cc4354 Fix fake event recorder race
Event recorder should wait for some time to get all expected events, the event
may be written by another goroutine that just have finished.

It should not slow down the test in most cases, only when there is a bug and
expected event is not sent.
2016-06-01 10:16:35 +02:00
Jan Safranek
2d43e4549e Fix data race in volume controller unit test.
Reactor must be locked when fiddling with reactor.volumes and reactor.claims.
Therefore add new functions to add/delete volume/claim with sending an event.
2016-06-01 08:35:33 +02:00
k8s-merge-robot
04f77dd602 Merge pull request #26556 from jsafrane/fix-format
Automatic merge from submit-queue

Fix log arguments.

'i' is not printed.
@kubernetes/sig-storage
2016-05-31 21:24:50 -07:00
k8s-merge-robot
38d5be4f36 Merge pull request #26555 from jsafrane/stabilize-test-flakes
Automatic merge from submit-queue

Stabilize controller unit tests.

Remove test "5-1", it's flaky as it depends on order of execution of goroutines. When the controller starts, existing claim is enqueued as "initial sync event" and a new volume is enqueued to separate goroutine. It is not deterministic which goroutine processes its events first and there is no way how to tell that the claim event was processed.

Also, force resync of the controllers after the test to make sure all events are processed.

Fixes unit test flakes.
@kubernetes/sig-storage
2016-05-31 17:06:12 -07:00
Janet Kuo
310a7d2eb5 Adding logs in deployment for debugging 2016-05-31 15:59:46 -07:00
k8s-merge-robot
38181bb3fb Merge pull request #25917 from pmorie/pv-selector
Automatic merge from submit-queue

Add LabelSelector to PersistentVolumeClaimSpec

Implements #25413.

@kubernetes/sig-storage @bgrant0607 @thockin @jsafrane @eparis
2016-05-31 08:22:07 -07:00
Jan Safranek
21059e8b6d Fix log arguments.
'i' is not printed.
2016-05-31 12:12:15 +02:00
Jan Safranek
011eac7c8b Stabilize controller unit tests.
Remove test "5-1", it's flaky as it depends on order of execution of
goroutines. When the controller starts, existing claim is enqueued as
"initial sync event" and a new volume is enqueued to separate goroutine.
It is not deterministic which goroutine processes its events first and
there is no way how to tell that the claim event was processed.

Also, force resync of the controllers after the test to make sure all
events are processed.
2016-05-31 12:07:47 +02:00
gmarek
7cac170214 AllocateOrOccupyCIDR returs quickly 2016-05-31 09:11:42 +02:00
k8s-merge-robot
d1277e34fd Merge pull request #25913 from pweil-/ds-tombstone
Automatic merge from submit-queue

daemonset handle DeletedFinalStateUnknown

During an e2e run in OpenShift we ran into the DS controller panic when handling `DeletedFinalStateUnknown`.  This PR checks for `DeletedFinalStateUnknown` and queues the embedded object if it is a `DaemonSet`.

@mikedanese - would you mind taking a look?
@deads2k  

```
panic: interface conversion: interface is cache.DeletedFinalStateUnknown, not *extensions.DaemonSet

goroutine 4369 [running]:
k8s.io/kubernetes/pkg/controller/daemon.func·005(0x2f8a0c0, 0xc20b559680)
	/data/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/controller/daemon/controller.go:160 +0x50
k8s.io/kubernetes/pkg/controller/framework.ResourceEventHandlerFuncs.OnDelete(0xc20a0ae090, 0xc20a0ae0a0, 0xc20a0ae0b0, 0x2f8a0c0, 0xc20b559680)
	/data/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/controller/framework/controller.go:178 +0x41
k8s.io/kubernetes/pkg/controller/framework.(*ResourceEventHandlerFuncs).OnDelete(0xc20b8ebf20, 0x2f8a0c0, 0xc20b559680)
	<autogenerated>:25 +0xb5
k8s.io/kubernetes/pkg/controller/framework.func·001(0x2f8a280, 0xc20b5522e0, 0x0, 0x0)
	/data/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/controller/framework/controller.go:248 +0x4be
k8s.io/kubernetes/pkg/controller/framework.(*Controller).processLoop(0xc20bb727e0)
	/data/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/controller/framework/controller.go:122 +0x6f
k8s.io/kubernetes/pkg/controller/framework.*Controller.(k8s.io/kubernetes/pkg/controller/framework.processLoop)·fm()
	/data/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/controller/framework/controller.go:97 +0x27
k8s.io/kubernetes/pkg/util/wait.func·001()
	/data/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/util/wait/wait.go:66 +0x61
k8s.io/kubernetes/pkg/util/wait.JitterUntil(0xc209f8cfb8, 0x3b9aca00, 0x0, 0xc2080543c0)
	/data/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/util/wait/wait.go:67 +0x8f
k8s.io/kubernetes/pkg/util/wait.Until(0xc209f8cfb8, 0x3b9aca00, 0xc2080543c0)
	/data/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/util/wait/wait.go:47 +0x4a
k8s.io/kubernetes/pkg/controller/framework.(*Controller).Run(0xc20bb727e0, 0xc2080543c0)
	/data/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/controller/framework/controller.go:97 +0x1fb
created by k8s.io/kubernetes/pkg/controller/daemon.(*DaemonSetsController).Run
	/data/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/pkg/controller/daemon/controller.go:212 +0xae
```
https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_check/1002/artifact/origin/artifacts/test-cmd/logs/openshift.log
2016-05-30 17:54:17 -07:00
Paul Morie
4ffa3c6754 Add label selector to match criteria for claims to volumes 2016-05-30 12:11:12 -04:00
Paul Morie
faa112bad1 Add selector to PersistentVolumeClaim 2016-05-30 12:09:50 -04:00
k8s-merge-robot
9aeeef1d81 Merge pull request #26414 from jsafrane/reduce-sync-period
Automatic merge from submit-queue

Reduce volume controller sync period

fixes #24236 and most probably also fixes #25294.
Needs #25881! With the cache, binder is not affected by sync period. Without the cache, binding of 1000 PVCs takes more than 5 minutes (instead of ~70 seconds).

15 seconds were chosen by fair 2d10 roll :-)
2016-05-30 05:54:51 -07:00
Jan Safranek
df161c3a7e Fill controller caches on startup
The controller needs to fill its caches before it starts binding/recycling/
deleting or provisioning volumes and claims. This was done using blocking
initial 'xxx added' from going through syncClaim/syncVolume. However, when
the caches were full, the controller waited for the next sync period to do
actual binding/recycling etc.

In this patch, the controller fills its caches directly from etcd and then
processes initial 'xxx added' events to reconcile the world and bind/recycle/
delete/provision stuff, resulting in faster binding after startup.

Fixes #25967 (properly)
2016-05-30 13:16:45 +02:00
k8s-merge-robot
5643b7498f Merge pull request #25881 from jsafrane/devel/pv-add-cache
Automatic merge from submit-queue

volume controller: Add cache with the latest version of PVs and PVCs

When the controller binds a PV to PVC, it saves both objects to etcd. However, there is still an old version of these objects in the controller Informer cache. So, when a new PVC comes, the PV is still seen as available and may get bound to the new PVC. This will be blocked by etcd, still, it creates unnecessary traffic that slows everything down.

To make everything worse, when periodic sync with the old PVC is performed, this PVC is seen by the controller as Pending (while it's already Bound on etcd) and will be bound to a different PV. Writing to this PV won't be blocked by etcd, only subsequent write of the PVC fails. So, the controller will need to roll back the PV in another transaction(s). The controller can keep itself pretty busy this way.

Also, we save bound PVs (and PVCs) as two transactions - we save say PV.Spec first and then .Status. The controller gets "PV.Spec updated" event from etcd and tries to fix the Status, as it seems to the controller it's outdated. This write again fails - there already is a correct version in etcd.

As we can't influence the Informer cache, it is read-only to the controller, this patch introduces second cache in the controller, which holds latest and greatest version on PVs and PVCs to prevent these useless writes to etcd . It gets updated with events from etcd *and* after etcd confirms successful save of PV/PVC modified by the controller.

The cache stores only *pointers* to PVs/PVCs, so in ideal case it shares the actual object data with the informer cache. They will diverge only for a short time when the controller modifies something and the informer cache did not get update events yet.

@kubernetes/sig-storage
2016-05-30 04:13:18 -07:00
Jan Safranek
2aa9f1dd8f Reduce volume controller sync period 2016-05-30 09:59:31 +02:00
Matt Freeman
5c288a77fe Fix error handling in endpoint controller
Added missing returns, subsequent statements depend on key
2016-05-29 19:44:20 +07:00
k8s-merge-robot
577cdf937d Merge pull request #26415 from wojtek-t/network_not_ready
Automatic merge from submit-queue

Add a NodeCondition "NetworkUnavaiable" to prevent scheduling onto a node until the routes have been created 

This is new version of #26267 (based on top of that one).

The new workflow is:
- we have an "NetworkNotReady" condition
- Kubelet when it creates a node, it sets it to "true"
- RouteController will set it to "false" when the route is created
- Scheduler is scheduling only on nodes that doesn't have "NetworkNotReady ==true" condition

@gmarek @bgrant0607 @zmerlynn @cjcullen @derekwaynecarr @danwinship @dcbw @lavalamp @vishh
2016-05-29 03:06:59 -07:00
k8s-merge-robot
a550cf16b9 Merge pull request #25826 from freehan/svcsourcerange
Automatic merge from submit-queue

promote sourceRange into service spec

@thockin  one more for your pile

I will add docs at `http://releases.k8s.io/HEAD/docs/user-guide/services-firewalls.md`

cc: @justinsb 

Fixes: #20392
2016-05-28 02:20:13 -07:00
k8s-merge-robot
7fae9c14e2 Merge pull request #25662 from deads2k/prevent-hotloop
Automatic merge from submit-queue

prevent namespace cleanup hotloop

Found chasing a sentry report.  Looks like we hot-loop on namespace deletion failures.

@derekwaynecarr ptal
2016-05-28 01:30:51 -07:00
Alex Robinson
d577550dd0 Merge pull request #26054 from gmarek/flags
Make service-range flag in controller-manager optional
2016-05-27 14:26:15 -07:00
Wojciech Tyczynski
be1b57100d Change to NotReadyNetworking and use in scheduler 2016-05-27 19:32:49 +02:00
gmarek
7bdf480340 Node is NotReady until the Route is created 2016-05-27 19:29:51 +02:00
Alex Robinson
7522389d8d Merge pull request #26207 from zmerlynn/fix-unneeded-updated
nodecontroller: Fix log message on successful update
2016-05-27 09:56:28 -07:00
saadali
3c345abafd Fix DATA RACE in unit tests: reconciler_test.go 2016-05-27 01:19:25 -07:00
Alex Mohr
9803393a67 Merge pull request #25960 from jsafrane/do-not-sort-bind
volume controller: Speed up binding by not sorting volumes
2016-05-26 15:47:14 -07:00
Alex Mohr
edda837142 Merge pull request #25599 from caesarxuchao/orphaning-finalizer
Add orphaning finalizer logic to GC
2016-05-26 13:19:19 -07:00
Minhan Xia
a1bd33f510 promote sourceRange into service spec 2016-05-26 10:42:30 -07:00
Wojciech Tyczynski
aa65a7974a Spread creating routes over time and retry on failures 2016-05-26 13:00:53 +02:00
k8s-merge-robot
98766f4548 Merge pull request #26301 from zmerlynn/wait_proper
Automatic merge from submit-queue

routecontroller: Add wait.NonSlidingUntil, use it

[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/.github/PULL_REQUEST_TEMPLATE.md?pixel)]() Make sure the reconciliation loop kicks in again immediately if it
takes a loooooong time.
2016-05-26 03:29:21 -07:00
k8s-merge-robot
bda0dc88aa Merge pull request #25457 from saad-ali/expectedStateOfWorldDataStructure
Automatic merge from submit-queue

Attach Detach Controller Business Logic

This PR adds the meat of the attach/detach controller proposed in #20262.

The PR splits the in-memory cache into a desired and actual state of the world.
2016-05-26 00:41:54 -07:00
k8s-merge-robot
da7d3c189a Merge pull request #25869 from jsafrane/devel/operation-logs
Automatic merge from submit-queue

volume controller: use better operation names

Using volume/claim.UID in the operation name is not really useful, as UIDs are not logged by rest of the controller. On the other hand, volume.Name and claim.Namespace/Name is logged pretty often and it would help to log these also in operation name. Still, I'd prefer to have the operation name really unique to be protected from users deleting a volume and quickly creating another one with the same name, so UID is still part of the operation name.

This has been already proven to be very useful in controller debugging.
2016-05-25 17:58:07 -07:00
Zach Loafman
cb69960742 nodecontroller: Fix log message on successful update 2016-05-25 14:44:15 -07:00
k8s-merge-robot
70a71990d4 Merge pull request #26123 from brendandburns/flaker
Automatic merge from submit-queue

Add some extra checking in the tests to prevent flakes.

Attempts to fix https://github.com/kubernetes/kubernetes/issues/25967

The hypothesis is that somehow waitTest() catches an idle that occurs before all changes have been applied.  This will block until the expected number of changes have arrived.
2016-05-25 14:29:48 -07:00
Zach Loafman
3ec25c5425 routecontroller: Add wait.NonSlidingUntil, use it
Make sure the reconciliation loop kicks in again immediately if it
takes a loooooong time.
2016-05-25 13:58:35 -07:00
k8s-merge-robot
4e8e4a574c Merge pull request #25636 from zhouhaibing089/delnode-fix
Automatic merge from submit-queue

use monotonic now in TestDelNode

Fixes https://github.com/kubernetes/kubernetes/issues/24971.

Briefly, the rate_limited_queue uses a `container/heap` to store values, and use this data structure to ensure we can always fetch the value with the minimum `processAt`. However, in some extreme condition, the continuous call to `time.Now()` would get the same value, which causes some unpredictable order in the queue, this fix uses a monotonic `now()` to avoid that.

@smarterclayton please take a look.
2016-05-25 13:33:31 -07:00
saadali
92500a20d7 Attach detach controller business logic added
Split controller cache into actual and desired state of world.
Controller will only operate on volumes scheduled to nodes that
have the "volumes.kubernetes.io/controller-managed-attach" annotation.
2016-05-24 23:01:16 -07:00
Chao Xu
1665546d2d add finalizer logics to the API server and the garbage collector; handling DeleteOptions.OrphanDependents in the API server 2016-05-24 13:07:28 -07:00
Brendan Burns
88663fc58b Add some extra checking in the tests to prevent flakes. 2016-05-23 16:25:02 -07:00
gmarek
08385b2c5f Make service-range flag in controller-manager optional 2016-05-23 09:37:53 +02:00
gmarek
1d89d2f2d2 Add few log lines to NodeController 2016-05-23 08:49:11 +02:00
k8s-merge-robot
b84730ba16 Merge pull request #25748 from derekwaynecarr/hotloop_quota
Automatic merge from submit-queue

ResourceQuota controller uses rate limiter to prevent hot-loops in error situations

Have resource quota controller use a rate limited queue to prevent hot-looping in error situations.
2016-05-22 15:45:03 -07:00
k8s-merge-robot
1b78799b3b Merge pull request #25768 from piosz/metrics-api-hpa
Automatic merge from submit-queue

Use Metrics API in HPA
2016-05-22 13:58:07 -07:00
k8s-merge-robot
62a8394eb4 Merge pull request #25263 from jsafrane/devel/adopt-recycle-pod
Automatic merge from submit-queue

volume recycler: Don't start a new recycler pod if one already exists.

Recycling is a long duration process and when the recycler controller is restarted in the meantime, it should not start a new recycler pod if there is one already running.

This means that the recycler pod must have deterministic name based on name of the recycled PV, we then get name conflicts when creating the pod.

Two things need to be changed:

- recycler controller and recycler plugins must pass the PV.Name to place, where the pod is created. This is most of the patch and it should be pretty straightforward.

- create recycler pod with deterministic name and check "already exists" error.

When at it, remove useless 'resourceVersion' argument and make log messages starting with lowercase.

There is an unit test to check the behavior + there is an e2e test that checks that regular recycling is not broken (it does not try to run two recycler pods in parallel as the recycler is single-threaded now).
2016-05-21 02:28:26 -07:00
Piotr Szczesniak
26ad827893 Use Metrics API in HPA 2016-05-20 19:50:56 +02:00
mqliang
17d5a302bb make podcidr mask size configurable 2016-05-20 20:44:40 +08:00
mqliang
cf7a3475f3 Don't allow node controller to allocate into service CIDR range 2016-05-20 20:44:40 +08:00
mqliang
69b8453fa0 cidr allocator 2016-05-20 20:44:40 +08:00
k8s-merge-robot
3b0a6dac1f Merge pull request #25571 from gmarek/nodecontroller
Automatic merge from submit-queue

NodeController doesn't evict Pods if no Nodes are Ready

Fix #13412 #24597

When NodeControllers don't see any Ready Node it goes into "network segmentation mode". In this mode it cancels all evictions and don't evict any Pods.

It leaves network segmentation mode when it sees at least one Ready Node. When leaving it resets all timers, so each Node has full grace period to reconnect to the cluster.

cc @lavalamp @davidopp @mml @wojtek-t @fgrzadkowski
2016-05-20 05:31:34 -07:00
k8s-merge-robot
bd8033e2b0 Merge pull request #25864 from jsafrane/devel/pv-fix-log
Automatic merge from submit-queue

volume controller: Fix method name in a log message

It's deleteVolume, not deleteClaim.

@kubernetes/sig-storage
2016-05-20 03:53:22 -07:00
Jan Safranek
c7da3abd5b volume controller: Speed up binding by not sorting volumes
The binder sorts all available volumes first, then it filters out volumes
that cannot be bound by processing each volume in a loop and then finds
the smallest matching volume by binary search.

So, if we process every available volume in a loop, we can also remember the
smallest matching one and save us potentially long sorting (and quick binary
search).
2016-05-20 12:26:39 +02:00
Daniel Smith
5448400b1c Merge pull request #25243 from smarterclayton/explore_quantity
Provide an int64 version of Quantity that is much faster
2016-05-19 16:56:48 -07:00
Paul Weil
4d6fee74d0 daemonset handle DeletedFinalStateUnknown 2016-05-19 17:16:34 -04:00
Jan Safranek
0279232360 volume controller: Add cache with the latest version of PVs and PVCs
When the controller binds a PV to PVC, it saves both objects to etcd.
However, there is still an old version of these objects in the controller
Informer cache. So, when a new PVC comes, the PV is still seen as available
and may get bound to the new PVC. This will be blocked by etcd, still, it
creates unnecessary traffic that slows everything down.

Also, we save bound PV/PVC as two transactions - we save PV/PVC.Spec first
and then .Status. The controller gets "PV/PVC.Spec updated" event from etcd
and tries to fix the Status, as it seems to the controller it's outdated.
This write again fails - there already is a correct version in etcd.

We can't influence the Informer cache, it is read-only to the controller.

To prevent these useless writes to etcd, this patch introduces second cache
in the controller, which holds latest and greatest version on PVs and PVCs.
It gets updated with events from etcd *and* after etcd confirms successful
save of PV/PVC modified by the controller.

The cache stores only *pointers* to PVs/PVCs, so in ideal case it shares the
actual object data with the informer cache. They will diverge only when
the controller modifies something and the informer cache did not get update
events yet.
2016-05-19 16:09:06 +02:00
Clayton Coleman
5e4308f91d
Update use of Quantity in other classes 2016-05-19 08:41:43 -04:00
Jan Safranek
e9a6ec29a0 volume controller: use better operation names
Using volume/claim.UID in the operation name is not really useful, as UIDs
are not logged by rest of the controller. On the other hand, volume.Name and
claim.Namespace/Name is logged pretty often and it would help to log these
also in operation name.

This has been already proven to be very useful in controller debugging.
2016-05-19 14:19:33 +02:00
Robert Rati
e388c137bb Separate sync and list functionality in the reflector. #23394 2016-05-19 07:41:24 -04:00
Jan Safranek
0ee9160f88 volume recycler: Don't start a new recycler pod if one already exists.
Recycling is a long duration process and when the recycler controller is
restarted in the meantime, it should not start a new recycler pod if there is
one already running.

This means that the recycler pod must have deterministic name based on name
of the recycled PV, we then get name conflicts when creating the pod.

Two things need to be changed:
- recycler controller and recycler plugins must pass the PV.Name to place,
  where the pod is created.

- create recycler pod with deterministic name and check "already exists" error.

When at it, remove useless 'resourceVersion' argument and make log messages
starting with lowercase.
2016-05-19 12:58:25 +02:00
Jan Safranek
61d630ddf7 volume controller: Fix method name in a log message
It's deleteVolume, not deleteClaim.
2016-05-19 12:54:17 +02:00
k8s-merge-robot
c63ac4e664 Merge pull request #24331 from jsafrane/devel/refactor-binder
Automatic merge from submit-queue

Refactor persistent volume controller

Here is complete persistent controller as designed in https://github.com/pmorie/pv-haxxz/blob/master/controller.go

It's feature complete and compatible with current binder/recycler/provisioner. No new features, it *should* be much more stable and predictable.

Testing
--
The unit test framework is quite complicated, still it was necessary to reach reasonable coverage (78% in `persistentvolume_controller.go`). The untested part are error cases, which are quite hard to test in reasonable way - sure, I can inject a VersionConflictError on any object update and check the error bubbles up to appropriate places, but the real test would be to run `syncClaim`/`syncVolume` again and check it recovers appropriately from the error in the next periodic sync. That's the hard part.

Organization
---
The PR starts with `rm -rf kubernetes/pkg/controller/persistentvolume`. I find it easier to read when I see only the new controller without old pieces scattered around.
[`types.go` from the old controller is reused to speed up matching a bit, the code looks solid and has 95% unit test coverage].

I tried to split the PR into smaller patches, let me know what you think.

~~TODO~~
--

* ~~Missing: provisioning, recycling~~.
* ~~Fix integration tests~~
* ~~Fix e2e tests~~

@kubernetes/sig-storage

<!-- Reviewable:start -->
---
This change is [<img src="http://reviewable.k8s.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](http://reviewable.k8s.io/reviews/kubernetes/kubernetes/24331)
<!-- Reviewable:end -->

Fixes #15632
2016-05-19 03:06:46 -07:00
k8s-merge-robot
4f09f51486 Merge pull request #24800 from thockin/validation_pt8-3
Automatic merge from submit-queue

Make name validators return string slices

Part of the larger validation PR, broken out for easier review and merge.  Builds on previous PRs in the series.
2016-05-19 02:15:27 -07:00
Daniel Smith
6dc1437015 Merge pull request #25671 from deads2k/fix-add-indexer
make addIndexers safe for sharedInformer
2016-05-18 14:48:43 -07:00
k8s-merge-robot
48c90f15c5 Merge pull request #24509 from caesarxuchao/primitive-gc
Automatic merge from submit-queue

Adding garbage collector controller

Adding the propagator and garbage processor of the gc.

Design doc is at https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/garbage-collection.md

<!-- Reviewable:start -->
---
This change is [<img src="http://reviewable.k8s.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](http://reviewable.k8s.io/reviews/kubernetes/kubernetes/24509)
<!-- Reviewable:end -->
2016-05-18 03:14:25 -07:00
Jan Safranek
01b20d8e77 Generate shorter provisioned PV names.
GCE PD names are generated out of provisioned PV.Name, therefore it should be
as short as possible and still unique.
2016-05-18 10:06:51 +02:00