1147 Commits

Author SHA1 Message Date
k8s-merge-robot
ec518005a8 Merge pull request #27567 from saad-ali/blockKubeletOnAttachController
Automatic merge from submit-queue

Kubelet Volume Manager Wait For Attach Detach Controller and Backoff on Error

* Closes https://github.com/kubernetes/kubernetes/issues/27483
  * Modified Attach/Detach controller to report `Node.Status.AttachedVolumes` on successful attach (unique volume name along with device path).
  * Modified Kubelet Volume Manager to wait for Attach/Detach controller to report success before proceeding with attach.
* Closes https://github.com/kubernetes/kubernetes/issues/27492
  * Implemented an exponential backoff mechanism for volume manager and attach/detach controller to prevent operations (attach/detach/mount/unmount/wait for controller attach/etc) from executing back to back unchecked.
* Closes https://github.com/kubernetes/kubernetes/issues/26679
  * Modified volume `Attacher.WaitForAttach()` methods to use the device path reported by the Attach/Detach controller in `Node.Status.AttachedVolumes` instead of calling out to cloud providers.
2016-06-20 20:36:08 -07:00
saadali
e716ddc771 Controller wait for attach and exponential backoff
Modify attach/detach controller to keep track of volumes to report
attached in Node VolumeToAttach status.

Modify kubelet volume manager to wait for volume to show up in Node
VolumeToAttach status.

Implement exponential backoff for errors in volume manager and attach
detach controller
2016-06-20 18:19:55 -07:00
k8s-merge-robot
d19c8ed825 Merge pull request #27609 from ZTE-PaaS/zhangke-patch-001
Automatic merge from submit-queue

EndpointController syncService log error

Here the key param should be service, not rc.
2016-06-20 13:06:44 -07:00
k8s-merge-robot
d8b463dfd2 Merge pull request #27128 from markturansky/disable_provisioning
Automatic merge from submit-queue

Allow disabling of dynamic provisioning

Allow administrators to opt-out of dynamic provisioning.  Provisioning is still on by default, which is the current behavior.

Per a conversation with @jsafrane, a boolean toggle was added and plumbed through into the controller.  Deliberate disabling will simply return nil from `provisionClaim` whereas a misconfigured provisioner will continue on and generate error events for the PVC.

@kubernetes/rh-storage @saad-ali @thockin  @abhgupta
2016-06-20 02:10:43 -07:00
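
The distinction drawn above between a deliberately disabled provisioner and a misconfigured one could look something like the sketch below. Only the name `provisionClaim` comes from the PR description; the `controller` struct, its fields, and the event text are hypothetical stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// controller is a hypothetical stand-in for the PV controller.
type controller struct {
	enableDynamicProvisioning bool
	provision                 func(claim string) error // nil models a misconfigured provisioner
	events                    []string                 // recorded error events (illustrative)
}

// provisionClaim returns nil without doing anything when provisioning is
// deliberately disabled, but still reports an error event when provisioning
// is enabled and no usable provisioner is configured.
func (c *controller) provisionClaim(claim string) error {
	if !c.enableDynamicProvisioning {
		return nil // opted out: no event, no error
	}
	if c.provision == nil {
		err := errors.New("no provisioner configured")
		c.events = append(c.events, fmt.Sprintf("ProvisioningFailed %s: %v", claim, err))
		return err
	}
	return c.provision(claim)
}

func main() {
	disabled := &controller{enableDynamicProvisioning: false}
	fmt.Println(disabled.provisionClaim("default/claim"), len(disabled.events))

	broken := &controller{enableDynamicProvisioning: true}
	fmt.Println(broken.provisionClaim("default/claim"), len(broken.events))
}
```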
k8s-merge-robot
0730ffbff7 Merge pull request #27434 from jsafrane/pv-events-message
Automatic merge from submit-queue

Fill PV.Status.Message with deleter/recycler errors.

Instead of empty `Message` `kubectl describe pv` now shows:

```
Name:		nfs
Labels:		<none>
Status:		Failed
Claim:		default/nfs
Reclaim Policy:	Recycle
Access Modes:	RWX
Capacity:	1Mi
Message:	Recycler failed: Pod was active on the node longer than specified deadline
Source:
    Type:	NFS (an NFS mount that lasts the lifetime of a pod)
    Server:	10.999.999.999
    Path:	/
    ReadOnly:	false
```

This is actually a regression since 1.2

@kubernetes/sig-storage
2016-06-20 01:36:28 -07:00
saadali
926bb4cca0 Add patch status to Node internalclientset 2016-06-19 23:54:02 -07:00
markturansky
16ec36c591 added toggle to disable dynamic provisioning 2016-06-20 01:15:23 -04:00
goltermann
218645b346 Fix several spelling errors in comments. 2016-06-17 10:41:18 -07:00
Ke Zhang
c8471f2c3e EndpointController syncService log error 2016-06-17 17:05:50 +08:00
k8s-merge-robot
646a872f15 Merge pull request #27415 from caesarxuchao/fix-oldrc
Automatic merge from submit-queue

fix updatePod() of RS and RC controllers

Fix updatePod of replication controller manager and replica set controller to handle pod label updates that match no RC or RS.

Fix #27405
2016-06-16 17:09:53 -07:00
Chao Xu
63fb075f0a fix updatePod of replication controller manager and replica set controller to
handle pod label updates that match no rc or rs
2016-06-15 10:34:26 -07:00
saadali
542f2dc708 Introduce new kubelet volume manager
This commit adds a new volume manager in kubelet that synchronizes
volume mount/unmount (and attach/detach, if attach/detach controller
is not enabled).

This eliminates the race conditions between the pod creation loop
and the orphaned volumes loops. It also removes the unmount/detach
from the `syncPod()` path so volume clean up never blocks the
`syncPod` loop.
2016-06-15 09:34:08 -07:00
saadali
9b6a505f8a Rename UniqueDeviceName to UniqueVolumeName
Rename UniqueDeviceName to UniqueVolumeName and move helper functions
from attacherdetacher to volumehelper package.
Introduce UniquePodName alias
2016-06-15 09:32:12 -07:00
Jan Safranek
449e9f49d3 Fill PV.Status.Message with deleter/recycler errors. 2016-06-15 14:56:31 +02:00
k8s-merge-robot
2b9670b77b Merge pull request #27190 from caesarxuchao/remove-debugging-log
Automatic merge from submit-queue

Fix a debugging line

A trivial update. @k8s-oncall can we manually merge it?
2016-06-14 16:53:09 -07:00
Wojciech Tyczynski
5d702a32c1 Fix race in informer 2016-06-14 16:40:12 +02:00
k8s-merge-robot
f97bca37a5 Merge pull request #27127 from jsafrane/refactor-binder-operations
Automatic merge from submit-queue

Rework PV controller to use util/goroutinemap


@kubernetes/sig-storage
2016-06-12 23:44:28 -07:00
k8s-merge-robot
628af356b8 Merge pull request #26980 from hongchaodeng/fix
Automatic merge from submit-queue

processor listener: fix locking in pop()

Currently the lock in processorListener is used to guard pendingNotifications. But pop() also holds the lock around the select on the channels, which blocks the goroutine while the lock is held.

This PR changes the lock to guard the correct section only.
2016-06-12 17:59:09 -07:00
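
The locking change can be illustrated with a simplified sketch: the lock protects only the pending queue and is released before the potentially blocking channel send. The `listener` type, its fields, and `popOne` are hypothetical simplifications, not the actual processorListener code.

```go
package main

import (
	"fmt"
	"sync"
)

// listener is a simplified, hypothetical stand-in for processorListener.
type listener struct {
	lock    sync.Mutex
	pending []string    // guarded by lock
	nextCh  chan string // unbuffered, so a send may block
}

// add appends a notification to the pending queue under the lock.
func (l *listener) add(n string) {
	l.lock.Lock()
	defer l.lock.Unlock()
	l.pending = append(l.pending, n)
}

// popOne removes the head of the queue under the lock, then sends it
// without holding the lock, so add is never blocked by a slow receiver.
func (l *listener) popOne() bool {
	l.lock.Lock()
	if len(l.pending) == 0 {
		l.lock.Unlock()
		return false
	}
	n := l.pending[0]
	l.pending = l.pending[1:]
	l.lock.Unlock() // release before the potentially blocking send

	l.nextCh <- n
	return true
}

func main() {
	l := &listener{nextCh: make(chan string)}
	l.add("a")
	done := make(chan bool)
	go func() { done <- l.popOne() }()
	l.add("b") // does not deadlock even if popOne is waiting on the send
	fmt.Println(<-l.nextCh)
	<-done
	fmt.Println(len(l.pending))
}
```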
Chao Xu
c15c10f312 fix a log line 2016-06-10 09:58:27 -07:00
Janet Kuo
764df2e096 Listing pods only once when getting pods for RS in deployment 2016-06-10 09:55:28 -07:00
Jan Safranek
6081bd61f0 Rework PV controller to use util/goroutinemap 2016-06-09 13:49:04 +02:00
Hongchao Deng
d4eb48c0bb add TestPopReleaseLock 2016-06-08 11:34:35 -07:00
Hongchao Deng
308201acb0 processor listener: fix locking in pop() 2016-06-08 11:34:35 -07:00
k8s-merge-robot
707cc2bbb8 Merge pull request #26493 from caesarxuchao/fix-gc-flake
Automatic merge from submit-queue

Fixes 25890 flake. Let GC convert ListOptions to v1 before passing it to the dynamic client

GC's ListWatcher directly passed the api.ListOptions to the dynamic client, but the parameter codec of dynamic client converts the options to queries based on the tags in the struct, which are not present in api.ListOptions, so the queries are not sent to the server. As a result, the Watch request was sent without a resourceVersion, causing missed events. Flake #25890 is caused by the missed deletion events.

This PR converts the api.ListOptions to v1.ListOptions before the GC passes it to the dynamic codec. The flaky test has successfully passed 79 times ([log](https://00e9e64bacd064560a027fbee9c5a373a1614f3a56e652ae40-apidata.googleusercontent.com/download/storage/v1_internal/b/kubernetes-jenkins/o/pr-logs%2Fpull%2F25923%2Fkubernetes-pull-test-unit-integration%2F28364%2Fbuild-log.txt?qk=AD5uMEv72OjSUqDyk5i-ZLurcmM4i7gket1c7WaqR7yuIYz7WhPYT7ewVBafijV0ymnPTYqxRYt1kp6S9YQv7chPwC-3UtrKetKfhYnvAFrPGXAIBxHytTmpFohRAYgsARN1B6j1f9vyK5lM-8jyzRGhCK3sCRsAPnbDBWIWFlbH4b1n3vUET3P71QamHrF5itYyaqRU5pMZV3Cwwr81X8q7h5hCzm3Ip78RpMzfjEqTG0RcM2TLGccUrlkWVBLh4hn0NFpUIkzVFugFA5ooJffo-0AdJnO3mGWEOnXNVFWftJbK8cKnTns0DISrYFOyH_PlOe_YHCxgIXIT-dW8G-nbqoUjn5SBqunr36rcpaYCIwe2va4W_AcLCT43xiEAezRER_U9AuIqi_22KMd6SuHTyljhmWFPvPk8-gpjthLWXhcE7LPO5dV41hnZHnbI4n_9eI1nSVm7q9XdSvX1sWKV1GCwn8oj017AnxVvl9bScultko_0dTC747UqJ6UTFakLuFcHFe-F5Tz7ItDWlBVPoXeC7gTpyuicFKLsdqGlW9F5X6kIwNrBRj9uRsS-QuzSER-fVkQCn4dUTcokttRH_0bYvyfr9oqiDXmywMgOp-L0sKayk8JOVynh2q0Tju9sdkvFr0PxoAjhofomfIC1SZ_JkOzwAT1TUW8dLjPHluMct34xW_-qna1AmkoxM4bZQLhllap96NTC-0IdtzeKDrTul8p7u3WXSJjjEMSijibTNMlnkB0AluT1_RNO94OnzuFv4YlcV24FPhJzchhbyKREkOb_wzgcnSbRwGHjIcfRgkX-IzoXHVBcMYFUrPmsXrnRcfad4XwjkUOgvivkURW2_EwnzgrLDh-IKek51_0FpT1MnFCSG0gQbVSs_iMVPr6UXNAw62LGbKVtl3ZMXyapEpcO8azNbn6Wvd550R704JXxYlU)).

@lavalamp @krousey @smarterclayton
2016-06-04 01:52:31 -07:00
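
The failure mode described above can be reproduced with a toy tag-driven encoder. This is only an illustration of why untagged fields vanish from the query; `encodeQuery`, `internalListOptions`, and `v1ListOptions` are hypothetical and much simpler than the real parameter codec and option types.

```go
package main

import (
	"fmt"
	"net/url"
	"reflect"
	"strings"
)

// internalListOptions mimics an internal type whose fields carry no tags.
type internalListOptions struct {
	ResourceVersion string
}

// v1ListOptions mimics a versioned type whose fields carry json tags.
type v1ListOptions struct {
	ResourceVersion string `json:"resourceVersion,omitempty"`
}

// encodeQuery builds query parameters from string fields, using the name in
// each field's json tag. Fields without a tag are silently skipped.
func encodeQuery(obj interface{}) url.Values {
	values := url.Values{}
	v := reflect.ValueOf(obj)
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		name := strings.Split(t.Field(i).Tag.Get("json"), ",")[0]
		if name == "" || t.Field(i).Type.Kind() != reflect.String {
			continue
		}
		values.Set(name, v.Field(i).String())
	}
	return values
}

func main() {
	// The untagged struct yields an empty query, so resourceVersion is lost.
	fmt.Printf("internal: %q\n", encodeQuery(internalListOptions{ResourceVersion: "42"}).Encode())
	// Converting to the tagged type first preserves the parameter.
	fmt.Printf("v1:       %q\n", encodeQuery(v1ListOptions{ResourceVersion: "42"}).Encode())
}
```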
k8s-merge-robot
bd2bc25308 Merge pull request #25865 from jsafrane/devel/pv-convert-from-12
Automatic merge from submit-queue

volume controller: Convert PersistentVolumes from Kubernetes 1.2

In Kubernetes 1.2 we used template PersistentVolume for provisioning. When a claim for dynamic volume was detected, Kubernetes did:

- create template PV for the claim with dummy pointer to storage asset
- allocate storage asset such as AWS EBS
- fill real pointer to the created storage asset to the template PV

In refactored volume provisioner, Kubernetes allocates the storage asset first and then creates a Kubernetes PV instance already with the correct pointer to the storage asset.

To support seamless upgrade from 1.2 to 1.3 we need to remove these unprovisioned template PVs. The new controller does not use them; it will see a PVC for dynamic provisioning and create a real PV instead.

See https://github.com/pmorie/pv-haxxz/pull/3 for pseudocode.
2016-06-03 23:27:13 -07:00
k8s-merge-robot
4877153727 Merge pull request #26772 from jsafrane/flake-controller-cache-empty
Automatic merge from submit-queue

Wait for all volumes/claims to get synced in unit test.

Controller.HasSynced() returns true when all initial claims/volumes were sent
to appropriate goroutines, not when the goroutine has actually processed them.

Fixes #26712
2016-06-03 17:05:22 -07:00
k8s-merge-robot
a00dbea133 Merge pull request #26758 from mqliang/lookupcache-threadsafe
Automatic merge from submit-queue

bugfix:lookupcache's Get method can not be called concurrently

ref https://github.com/kubernetes/kubernetes/issues/26376

@lavalamp @therc @mikedanese
2016-06-03 12:46:13 -07:00
Chao Xu
06f49f7ca7 Let the dynamic client take a customized parameter codec for List, Watch, and DeleteCollection.
Let the gc's ListWatcher use api.ParameterCodec. Fixes 25890.
2016-06-03 11:22:51 -07:00
mqliang
9a0ff5a9e8 bugfix:lookupcache's Get method can not be called concurrently 2016-06-04 02:21:25 +08:00
Jan Safranek
27b11c5342 Convert PersistentVolumes from Kubernetes 1.2
In Kubernetes 1.2 we used template PersistentVolume for provisioning. When a
claim for dynamic volume was detected, Kubernetes did:
- create template PV for the claim with dummy pointer to storage asset
- allocate storage asset such as AWS EBS
- fill real pointer to the created storage asset to the template PV

In refactored volume provisioner, Kubernetes allocates the storage asset first
and then creates a Kubernetes PV instance already with the correct pointer
to the storage asset.

To support seamless upgrade from 1.2 to 1.3 we need to remove these
unprovisioned template PVs. The new controller does not use them; it will see
a PVC for dynamic provisioning and create a real PV instead.
2016-06-03 14:26:06 +02:00
k8s-merge-robot
3157e87cb2 Merge pull request #26768 from wojtek-t/routecontroller_logs
Automatic merge from submit-queue

Improve logging in routecontroller

@zmerlynn
2016-06-03 04:51:12 -07:00
k8s-merge-robot
59e008dbcb Merge pull request #26733 from pmorie/pv-controller-typos
Automatic merge from submit-queue

Fix typo and linewrap comments in PV controller

Fix some typos and linewrap long comments that I found while going over this code investigating something.
2016-06-03 04:51:08 -07:00
Wojciech Tyczynski
de1d35a66d Improve logging in routecontroller 2016-06-03 12:05:12 +02:00
Jan Safranek
962505ad01 Wait for all volumes/claims to get synced in unit test.
Controller.HasSynced() returns true when all initial claims/volumes were sent
to appropriate goroutines, not when the goroutine has actually processed them.
2016-06-03 10:53:56 +02:00
k8s-merge-robot
75ef1ca270 Merge pull request #26351 from saad-ali/attachDetachControllerKubeletChanges
Automatic merge from submit-queue

Attach/Detach Controller Kubelet Changes

This PR contains changes to enable attach/detach controller proposed in #20262.

Specifically it:
* Introduces a new `enable-controller-attach-detach` kubelet flag to enable control by attach/detach controller. Default enabled.
* Removes all references to the `SafeToDetach` annotation from controller.
* Adds the new `VolumesInUse` field to the Node Status API object.
* Modifies the controller to use `VolumesInUse` instead of `SafeToDetach` annotation to gate detachment.
* Modifies kubelet to set `VolumesInUse` before Mount and after Unmount.
  * There is a bug in the `node-problem-detector` binary that causes `VolumesInUse` to get reset to nil every 30 seconds. Issue https://github.com/kubernetes/node-problem-detector/issues/9#issuecomment-221770924 opened to fix that.
  * There is a bug here in the mount/unmount code that prevents resetting `VolumesInUse` in some cases; this will be fixed by the mount/unmount refactor.
* Have controller process detaches before attaches so that volumes referenced by pods that are rescheduled to a different node are detached first.
* Fix misc bugs in controller.
* Modify GCE attacher to: remove retries, remove mutex, and not fail if volume is already attached or already detached.

Fixes #14642, #19953

```release-note
Kubernetes v1.3 introduces a new Attach/Detach Controller. This controller manages attaching and detaching volumes on-behalf of nodes that have the "volumes.kubernetes.io/controller-managed-attach-detach" annotation.

A kubelet flag, "enable-controller-attach-detach" (default true), controls whether a node sets the "controller-managed-attach-detach" annotation or not.
```
2016-06-02 23:30:32 -07:00
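
Gating detachment on `VolumesInUse` could be sketched as below. This assumes the field is a list of unique volume names reported by the kubelet; the `nodeStatus` struct, the `safeToDetach` helper, and the volume names are hypothetical and stand in for the real API types and controller logic.

```go
package main

import "fmt"

// nodeStatus is a hypothetical stand-in for the relevant part of Node status.
type nodeStatus struct {
	VolumesInUse []string // unique volume names the kubelet reports as in use
}

// safeToDetach reports whether the controller may detach the volume:
// only when the node no longer lists it as in use.
func safeToDetach(status nodeStatus, volume string) bool {
	for _, v := range status.VolumesInUse {
		if v == volume {
			return false
		}
	}
	return true
}

func main() {
	status := nodeStatus{VolumesInUse: []string{"example.com/disk-a"}}
	fmt.Println(safeToDetach(status, "example.com/disk-a")) // still mounted
	fmt.Println(safeToDetach(status, "example.com/disk-b")) // not in use
}
```

Because the kubelet sets the field before mounting and clears it after unmounting, a check of this kind would keep the controller from detaching a volume that is still mounted.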
k8s-merge-robot
a41d84408c Merge pull request #26518 from jsafrane/initial-sync
Automatic merge from submit-queue

Fill controller caches on startup

The controller needs to fill its caches before it starts binding/recycling/deleting or provisioning volumes and claims. This was done by blocking the initial 'xxx added' events from going through syncClaim/syncVolume. However, when the caches were full, the controller waited for the next sync period to do the actual binding/recycling etc.

In this patch, the controller fills its caches directly from etcd and then processes initial 'xxx added' events to reconcile the world and bind/recycle/delete/provision stuff, resulting in faster binding after startup.

Fixes #25967 (properly)
2016-06-02 21:44:56 -07:00
Saad Ali
9dbe943491 Attach/Detach Controller Kubelet Changes
This PR contains Kubelet changes to enable attach/detach controller control.
* It introduces a new "enable-controller-attach-detach" kubelet flag to
  enable control by controller. Default enabled.
* It removes all references to the "SafeToDetach" annotation from controller.
* It adds the new VolumesInUse field to the Node Status API object.
* It modifies the controller to use VolumesInUse instead of SafeToDetach
  annotation to gate detachment.
* There is a bug in node-problem-detector that causes VolumesInUse to
  get reset every 30 seconds. Issue https://github.com/kubernetes/node-problem-detector/issues/9
  opened to fix that.
2016-06-02 16:47:11 -07:00
Paul Morie
277c0a4e90 Fix typo and linewrap comments in PV controller 2016-06-02 15:50:07 -04:00
Janet Kuo
36f704c975 List RSes only once when getting old+new RSes in deployment controller 2016-06-02 11:24:43 -07:00
k8s-merge-robot
335da9b125 Merge pull request #26410 from jsafrane/fix-test-race
Automatic merge from submit-queue

Fix data race in volume controller unit test.

Reactor must be locked when fiddling with reactor.volumes and reactor.claims. Therefore add new functions to add/delete volume/claim with sending an event.

Fixes #26345
2016-06-02 04:25:08 -07:00
k8s-merge-robot
745eb08e83 Merge pull request #26595 from janetkuo/log-test-e2e-deployment
Automatic merge from submit-queue

Adding logs in deployment for debugging



Ref #26509
2016-06-01 20:35:42 -07:00
Jan Safranek
ee74cc4354 Fix fake event recorder race
Event recorder should wait for some time to get all expected events; an event
may be written by another goroutine that has just finished.

It should not slow down the test in most cases, only when there is a bug and
expected event is not sent.
2016-06-01 10:16:35 +02:00
Jan Safranek
2d43e4549e Fix data race in volume controller unit test.
Reactor must be locked when fiddling with reactor.volumes and reactor.claims.
Therefore add new functions to add/delete volume/claim with sending an event.
2016-06-01 08:35:33 +02:00
k8s-merge-robot
04f77dd602 Merge pull request #26556 from jsafrane/fix-format
Automatic merge from submit-queue

Fix log arguments.

'i' is not printed.
@kubernetes/sig-storage
2016-05-31 21:24:50 -07:00
k8s-merge-robot
38d5be4f36 Merge pull request #26555 from jsafrane/stabilize-test-flakes
Automatic merge from submit-queue

Stabilize controller unit tests.

Remove test "5-1"; it is flaky because it depends on the order of execution of goroutines. When the controller starts, the existing claim is enqueued as an "initial sync event" and a new volume is enqueued to a separate goroutine. It is not deterministic which goroutine processes its events first, and there is no way to tell that the claim event was processed.

Also, force resync of the controllers after the test to make sure all events are processed.

Fixes unit test flakes.
@kubernetes/sig-storage
2016-05-31 17:06:12 -07:00
Janet Kuo
310a7d2eb5 Adding logs in deployment for debugging 2016-05-31 15:59:46 -07:00
k8s-merge-robot
38181bb3fb Merge pull request #25917 from pmorie/pv-selector
Automatic merge from submit-queue

Add LabelSelector to PersistentVolumeClaimSpec

Implements #25413.

@kubernetes/sig-storage @bgrant0607 @thockin @jsafrane @eparis
2016-05-31 08:22:07 -07:00
Jan Safranek
21059e8b6d Fix log arguments.
'i' is not printed.
2016-05-31 12:12:15 +02:00
Jan Safranek
011eac7c8b Stabilize controller unit tests.
Remove test "5-1"; it is flaky because it depends on the order of execution
of goroutines. When the controller starts, the existing claim is enqueued as
an "initial sync event" and a new volume is enqueued to a separate goroutine.
It is not deterministic which goroutine processes its events first, and
there is no way to tell that the claim event was processed.

Also, force resync of the controllers after the test to make sure all
events are processed.
2016-05-31 12:07:47 +02:00
gmarek
7cac170214 AllocateOrOccupyCIDR returns quickly 2016-05-31 09:11:42 +02:00