Commit Graph

2606 Commits

Author SHA1 Message Date
Yassine TIJANI
cc5977aaa0 avoiding unnecessary loop to copy pods listed see #46433
adding comments stating that returned pods should be used as read-only objects

fixing typo

avoiding unnecessary loop to copy pods listed see #46433

fixing fmt

avoiding unnecessary loop to copy pods listed see #46433
2017-05-29 15:40:30 +02:00
Kubernetes Submit Queue
c34b359bd7 Merge pull request #45923 from verult/cxing/NodeStatusUpdaterFix
Automatic merge from submit-queue (batch tested with PRs 46383, 45645, 45923, 44884, 46294)

Node status updater now deletes the node entry in attach updates...

… when node is missing in NodeInformer cache.

- Added RemoveNodeFromAttachUpdates as part of node status updater operations.



**What this PR does / why we need it**: Fixes issue of unnecessary node status updates when node is deleted.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #42438

**Special notes for your reviewer**: Unit tested added, but a more comprehensive test involving the attach detach controller requires certain testing functionality that is currently absent, and will require larger effort. Will be added at a later time.

There is an edge case caused by the following steps:
1) A node is deleted and restarted. The node exists, but is not yet recognized by Kubernetes.
2) A pod requiring a volume attach with nodeName specifically set to this node.

This would make the pod stuck in ContainerCreating state. This is low-pri since it's a specific edge case that can be avoided.

**Release note**:

```release-note
NONE
```
2017-05-26 12:58:02 -07:00
Kubernetes Submit Queue
b7ebdfa978 Merge pull request #46383 from mikedanese/fix-flake
Automatic merge from submit-queue (batch tested with PRs 46383, 45645, 45923, 44884, 46294)

fix certificates flake

Fixes https://github.com/kubernetes/kubernetes/issues/46365
Fixes https://github.com/kubernetes/kubernetes/issues/46374
2017-05-26 12:57:58 -07:00
Mike Danese
bbe1e9caa4 fix certificates flake 2017-05-26 11:03:45 -07:00
Kubernetes Submit Queue
bcad534ebc Merge pull request #46058 from jcbsmpsn/configure-certificate-duration
Automatic merge from submit-queue

Add support for specifying certificate duration at runtime.
2017-05-26 11:02:03 -07:00
Kubernetes Submit Queue
749ac27e9a Merge pull request #45003 from krmayankk/garbage
Automatic merge from submit-queue (batch tested with PRs 45518, 46127, 46146, 45932, 45003)

PodDisruptionBudget should use ControllerRef

Fixes https://github.com/kubernetes/kubernetes/issues/42284

```release-note
PodDisruptionBudget now uses ControllerRef to decide which controller owns a given Pod, so it doesn't get confused by controllers with overlapping selectors.
```
2017-05-25 11:46:08 -07:00
Kubernetes Submit Queue
079020f559 Merge pull request #46160 from NickrenREN/fix-UX
Automatic merge from submit-queue

fix regression in UX experience for double attach volume

send event when volume is not allowed to multi-attach

Fixes #46012

**Release note**:
```release-note
NONE
```
2017-05-25 08:50:12 -07:00
Kubernetes Submit Queue
26d7ee0447 Merge pull request #44774 from kargakis/uniquifier
Automatic merge from submit-queue

Switch Deployments to new hashing algo w/ collision avoidance mechanism

Implements https://github.com/kubernetes/community/pull/477

@kubernetes/sig-apps-api-reviews @kubernetes/sig-apps-pr-reviews 

Fixes https://github.com/kubernetes/kubernetes/issues/29735
Fixes https://github.com/kubernetes/kubernetes/issues/43948

```release-note
Deployments are updated to use (1) a more stable hashing algorithm (fnv) than the previous one (adler) and (2) a hashing collision avoidance mechanism that will ensure new rollouts will not block on hashing collisions anymore.
```
2017-05-25 06:09:58 -07:00
Michail Kargakis
9190a47c37
Generated changes for collision count
Signed-off-by: Michail Kargakis <mkargaki@redhat.com>
2017-05-25 12:23:17 +02:00
Kubernetes Submit Queue
8f9f412d2f Merge pull request #46162 from lixiaobing10051267/masterFound
Automatic merge from submit-queue (batch tested with PRs 45573, 46354, 46376, 46162, 46366)

break the loop when found true

break the loop when found true.
2017-05-25 03:14:03 -07:00
Michail Kargakis
4a2c5eae92
Implement hash collision avoidance mechanism
Signed-off-by: Michail Kargakis <mkargaki@redhat.com>
2017-05-25 11:17:45 +02:00
Michail Kargakis
aeb2d9b9b4
Deep equality helper should not mutate state
Signed-off-by: Michail Kargakis <mkargaki@redhat.com>
2017-05-25 11:17:45 +02:00
Michail Kargakis
fcf68ba7a7
Remove obsolete deployment helpers
Signed-off-by: Michail Kargakis <mkargaki@redhat.com>
2017-05-25 11:17:44 +02:00
Cheng Xing
f9dc2d5ca3 Node status updater now deletes the node entry in attach updates when node is missing in NodeInformer cache. Fixes #42438.
- Added RemoveNodeFromAttachUpdates as part of node status updater operations.
2017-05-24 18:31:47 -07:00
NickrenREN
add091b1fb fix regression in UX experience for double attach volume
send event when volume is not allowed to multi-attach
2017-05-25 09:27:24 +08:00
Jacob Simpson
07e9b0e197 Add support for specifying certificate duration at runtime. 2017-05-24 13:29:46 -07:00
deads2k
ba5a1113e6 don't queue namespaces for deletion if the namespace isn't deleted 2017-05-24 14:47:53 -04:00
Kubernetes Submit Queue
84401e7601 Merge pull request #45891 from zjj2wry/zjj-t
Automatic merge from submit-queue (batch tested with PRs 45891, 46147)

fix typo

**What this PR does / why we need it**:

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-05-24 00:42:56 -07:00
Kubernetes Submit Queue
5be7a6a73e Merge pull request #45514 from mikedanese/cert-refactor
Automatic merge from submit-queue (batch tested with PRs 45514, 45635)

refactor certificate controller to break it into two parts

Break pkg/controller/certificates into:
* pkg/controller/certificates/approver: containing the group approver
* pkg/controller/certificates/signer: containing the local signer
* pkg/controller/certificates: containing shared infrastructure
```release-note
Break the 'certificatesigningrequests' controller into a 'csrapprover' controller and 'csrsigner' controller.
```
2017-05-23 20:52:53 -07:00
Mayank Kumar
3ab6082958 PodDisruptionBudget should use ControllerRef 2017-05-23 19:43:38 -07:00
Kubernetes Submit Queue
447ee4a1c9 Merge pull request #46258 from MrHohn/esipp-fix-needsUpdate
Automatic merge from submit-queue (batch tested with PRs 42042, 46139, 46126, 46258, 46312)

Detect ExternalTrafficPolicy and HealthCheckNodePort changes in needsUpdate()

Fix a bug that editing ExternalTrafficPolicy doesn't trigger LoadBalancer update. I'm surprise that ESIPP e2e tests didn't catch this.

/assign @freehan @thockin 

**Release note**:

```release-note
NONE
```
2017-05-23 19:43:04 -07:00
Kubernetes Submit Queue
45b275d52c Merge pull request #45897 from ncdc/gc-require-list-watch
Automatic merge from submit-queue (batch tested with PRs 46149, 45897, 46293, 46296, 46194)

GC: update required verbs for deletable resources, allow list of ignored resources to be customized

The garbage collector controller currently needs to list, watch, get,
patch, update, and delete resources. Update the criteria for
deletable resources to reflect this.

Also allow the list of resources the garbage collector controller should
ignore to be customizable, so downstream integrators can add their own
resources to the list, if necessary.

cc @caesarxuchao @deads2k @smarterclayton @mfojtik @liggitt @sttts @kubernetes/sig-api-machinery-pr-reviews
2017-05-23 15:48:57 -07:00
Mike Danese
f04ce3cfba refactor certificate controller 2017-05-23 15:25:58 -07:00
Andy Goldstein
d1a0384678 GC: allow ignored resources to be customized
Allow the list of resources the garbage collector controller should
ignore to be customizable, so downstream integrators can add their own
resources to the list, if necessary.
2017-05-23 12:05:09 -04:00
Anirudh
48d76edc74 PDB MaxUnavailable: Generated 2017-05-23 07:42:24 -07:00
Anirudh
ce48d4fb5c PDB MaxUnavailable: Disruption Controller Changes 2017-05-23 07:18:44 -07:00
Kubernetes Submit Queue
cc6e51c6e8 Merge pull request #45427 from ncdc/gc-shared-informers
Automatic merge from submit-queue (batch tested with PRs 46201, 45952, 45427, 46247, 46062)

Use shared informers in gc controller if possible

Modify the garbage collector controller to try to use shared informers for resources, if possible, to reduce the number of unique reflectors listing and watching the same thing.

cc @kubernetes/sig-api-machinery-pr-reviews @caesarxuchao @deads2k @liggitt @sttts @smarterclayton @timothysc @soltysh @kargakis @kubernetes/rh-cluster-infra @derekwaynecarr @wojtek-t @gmarek
2017-05-22 20:58:03 -07:00
Kubernetes Submit Queue
c2c5051adf Merge pull request #44899 from smarterclayton/burst
Automatic merge from submit-queue (batch tested with PRs 38990, 45781, 46225, 44899, 43663)

Support parallel scaling on StatefulSets

Fixes #41255

```release-note
StatefulSets now include an alpha scaling feature accessible by setting the `spec.podManagementPolicy` field to `Parallel`.  The controller will not wait for pods to be ready before adding the other pods, and will replace deleted pods as needed.  Since parallel scaling creates pods out of order, you cannot depend on predictable membership changes within your set.
```
2017-05-22 19:07:09 -07:00
Zihong Zheng
5f814d957e Detect ExternalTrafficPolicy and HealthCheckNodePort changes in needsUpdate() 2017-05-22 18:15:48 -07:00
Eric Tune
b17e3c14eb Move PDB controller and type ownership to SIG-Apps
Created OWNERS_ALIASES called sig-apps-reviewers from the union of reviewers in:
 pkg/controller/{cronjob,deployment,daemon,job,replicaset,statefulset}/OWNERS
except removed inactive user bprashanth

Created OWNERS_ALIASES called sig-apps-api-reviewers as the intersection
of sig-apps-reviewers and the approvers from pkg/api/OWNERS.

Used those OWNERS_ALIASES as the reviewers/approvers for the disruption controller,
and API.
2017-05-22 12:55:28 -07:00
Andy Goldstein
2480f2ceb6 Use shared informers in gc controller if possible 2017-05-22 12:51:37 -04:00
Kubernetes Submit Queue
16b5093feb Merge pull request #46037 from ncdc/ns-controller-aggregate-errors
Automatic merge from submit-queue (batch tested with PRs 46164, 45471, 46037)

NS controller: don't stop deleting GVRs on error

**What this PR does / why we need it**:

If the namespace controller encounters an error trying to delete a
single GroupVersionResource, add the error to an aggregated list of
errors and continue attempting to delete all the GroupVersionResources
instead of stopping at the first error. Return the aggregated error list
(if any) when done. This allows us to delete as much of the content in
the namespace as we can in each pass.

**Special notes for your reviewer**:

This may help with some of the namespace deletions taking too long in our e2e tests.

**Release note**:

```release-note
```
2017-05-22 09:08:56 -07:00
Kubernetes Submit Queue
574608d2e9 Merge pull request #46169 from kargakis/progress-when-ready
Automatic merge from submit-queue (batch tested with PRs 45864, 46169)

Account newly ready replicas as progress

@kubernetes/sig-apps-pr-reviews
2017-05-22 08:08:56 -07:00
Clayton Coleman
20d45af694
Combine statefulset burst and monotonic scaling tests
Use subtests to avoid duplicating entire suite of control logic.
2017-05-21 01:14:30 -04:00
Clayton Coleman
2861ae5eb9
Support burst in stateful set scale up and down
The alpha field podManagementPolicy defines how pods are created,
deleted, and replaced. The new `Parallel` policy will replace pods
as fast as possible, not waiting for the pod to be `Ready` or providing
an order. This allows for advanced clustered software to take advantage
of rapid changes in scale.
2017-05-21 01:14:26 -04:00
Clayton Coleman
ad720cc651
generated: bazel 2017-05-20 21:58:38 -04:00
Michail Kargakis
7910dc3131
Account newly ready replicas as progress
Signed-off-by: Michail Kargakis <mkargaki@redhat.com>
2017-05-20 21:14:50 +02:00
Clayton Coleman
784e3ae5fa
Switch the tokens controller to use shared informers
Tokens controller previously needed a bit of extra help in order to be
safe for concurrent use. The new MutationCache allows it to keep a local
cache and still use a shared informer. The filtering event handler lets
it only see changes to secrets it cares about.
2017-05-20 14:19:49 -04:00
Clayton Coleman
3e095d12b4
Refactor move of client-go/util/clock to apimachinery 2017-05-20 14:19:48 -04:00
lixiaobing1
6949b5dbe7 break the loop when found true
Signed-off-by: lixiaobing1 <li.xiaobing1@zte.com.cn>
2017-05-20 15:22:07 +08:00
Kubernetes Submit Queue
f499606bfe Merge pull request #45346 from codablock/fix_double_attach
Automatic merge from submit-queue

Don't try to attach volumes which are already attached to other nodes

This PR is a replacement for https://github.com/kubernetes/kubernetes/pull/40148. I was not able to push fixes and rebases to the original branch as I don't have access to the Github organization anymore.

CC @saad-ali You probably have to update the PR link in [Q2 2017 (v1.7)](https://docs.google.com/spreadsheets/d/1t4z5DYKjX2ZDlkTpCnp18icRAQqOE85C1T1r2gqJVck/edit#gid=14624465)

I assume the PR will need a new "ok to test" 

**ORIGINAL PR DESCRIPTION**

This PR fixes an issue with the attach/detach volume controller. There are cases where the `desiredStateOfWorld` contains the same volume for multiple nodes, resulting in the attach/detach controller attaching this volume to multiple nodes. This of course fails for volumes like AWS EBS, Azure Disks, ...

I observed this situation on Azure when using Azure Disks and replication controllers which start to reschedule PODs. When you delete a POD that belongs to a RC, the RC will immediately schedule a new POD on another node. This results in a short time (max a few seconds) where you have 2 PODs which try to attach/mount the same volume on different nodes. As the old POD is still alive, the attach/detach controller does not try to detach the volume and starts to attach the volume to the new POD immediately.

This behavior was probably not noticed before on other clouds as the bogus attempt to attach probably fails pretty fast and thus is unnoticed. As the situation with the 2 PODs disappears after a few seconds, a detach for the old POD is initiated and thus the new POD can attach successfully.

On Azure however, attaching and detaching takes quite long, resulting in the first bogus attach attempt to already eat up much time.
When attaching fails on Azure and reports that it is already attached somewhere else, the cloud provider immediately does a detach call for the same volume+node it tried to attach to. This is done to make sure the failed attach request is aborted immediately. You can find this here: https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/azure/azure_storage.go#L74

The complete flow of attach->fail->abort eats up valuable time and the attach/detach controller can not proceed with other work while this is happening. This means, if the old POD disappears in the meantime, the controller can't even start the detach for the volume which delays the whole process of rescheduling and reattaching.

Also, I and other people have observed very strange behavior where disks ended up being "attached" to multiple VMs at the same time as reported by Azure Portal. This results in the controller to fail reattaching forever. It's hard to figure out why and when this happens and there is no reproducer known yet. I can imagine however that the described behavior correlates with what I described above.

I was not sure if there are actually cases where it is perfectly fine to have a volume mounted to multiple PODs/nodes. At least technically, this should be possible with network based volumes, e.g. nfs. Can someone with more knowledge about volumes help me here? I may need to add a check before skipping attaching in `reconcile`.

CC @colemickens @rootfs

-->
```release-note
Don't try to attach volume to new node if it is already attached to another node and the volume does not support multi-attach.
```
2017-05-19 21:54:42 -07:00
Kubernetes Submit Queue
2473c24f81 Merge pull request #45979 from bowei/owners
Automatic merge from submit-queue

Add bowei to OWNERS: e2e/test dns,network; cloud route, node, service…
2017-05-19 19:39:05 -07:00
Bowei Du
3af1c0efcb Add bowei to OWNERS: e2e/test dns,network; cloud route, node, service controller 2017-05-19 14:49:43 -07:00
Wojciech Tyczynski
d2529bb6b6 Avoid sleep in endpoint controller 2017-05-19 13:57:36 +02:00
Andy Goldstein
e8e87cb1c2 NS controller: don't stop deleting GVRs on error
If the namespace controller encounters an error trying to delete a
single GroupVersionResource, add the error to an aggregated list of
errors and continue attempting to delete all the GroupVersionResources
instead of stopping at the first error. Return the aggregated error list
(if any) when done. This allows us to delete as much of the content in
the namespace as we can in each pass.
2017-05-18 12:01:40 -04:00
Clayton Coleman
bdd4d34c7d
generated: api changes 2017-05-18 10:07:47 -04:00
Alexander Block
06baeb33b2 Don't try to attach volumes which are already attached to other nodes 2017-05-18 06:56:30 +02:00
Kubernetes Submit Queue
7df0178076 Merge pull request #42975 from smarterclayton/time_namespace
Automatic merge from submit-queue (batch tested with PRs 40234, 45885, 42975)

Log how much time it takes e2e tests to clean up the namespace
2017-05-17 20:27:52 -07:00
Kubernetes Submit Queue
6dbe853e29 Merge pull request #45544 from ianchakeres/reconciler-err-cleanup
Automatic merge from submit-queue (batch tested with PRs 45990, 45544, 45745, 45742, 45678)

Refactor reconciler volume log and error messages

**What this PR does / why we need it**:
Utilizes volume-specific error and log messages introduced in #44969, inside files that also log volume information. 

Specifically: 

- pkg/kubelet/volumemanager/reconciler/reconciler.go, 
- pkg/controller/volume/attachdetach/reconciler/reconciler.go, and
- pkg/kubelet/volumemanager/populator/desired_state_of_world_populator.go


**Which issue this PR fixes** : fixes #40905

**Special notes for your reviewer**:

**Release note**:

```release-note
```
NONE
2017-05-17 18:40:51 -07:00
Clayton Coleman
7da310ea28
Fix namespace controller logging to be consistent
time.Now() was wrong, simplify namespace controller output
2017-05-17 17:45:05 -04:00