Commit Graph

1939 Commits

Author SHA1 Message Date
Michail Kargakis
ce04ee6170 extensions: add readyReplicas in Deployments 2017-01-02 11:59:15 +01:00
Mike Danese
161c391f44 autogenerated 2016-12-29 13:04:10 -08:00
Kubernetes Submit Queue
69ddd8eb27 Merge pull request #39247 from wojtek-t/optimize_controller_manager_memory
Automatic merge from submit-queue

Avoid unnecessary memory allocations

Low-hanging fruit in saving memory allocations. During our 5000-node kubemark runs I've seen this:

ControllerManager:
- 40.17% k8s.io/kubernetes/pkg/util/system.IsMasterNode
- 19.04% k8s.io/kubernetes/pkg/controller.(*PodControllerRefManager).Classify

Scheduler:
- 42.74% k8s.io/kubernetes/plugin/pkg/scheduler/algorithm/predicates.(*MaxPDVolumeCountChecker).filterVolumes

This PR is eliminating all of those.
2016-12-28 00:02:59 -08:00
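For context on the kind of change this PR makes, here is a minimal standalone Go sketch of hoisting work out of a hot path to avoid per-call allocations; the function names and matching rules are illustrative, not the actual IsMasterNode code:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Allocation-heavy: compiling the regexp on every call allocates and shows up
// in profiles when the function is invoked once per node per sync loop.
func isMasterNodeSlow(name string) bool {
	return regexp.MustCompile("-master(-[0-9]+)?$").MatchString(name)
}

// Cheaper: a plain suffix/substring check allocates nothing per call.
// (The exact matching rules here are illustrative only.)
func isMasterNodeFast(name string) bool {
	return strings.HasSuffix(name, "-master") || strings.Contains(name, "-master-")
}

func main() {
	for _, n := range []string{"node-1", "kubemark-master", "kubemark-master-2"} {
		fmt.Printf("%-20s slow=%v fast=%v\n", n, isMasterNodeSlow(n), isMasterNodeFast(n))
	}
}
```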
rkouj
e7e3c55ad7 Add unit tests for MountVolume() of operation executor 2016-12-27 16:07:06 -08:00
rkouj
d5f7610b82 Refactor operation_executor to make it unit testable 2016-12-27 15:12:16 -08:00
Wojciech Tyczynski
d1292a7397 Optimize memory allocations in controller manager 2016-12-27 16:11:11 +01:00
Kubernetes Submit Queue
48793a48d4 Merge pull request #34273 from wlan0/master
Automatic merge from submit-queue (batch tested with PRs 39093, 34273)

start breaking up controller manager into two pieces

This PR addresses: https://github.com/kubernetes/features/issues/88

This commit starts breaking the controller manager into two pieces, namely,
1. a cloud-provider-dependent piece
2. a cloud-provider-agnostic piece

The controller manager has the following control loops:
- nodeController
- volumeController
- routeController
- serviceController
- replicationController
- endpointController
- resourceQuotaController
- namespaceController
- deploymentController, etc.

Among the above controller loops,
- nodeController
- volumeController
- routeController
- serviceController

are cloud provider dependent. As Kubernetes has evolved tremendously, it has become difficult
for the different cloud providers (currently 8) to make changes and iterate quickly. Moreover, the
cloud providers are constrained by the Kubernetes build/release lifecycle. This commit is the first
step in moving towards a Kubernetes code base where cloud-provider-specific code will move out of
the core repository and be maintained by the cloud providers themselves.

I have added a new cloud provider called "external", which signals the controller-manager that
cloud-provider-specific loops are being run by another controller. I have added these changes in such
a way that the existing cloud providers are not affected. This change is completely backwards compatible and does not require any changes to the way Kubernetes is run today.

Finally, along with the controller-manager, the kubelet also has cloud-provider-specific code, and that will be addressed in a different commit/issue.

@alena1108 @ibuildthecloud @thockin @dchen1107 

**Special notes for your reviewer**:

@thockin - I'm making this **WIP** PR to ensure that I don't stray too far from everyone's view of how we should make this change. As you can see, only one controller, namely `nodecontroller`, can be disabled with the `--cloud-provider=external` flag at the moment. I'm working on cleaning up the `rancher-controller-manager` that I wrote to test this.

Secondly, I'd like to use this PR to address cloud-provider-specific code in the kubelet and api-server.

**Kubelet**
Kubelet uses provider-specific code for node registration and for checking node status. I thought of two ways to divide the kubelet:
- We could start a cloud-provider-specific kubelet on each host as a part of Kubernetes, and this cloud-specific kubelet does node registration and node-status checks.
- Create a kubelet plugin for each provider, which will be started by the kubelet as a long-running service. This plugin can be packaged as a binary.

I'm leaning towards the first option. That way, kubelet does not have to manage another process, and we can offload the process management of the cloud-provider-specific-kubelet to something like systemd. 

@dchen1107 @thockin what do you think?

**Kube-apiserver**

Kube-apiserver uses provider-specific code for distributing SSH keys to all the nodes of a cluster. Do you have any suggestions about how to address this?

**Release note**:

``` release-note
```
2016-12-23 01:25:28 -08:00
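As a rough illustration of the `external` cloud provider idea described above, here is a hedged, self-contained Go sketch; the loop names and flag handling are simplified stand-ins, not the real controller-manager wiring:

```go
package main

import (
	"flag"
	"fmt"
)

// Sketch of the idea behind --cloud-provider=external: when the flag is set to
// "external", the core controller manager simply does not start the
// cloud-dependent loops; a separate cloud-controller-manager runs them.
func main() {
	cloudProvider := flag.String("cloud-provider", "", "cloud provider name, or 'external'")
	flag.Parse()

	startLoop := func(name string) { fmt.Println("starting", name) }

	// Cloud-agnostic loops always run in the core controller manager.
	for _, name := range []string{"replication", "endpoint", "resourceQuota", "namespace", "deployment"} {
		startLoop(name + "Controller")
	}

	// Cloud-dependent loops are skipped when another manager owns them.
	if *cloudProvider == "external" {
		fmt.Println("cloud-provider=external: node/volume/route/service controllers run elsewhere")
		return
	}
	for _, name := range []string{"node", "volume", "route", "service"} {
		startLoop(name + "Controller")
	}
}
```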
Mayank Kumar
777977612b ReplicaSet has owner ref of the Deployment that created it 2016-12-22 16:45:50 -08:00
wlan0
75da310757 sanitize names and add more comments, and other essential boilerplate changes 2016-12-22 14:37:15 -08:00
wlan0
1e48fd18cb add cloud-controller-manager as the first step in breaking controller-manager 2016-12-22 14:37:15 -08:00
wlan0
731616e0b2 start breaking up controller manager into two pieces
Addresses: kubernetes/features#88

This commit starts breaking the controller manager into two pieces, namely,

1. a cloud-provider-dependent piece
2. a cloud-provider-agnostic piece

The controller manager has the following control loops:

   - nodeController
   - volumeController
   - routeController
   - serviceController
   - replicationController
   - endpointController
   - resourceQuotaController
   - namespaceController
   - deploymentController, etc.

Among the above controller loops,

   - nodeController
   - volumeController
   - routeController
   - serviceController

are cloud provider dependent. As Kubernetes has evolved tremendously, it has become difficult
for the different cloud providers (currently 8) to make changes and iterate quickly. Moreover, the
cloud providers are constrained by the Kubernetes build/release lifecycle. This commit is the first
step in moving towards a Kubernetes code base where cloud-provider-specific code will move out of
the core repository and be maintained by the cloud providers themselves.

Finally, along with the controller-manager, the kubelet also has cloud-provider-specific code, and that will
be addressed in a different commit/issue.
2016-12-22 14:37:14 -08:00
Kubernetes Submit Queue
744876d13f Merge pull request #38798 from NickrenREN/nodecontroller-status
Automatic merge from submit-queue

delete continue in monitorNodeStatus
2016-12-21 10:35:25 -08:00
Kubernetes Submit Queue
ad47a181ee Merge pull request #38986 from ncdc/fix-daemonset-controller-cache-mutation
Automatic merge from submit-queue

Fix DaemonSet cache mutation

**What this PR does / why we need it**: stops the DaemonSetController from mutating the DaemonSet shared informer cache

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #38985

cc @deads2k @mikedanese @lavalamp @smarterclayton
2016-12-21 09:09:18 -08:00
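The general pattern behind this fix, sketched with a stand-in type rather than the real client-go informer machinery: objects read from a shared cache must be copied before they are mutated.

```go
package main

import "fmt"

// DaemonSet is a stand-in type for this sketch; the real fix applies to
// objects returned by a shared informer's cache in client-go.
type DaemonSet struct {
	Name        string
	NumberReady int32
}

// Objects read from a shared cache are shared by every consumer, so status
// updates must be applied to a copy, never to the cached object itself.
func updateStatus(cached *DaemonSet, ready int32) *DaemonSet {
	updated := *cached // copy before mutating
	updated.NumberReady = ready
	return &updated
}

func main() {
	cached := &DaemonSet{Name: "fluentd"}
	updated := updateStatus(cached, 3)
	fmt.Printf("cached: %+v (unchanged), updated: %+v\n", *cached, *updated)
}
```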
Kubernetes Submit Queue
f42574893b Merge pull request #39011 from wojtek-t/node_controller_listing_from_cache
Automatic merge from submit-queue

NodeController listing nodes from its local cache instead of the cache in the apiserver

This reduces load on the apiserver.
2016-12-21 03:13:09 -08:00
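A minimal sketch of the idea, assuming a hypothetical local cache type in place of the real informer/lister: the controller reads nodes from a cache kept up to date by watches, rather than issuing a full LIST to the apiserver on every pass.

```go
package main

import "fmt"

// nodeCache is a stand-in for an informer-backed lister: the node set is kept
// locally, so listing it costs no apiserver round trip.
type nodeCache struct {
	nodes map[string]string // name -> status; stand-in for cached Node objects
}

func (c *nodeCache) List() []string {
	out := make([]string, 0, len(c.nodes))
	for name := range c.nodes {
		out = append(out, name)
	}
	return out
}

func main() {
	cache := &nodeCache{nodes: map[string]string{"node-1": "Ready", "node-2": "NotReady"}}
	// A monitorNodeStatus-style loop reads the cache instead of the apiserver.
	for _, name := range cache.List() {
		fmt.Println("checking node", name)
	}
}
```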
Kubernetes Submit Queue
237be4b2be Merge pull request #38855 from gnufied/fix-variable-shadow-exp-backoff
Automatic merge from submit-queue (batch tested with PRs 36888, 38180, 38855, 38590)

Fix variable shadowing in exponential backoff when deleting volumes

While https://github.com/kubernetes/kubernetes/pull/38339 implemented exponential backoff on
volume deletion, that PR suffers from a minor bug: when the error thrown on volume deletion is anything other than a `VolumeInUse` error, exponential backoff does not kick in.

This PR fixes that. It also makes the unit tests more deterministic, because exponential backoff changed the way operations are permitted.

CC @jsafrane @childsb @wongma7
2016-12-20 20:33:56 -08:00
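The class of bug being fixed, shown in a tiny standalone Go example (deleteVolume is a hypothetical stand-in for the cloud delete call): using `:=` inside an `if` declares a new, shadowed variable, so the outer error that later logic keys on never gets set.

```go
package main

import (
	"errors"
	"fmt"
)

// deleteVolume is a hypothetical stand-in for a cloud volume delete call.
func deleteVolume() error { return errors.New("provider timeout") }

func main() {
	var err error

	// Shadowing bug: ":=" declares a NEW err scoped to the if statement, so
	// the outer err stays nil and any backoff decision keyed on it misbehaves.
	if err := deleteVolume(); err != nil {
		fmt.Println("inner err:", err)
	}
	fmt.Println("outer err after shadowed check:", err) // still <nil>

	// Fix: assign to the existing variable instead of redeclaring it.
	err = deleteVolume()
	fmt.Println("outer err after plain assignment:", err)
}
```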
Hemant Kumar
7b423085fa Fix variable shadowing in exponential backoff when deleting volumes
Also fix pv_controller unit tests to behave more accurately
in light of exponential backoffs
2016-12-20 21:31:12 -05:00
Wojciech Tyczynski
1b2d9eb2e7 NodeController listing nodes from cache instead of cache in apiserver 2016-12-20 13:13:14 +01:00
Kubernetes Submit Queue
d373d1c467 Merge pull request #38917 from foxyriver/if-statement-must-be-true
Automatic merge from submit-queue (batch tested with PRs 38426, 38917, 38891, 38935)

if statement must be true

**What this PR does / why we need it**:

If len(metrics.Items) == 0, the function would have already returned, so the check if len(metrics.Items) > 0 is redundant; it must always be true.

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2016-12-19 18:18:24 -08:00
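A small sketch of the simplification (process and its argument are illustrative): once a guard clause returns on the empty case, a later length check can never be false.

```go
package main

import "fmt"

func process(items []string) {
	if len(items) == 0 {
		fmt.Println("no items")
		return
	}
	// No need to wrap the rest in "if len(items) > 0 { ... }"; it cannot be false here.
	fmt.Println("processing", len(items), "items")
}

func main() {
	process(nil)
	process([]string{"a", "b"})
}
```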
Andy Goldstein
febc641cee Fix DaemonSet controller cache mutation
Add dsStoreSynced so we also wait on this cache when starting the
DaemonSetController.

Switch to using a fake clientset in the unit tests.

Fix TestNumberReadyStatus so it doesn't expect the cache to be mutated.
2016-12-19 16:39:23 -05:00
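A simplified, self-contained sketch of the cache-sync wait this commit adds; InformerSynced and waitForCacheSync below mirror the shape of the client-go helpers but are not the real API:

```go
package main

import (
	"fmt"
	"time"
)

// InformerSynced mirrors the shape of client-go's cache.InformerSynced:
// a function reporting whether an informer's initial list has completed.
type InformerSynced func() bool

// waitForCacheSync is a simplified stand-in for cache.WaitForCacheSync: the
// controller should not start its workers until every cache it reads from
// (pods, nodes, and - after this fix - daemon sets) has synced.
func waitForCacheSync(timeout time.Duration, synced ...InformerSynced) bool {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		ok := true
		for _, s := range synced {
			if !s() {
				ok = false
				break
			}
		}
		if ok {
			return true
		}
		time.Sleep(10 * time.Millisecond)
	}
	return false
}

func main() {
	podsSynced := func() bool { return true }
	dsSynced := func() bool { return true } // the cache the fix adds a wait for
	if waitForCacheSync(time.Second, podsSynced, dsSynced) {
		fmt.Println("caches synced; starting workers")
	}
}
```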
Kubernetes Submit Queue
40bed8e189 Merge pull request #38080 from kargakis/requeue-on-selector-updates
Automatic merge from submit-queue

controller: sync deployments once they don't overlap anymore

Fixes https://github.com/kubernetes/kubernetes/issues/34458.

@kubernetes/deployment
2016-12-19 07:31:15 -08:00
Kubernetes Submit Queue
5f82fe76a2 Merge pull request #38878 from kubernetes/revert-38780-ds-fix1
Automatic merge from submit-queue (batch tested with PRs 34353, 33837, 38878)

Revert "daemonset: bail out after we enqueue once"

I get overzealous sometimes.

Reverts kubernetes/kubernetes#38780
2016-12-19 06:43:00 -08:00
Michail Kargakis
04c6fecbc7 controller: use defaultResync for the deployment controller 2016-12-19 14:04:15 +01:00
Michail Kargakis
d19a1109e2 controller: sync deployments once they don't overlap anymore 2016-12-19 14:04:15 +01:00
foxyriver
69c76d8398 if statement must be true 2016-12-17 11:52:41 +08:00
Maciej Szulik
9f064c57ce Remove extensions/v1beta1 Job 2016-12-17 00:07:24 +01:00
Mike Danese
3a6593c9f1 Revert "daemonset: bail out after we enqueue once" 2016-12-16 10:18:06 -08:00
Robert Rati
91931c138e [scheduling] Moved node affinity from annotations to api fields. #35518 2016-12-16 11:42:43 -05:00
Kubernetes Submit Queue
5b240ca897 Merge pull request #36748 from kargakis/remove-events-from-deployment-tests
Automatic merge from submit-queue

Fix Recreate for Deployments and stop using events in e2e tests

Fixes https://github.com/kubernetes/kubernetes/issues/36453 by removing events from the deployment tests. The test about events during a Rolling deployment is redundant, so I just removed it (we already have another test specifically for Rolling deployments).

Closes https://github.com/kubernetes/kubernetes/issues/32567 (we preferred to use pod LISTs instead of a new status API field for replica sets, which would add many more writes to replica sets).

@kubernetes/deployment
2016-12-16 03:57:02 -08:00
Kubernetes Submit Queue
7ca5f92b58 Merge pull request #38780 from mikedanese/ds-fix1
Automatic merge from submit-queue

daemonset: bail out after we enqueue once

This isn't terrible because we dedup in the queue but it's a waste of
cycles.
2016-12-15 16:15:52 -08:00
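A rough sketch of the optimization (the queue and predicates are stand-ins for the daemon set controller's workqueue and checks): once the key has been enqueued, scanning further only produces duplicate enqueues that the queue dedups anyway.

```go
package main

import "fmt"

// enqueueIfAffected stops at the first predicate that matches: further
// matches would only enqueue the same key again, which is wasted work.
func enqueueIfAffected(queue chan<- string, key string, predicates []func() bool) {
	for _, affected := range predicates {
		if affected() {
			queue <- key
			return // bail out after the first enqueue instead of looping on
		}
	}
}

func main() {
	queue := make(chan string, 1)
	enqueueIfAffected(queue, "kube-system/fluentd", []func() bool{
		func() bool { return true },
		func() bool { return true },
	})
	fmt.Println("queued:", <-queue, "remaining:", len(queue))
}
```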
Michail Kargakis
7ef3e6f7c9 controller: wait for all pods to be deleted before Recreating 2016-12-15 19:55:18 +01:00
bprashanth
98c7fe98e1 Don't eat 403 in service controller 2016-12-15 10:27:14 -08:00
NickrenREN
fab228a4ef delete continue in monitorNodeStatus
The continue statement is the last statement in the for loop body, so it is not needed.
2016-12-15 13:41:24 +08:00
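The cleanup in a nutshell, as a trivial sketch (the loop body is illustrative): a `continue` that is the last statement of a loop body changes nothing, since the loop proceeds to its next iteration regardless.

```go
package main

import "fmt"

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println("monitoring node", i)
		continue // redundant: the loop moves to the next iteration anyway
	}
}
```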
Kubernetes Submit Queue
d8efc779ed Merge pull request #38154 from caesarxuchao/rename-release_1_5
Automatic merge from submit-queue (batch tested with PRs 38154, 38502)

Rename "release_1_5" clientset to just "clientset"

We used to keep multiple releases in the main repo. Now that [client-go](https://github.com/kubernetes/client-go) does the versioning, there is no need to keep releases in the main repo. This PR renames the "release_1_5" clientset to just "clientset"; clientset development will be done in this directory.

@kubernetes/sig-api-machinery @deads2k 

```release-note
The main repository does not keep multiple releases of clientsets anymore. Please find previous releases at https://github.com/kubernetes/client-go
```
2016-12-14 14:21:51 -08:00
Mike Danese
3a311a2bc2 daemonset: bail out after we enqueue once
This isn't terrible because we dedup in the queue but it's a waste of
cycles.
2016-12-14 12:59:06 -08:00
Chao Xu
6709b7ada2 run hack/update-codegen.sh
run hack/verify-gofmt.sh
update bazel
2016-12-14 12:39:49 -08:00
Chao Xu
03d8820edc rename /release_1_5 to /clientset 2016-12-14 12:39:48 -08:00
Kubernetes Submit Queue
af23f40f82 Merge pull request #37272 from brendandburns/cleanup
Automatic merge from submit-queue

Remove 'minion' from the code in two places in favor of 'node'

Part of https://github.com/kubernetes/kubernetes/issues/1111
2016-12-14 00:09:43 -08:00
Kubernetes Submit Queue
7b8ecda289 Merge pull request #38743 from caesarxuchao/remove
Automatic merge from submit-queue

Remove accidentally committed files

Accidentally committed in #37534.
2016-12-13 20:44:16 -08:00
Chao Xu
411128f294 remove wrongly committed files 2016-12-13 19:44:51 -08:00
Dan Winship
f369372dad Drop version-parsing from pkg/version
pkg/version is now just version constants, etc, not version parsing
2016-12-13 08:53:19 -05:00
Kubernetes Submit Queue
15f9572b8c Merge pull request #38613 from kargakis/do-not-adopt-when-deleted
Automatic merge from submit-queue

controller: adopt pods only when controller is not deleted

When a replica set is deleted, it will continue adopting pods, driving the worker that handles it into erroring out, because the adoption is [always cancelled](59c313730c/pkg/controller/controller_ref_manager.go (L110)) in the controller reference manager.
```
E1212 14:40:31.245773    7964 replica_set.go:616] cancel the adopt attempt for pod e2e-tests-deployment-2rr3m_test-rollover-deployment-1981456318-73c3m_791e16cb-c070-11e6-a234-68f72840e7df because the controlller is being deleted
E1212 14:40:31.258462    7964 replica_set.go:616] cancel the adopt attempt for pod e2e-tests-deployment-2rr3m_test-rollover-deployment-1981456318-73c3m_791e16cb-c070-11e6-a234-68f72840e7df because the controlller is being deleted
E1212 14:40:31.259131    7964 replica_set.go:616] cancel the adopt attempt for pod e2e-tests-deployment-2rr3m_test-rollover-deployment-1981456318-73c3m_791e16cb-c070-11e6-a234-68f72840e7df because the controlller is being deleted
E1212 14:40:31.259149    7964 replica_set.go:616] cancel the adopt attempt for pod e2e-tests-deployment-2rr3m_test-rollover-deployment-1981456318-wrmt8_791e3d46-c070-11e6-a234-68f72840e7df because the controlller is being deleted
I1212 14:40:31.268012    7964 deployment_controller.go:314] Error syncing deployment e2e-tests-deployment-2rr3m/test-rollover-deployment: Operation cannot be fulfilled on deployments.extensions "test-rollover-deployment": the object has been modified; please apply your changes to the latest version and try again
E1212 14:40:31.277252    7964 replica_set.go:616] cancel the adopt attempt for pod e2e-tests-deployment-2rr3m_test-rollover-deployment-1981456318-73c3m_791e16cb-c070-11e6-a234-68f72840e7df because the controlller is being deleted
E1212 14:40:31.277276    7964 replica_set.go:616] cancel the adopt attempt for pod e2e-tests-deployment-2rr3m_test-rollover-deployment-1981456318-wrmt8_791e3d46-c070-11e6-a234-68f72840e7df because the controlller is being deleted
E1212 14:40:31.277287    7964 replica_set.go:616] cancel the adopt attempt for pod e2e-tests-deployment-2rr3m_test-rollover-deployment-1981456318-bmqpn_81482114-c070-11e6-a234-68f72840e7df because the controlller is being deleted
E1212 14:40:31.289148    7964 replica_set.go:616] cancel the adopt attempt for pod e2e-tests-deployment-2rr3m_test-rollover-deployment-1981456318-b6s4x_82fa8343-c070-11e6-a234-68f72840e7df because the controlller is being deleted
E1212 14:40:31.289169    7964 replica_set.go:616] cancel the adopt attempt for pod e2e-tests-deployment-2rr3m_test-rollover-deployment-1981456318-73c3m_791e16cb-c070-11e6-a234-68f72840e7df because the controlller is being deleted
E1212 14:40:31.289176    7964 replica_set.go:616] cancel the adopt attempt for pod e2e-tests-deployment-2rr3m_test-rollover-deployment-1981456318-wrmt8_791e3d46-c070-11e6-a234-68f72840e7df because the controlller is being deleted
E1212 14:40:31.289181    7964 replica_set.go:616] cancel the adopt attempt for pod e2e-tests-deployment-2rr3m_test-rollover-deployment-1981456318-bmqpn_81482114-c070-11e6-a234-68f72840e7df because the controlller is being deleted
```

@kubernetes/deployment @caesarxuchao
2016-12-13 04:57:49 -08:00
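The shape of the fix, sketched with a stand-in ReplicaSet type rather than the real controller ref manager: skip adoption entirely when the controller itself has a deletion timestamp, instead of attempting adoptions that are always cancelled.

```go
package main

import (
	"fmt"
	"time"
)

// ReplicaSet is a stand-in type; the real check looks at the controller
// object's metadata in the replica set / deployment controllers.
type ReplicaSet struct {
	Name              string
	DeletionTimestamp *time.Time
}

// claimPods sketches the fix: do not adopt orphans when the controller is
// being deleted, so the sync worker stops erroring out on cancelled adoptions.
func claimPods(rs *ReplicaSet, pods []string) []string {
	if rs.DeletionTimestamp != nil {
		return nil // controller is going away; adopt nothing
	}
	return pods
}

func main() {
	now := time.Now()
	live := &ReplicaSet{Name: "frontend"}
	dying := &ReplicaSet{Name: "old-frontend", DeletionTimestamp: &now}
	fmt.Println(claimPods(live, []string{"pod-a"}))
	fmt.Println(claimPods(dying, []string{"pod-b"}))
}
```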
Kubernetes Submit Queue
8abbedae54 Merge pull request #38315 from mikedanese/pin-gazel
Automatic merge from submit-queue

Pin gazel to a version and support cgo

This fixes the bazel build.

@krousey who is buildcop
2016-12-12 19:32:29 -08:00
Kubernetes Submit Queue
f45e918b8b Merge pull request #35833 from apelisse/owners-pkg-controller
Automatic merge from submit-queue

Curating Owners: pkg/controller

cc @jsafrane @mikedanese @bprashanth @derekwaynecarr @thockin @saad-ali

In an effort to expand the existing pool of reviewers and establish a
two-tiered review process (first someone **lgtms** and then someone
experienced in the project **approves**), we are adding new reviewers to
existing owners files.
## If You Care About the Process:

We did this by algorithmically figuring out who’s contributed code to
the project and in what directories. Unfortunately, that doesn’t work
perfectly: people who have made mechanical code changes (e.g., changing the
copyright header across all directories) end up as reviewers in lots of
places.

Instead of using pure commit data, we generated an excessively large
list of reviewers and pruned based on all time commit data, recent
commit data and review data (number of PRs commented on).

At this point we have a decent list of reviewers, but it needs one last
pass for fine tuning.
## TLDR:

As an owner of a sig/directory and a leader of the project, here’s what
we need from you:
1. Use PR https://github.com/kubernetes/kubernetes/pull/35715 as an example.
2. The pull-request is made editable; please edit the OWNERS file to add
   the names of people who should be reviewing code in the future to the **reviewers** section. You probably do NOT need to modify the **approvers** section.
3. Notify me if you want some OWNERS file to be removed.  Being an approver or reviewer
   of a parent directory makes you a reviewer/approver of the subdirectories too, so not all
   OWNERS files may be necessary.
4. Please use ALIAS if you want to use the same list of people over and
   over again (don't hesitate to ask me for help, or use the pull-request
   above as an example)
2016-12-12 18:51:33 -08:00
Prashanth B
8ff3182fd4 Update OWNERS 2016-12-12 17:55:18 -08:00
Prashanth B
0eda833c31 Update OWNERS 2016-12-12 17:54:39 -08:00
Mike Danese
c87de85347 autoupdate BUILD files 2016-12-12 13:30:07 -08:00
Kubernetes Submit Queue
5e6578a734 Merge pull request #38419 from freehan/service-status-update
Automatic merge from submit-queue

bump log level on service status update

ref: https://github.com/kubernetes/kubernetes/issues/38349

I tried to reproduce the problem in #38349 and failed. I am not sure why the service status update failed and why the service controller skipped the status update in the next round. What I have observed is that if the service status update fails due to a conflict, the next round of processServiceUpdate corrects it.

Bumping log level to get a better signal when it occurs.
2016-12-12 12:42:53 -08:00
Michail Kargakis
ec2c79a35e controller: adopt pods only when controller is not deleted 2016-12-12 15:12:44 +01:00
Michail Kargakis
9c7b39066e Log enqueueing replica sets for availability checks 2016-12-12 14:09:16 +01:00
Kubernetes Submit Queue
83a77fa5a1 Merge pull request #38299 from kargakis/calculate-unavailable-correctly
Automatic merge from submit-queue (batch tested with PRs 38608, 38299)

controller: set unavailableReplicas correctly when scaling down

```
deployment_controller.go:299] Error syncing deployment
e2e-tests-kubectl-2l7xx/e2e-test-nginx-deployment:
Deployment.extensions "e2e-test-nginx-deployment" is invalid:
status.unavailableReplicas: Invalid value: -1:
must be greater than or equal to 0
```

The validation error above usually occurs when a Deployment is
scaled down. In such a case we should default unavailableReplicas
to 0 instead of making an invalid API call.

@kubernetes/deployment
2016-12-12 04:18:04 -08:00
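A minimal sketch of the defaulting described above (the function below is illustrative, not the deployment controller's actual status computation): clamp the derived value at zero so a scale-down never produces a negative, invalid unavailableReplicas.

```go
package main

import "fmt"

// unavailableReplicas derives the status field from desired - available,
// clamped to 0 so a scale-down never yields a negative value that the
// apiserver would reject as invalid.
func unavailableReplicas(desired, available int32) int32 {
	if u := desired - available; u > 0 {
		return u
	}
	return 0
}

func main() {
	fmt.Println(unavailableReplicas(5, 3)) // 2
	fmt.Println(unavailableReplicas(2, 3)) // 0, not -1
}
```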