Commit Graph

4136 Commits

Author SHA1 Message Date
Janet Kuo
13b76d5fb4 Autogen
make clean && make generated_files
2018-09-04 14:21:14 -07:00
Janet Kuo
cbdc9b671f Make number of workers configurable 2018-09-04 14:21:14 -07:00
Janet Kuo
5186807587 Add TTL GC controller 2018-09-04 13:11:18 -07:00
stewart-yu
3fd3e40803 add OWNERS file 2018-09-04 19:40:13 +08:00
stewart-yu
cef2ab756c [kube-controller-manager] auto-generated file 2018-09-04 19:40:10 +08:00
stewart-yu
1c6c45563f [kube-controller-manager] create package to hold kube-controller-manager component api 2018-09-04 19:39:35 +08:00
stewart-yu
be4a437e71 [kube-controller-manager] remove only the struct and defaults for KubeControllerManagerConfiguration from pkg/apis/componentconfig 2018-09-04 19:38:48 +08:00
Kubernetes Submit Queue
5b355f5d40 Merge pull request #68122 from krzysztof-jastrzebski/scale_down
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Replace scale down window

**What this PR does / why we need it**:
Replace scale down forbidden window with scale down stabilization window.

This allows scale down based on more than one sample, to avoid rapidly changing size up and down for controllers with fluctuating load.

A bit more in https://docs.google.com/document/d/1IdG3sqgCEaRV3urPLA29IDudCufD89RYCohfBPNeWIM

This PR is copy of #67771 with resolved comments.

**Release note**:
```release-note
Replace scale down forbidden window with scale down stabilization window. Rather than waiting a fixed period of time between scale downs, HPA now scales down to the highest recommendation it made during the scale down stabilization window.
```
2018-09-03 21:39:02 -07:00
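
Below is a minimal Go sketch of the stabilization-window idea from this PR, assuming a simplified in-memory history; the type and function names are illustrative, not the HPA controller's actual ones.

```go
package main

import (
	"fmt"
	"time"
)

// timestampedRecommendation pairs a replica-count recommendation with the
// time it was produced. (Illustrative type; the real controller keeps
// similar per-HPA state.)
type timestampedRecommendation struct {
	recommendation int32
	timestamp      time.Time
}

// stabilizedRecommendation returns the highest recommendation observed
// inside the stabilization window, so a single low sample cannot trigger
// an immediate scale down.
func stabilizedRecommendation(history []timestampedRecommendation, current int32, window time.Duration, now time.Time) int32 {
	max := current
	for _, r := range history {
		if now.Sub(r.timestamp) <= window && r.recommendation > max {
			max = r.recommendation
		}
	}
	return max
}

func main() {
	now := time.Now()
	history := []timestampedRecommendation{
		{recommendation: 8, timestamp: now.Add(-4 * time.Minute)},
		{recommendation: 5, timestamp: now.Add(-2 * time.Minute)},
	}
	// The current sample says 3 replicas, but the window still holds an 8.
	fmt.Println(stabilizedRecommendation(history, 3, 5*time.Minute, now)) // 8
}
```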
Kubernetes Submit Queue
06ffb07e8e Merge pull request #68135 from shyamjvs/add-random-backoff-to-cidr-allocator
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Use random backoff for retries in cloud-cidr-allocator

Ref https://github.com/kubernetes/kubernetes/pull/68084#issuecomment-417651247
/cc @wojtek-t 

```release-note
NONE
```
2018-09-03 18:41:40 -07:00
Kubernetes Submit Queue
54978d7080 Merge pull request #67959 from gnufied/approver-attach-detach
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Add gnufied as approver for attach/detach controller

Hopefully has reviewed and made enough fixes in this
area to understand the code thoroughly.

```release-note
None
```

/assign @saad-ali @jsafrane
2018-09-02 12:51:16 -07:00
saad-ali
247bad23f0 Improve CSI CRD installation code 2018-09-02 09:23:36 -07:00
Deep Debroy
7946c6e21b Implement semantic comparison of VolumeNodeAffinity for unit tests
Signed-off-by: Deep Debroy <ddebroy@docker.com>
2018-08-31 19:05:05 -07:00
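
As a hedged sketch of what "semantic comparison" buys in unit tests: k8s.io/apimachinery's apiequality.Semantic.DeepEqual compares API values by meaning (e.g. resource quantities by numeric value) rather than by internal representation. The helper name here is illustrative.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	apiequality "k8s.io/apimachinery/pkg/api/equality"
)

// volumeNodeAffinitiesEqual compares two VolumeNodeAffinity values using
// Kubernetes' semantic equality, which handles API types (e.g.
// resource.Quantity) better than a byte-for-byte reflect.DeepEqual.
func volumeNodeAffinitiesEqual(a, b *v1.VolumeNodeAffinity) bool {
	return apiequality.Semantic.DeepEqual(a, b)
}

func main() {
	a := &v1.VolumeNodeAffinity{Required: &v1.NodeSelector{}}
	b := &v1.VolumeNodeAffinity{Required: &v1.NodeSelector{}}
	fmt.Println(volumeNodeAffinitiesEqual(a, b)) // true
}
```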
Klaus Ma
85a19b109a Taint node in parallel.
Signed-off-by: Klaus Ma <klaus1982.cn@gmail.com>
2018-09-01 09:57:02 +08:00
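
A minimal sketch of the "in parallel" change, assuming a hypothetical taintNode helper in place of the controller's real API update:

```go
package main

import (
	"fmt"
	"sync"
)

// taintNode is a hypothetical stand-in for the controller's per-node
// taint update (an API PATCH in the real code).
func taintNode(name string) error {
	fmt.Println("tainting", name)
	return nil
}

// taintNodesInParallel fans the per-node work out to goroutines instead
// of tainting nodes one at a time, which is the gist of this commit.
func taintNodesInParallel(nodes []string) {
	var wg sync.WaitGroup
	for _, n := range nodes {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			if err := taintNode(name); err != nil {
				fmt.Println("failed to taint", name, err)
			}
		}(n)
	}
	wg.Wait()
}

func main() {
	taintNodesInParallel([]string{"node-a", "node-b", "node-c"})
}
```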
Kubernetes Submit Queue
85300f4f5d Merge pull request #67803 from saad-ali/csiClusterReg3
Automatic merge from submit-queue (batch tested with PRs 64283, 67910, 67803, 68100). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

CSI Cluster Registry and Node Info CRDs

**What this PR does / why we need it**:
Introduces the new `CSIDriver` and `CSINodeInfo` API Object as proposed in https://github.com/kubernetes/community/pull/2514 and https://github.com/kubernetes/community/pull/2034

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/features/issues/594

**Special notes for your reviewer**:
Per the discussion in https://groups.google.com/d/msg/kubernetes-sig-storage-wg-csi/x5CchIP9qiI/D_TyOrn2CwAJ the API is being added to the staging directory of the `kubernetes/kubernetes` repo because the consumers will be attach/detach controller and possibly kubelet, but it will be installed as a CRD (because we want to move in the direction where the API server is Kubernetes agnostic, and all Kubernetes specific types are installed).

**Release note**:

```release-note
Introduce CSI Cluster Registration mechanism to ease CSI plugin discovery and allow CSI drivers to customize Kubernetes' interaction with them.
```

CC @jsafrane
2018-08-31 16:46:41 -07:00
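
A hedged sketch of what installing such a CRD at controller init might look like with the 2018-era v1beta1 apiextensions client (no context argument on Create); the group and kind mirror the csi.storage.k8s.io API named above, but treat the exact spec as illustrative.

```go
package crdinstall

import (
	"fmt"

	apiextensionsv1beta1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1"
	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"
)

// installCSIDriverCRD creates the CSIDriver CRD if it does not exist yet,
// which is roughly what "install CRDs during controller init" amounts to.
func installCSIDriverCRD(cfg *rest.Config) error {
	client, err := apiextensionsclient.NewForConfig(cfg)
	if err != nil {
		return err
	}
	crd := &apiextensionsv1beta1.CustomResourceDefinition{
		ObjectMeta: metav1.ObjectMeta{Name: "csidrivers.csi.storage.k8s.io"},
		Spec: apiextensionsv1beta1.CustomResourceDefinitionSpec{
			Group:   "csi.storage.k8s.io",
			Version: "v1alpha1",
			Scope:   apiextensionsv1beta1.ClusterScoped,
			Names: apiextensionsv1beta1.CustomResourceDefinitionNames{
				Plural: "csidrivers",
				Kind:   "CSIDriver",
			},
		},
	}
	// Idempotent install: AlreadyExists is fine on controller restart.
	if _, err := client.ApiextensionsV1beta1().CustomResourceDefinitions().Create(crd); err != nil && !apierrors.IsAlreadyExists(err) {
		return err
	}
	fmt.Println("CSIDriver CRD installed (or already present)")
	return nil
}
```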
saad-ali
fdeb895d25 Automatically install CRDs during controller init 2018-08-31 12:25:59 -07:00
Jan Safranek
7d673cb8f0 Pass new CSI API Client and informer to Volume Plugins 2018-08-31 12:25:59 -07:00
Krzysztof Jastrzebski
958cba1c82 Replace scale down forbidden window
The replacement is a scale down stabilization window: HPA will scale down
only to the max of the recommendations it made during that window. More
details in
https://docs.google.com/document/d/1IdG3sqgCEaRV3urPLA29IDudCufD89RYCohfBPNeWIM
2018-08-31 20:24:38 +02:00
Kubernetes Submit Queue
2548fb08cd Merge pull request #68068 from krzysztof-jastrzebski/hpas2
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Change CPU sample sanitization in HPA.

**What this PR does / why we need it**:
Change CPU sample sanitization in HPA.
Ignore samples if:
- Pod is being initialized (5 minutes from start, defined by flag):
    - pod is unready
    - pod is ready, but a full window of the metric hasn't been collected since the transition
- Pod is initialized (5 minutes from start, defined by flag):
    - pod has never been ready after the initial readiness period.

**Release notes:**
```release-note
Improve CPU sample sanitization in HPA by taking metric's freshness into account.
```
2018-08-31 10:17:44 -07:00
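
A rough Go sketch of those sanitization rules, assuming illustrative types and the flag-controlled 5-minute initialization period mentioned above:

```go
package main

import (
	"fmt"
	"time"
)

// podCPUSample carries the facts the HPA needs to decide whether a CPU
// sample is trustworthy. (Illustrative struct, not the controller's own.)
type podCPUSample struct {
	podStart       time.Time
	ready          bool
	lastTransition time.Time // last readiness transition
	everReady      bool
	metricWindow   time.Duration
}

// ignoreCPUSample applies the rules from the commit message: while a pod
// is initializing (the first initPeriod after start, flag-controlled),
// drop samples from unready pods and from pods whose metric window does
// not yet cover the time since the readiness transition; after that
// period, drop samples from pods that never became ready.
func ignoreCPUSample(s podCPUSample, initPeriod time.Duration, now time.Time) bool {
	if now.Sub(s.podStart) < initPeriod {
		if !s.ready {
			return true
		}
		return now.Sub(s.lastTransition) < s.metricWindow
	}
	return !s.everReady
}

func main() {
	now := time.Now()
	s := podCPUSample{podStart: now.Add(-1 * time.Minute), ready: false}
	fmt.Println(ignoreCPUSample(s, 5*time.Minute, now)) // true: unready during init
}
```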
Shyam Jeedigunta
8123a9ac7b Use random backoff for retries in cloud-cidr-allocator 2018-08-31 16:07:20 +02:00
wojtekt
fcd2882722 Fix retrying in ipam controller 2018-08-31 13:41:16 +02:00
Di Xu
8afdda1030 cleanup: remove unused options for rs controller 2018-08-31 19:00:59 +08:00
houjun
0ed234dbff Add unit test case for controller/disruption 2018-08-31 14:12:18 +08:00
Krzysztof Jastrzebski
5357bf9eac Change CPU sample sanitization in HPA.
Ignore samples if:
- Pod is being initialized (5 minutes from start, defined by flag):
    - pod is unready
    - pod is ready, but a full window of the metric hasn't been collected since the transition
- Pod is initialized (5 minutes from start, defined by flag):
    - pod has never been ready after the initial readiness period.
2018-08-30 23:13:14 +02:00
Bowei Du
d3facac6ef Make CIDR allocation retry backoff exponentially
This also sets the retry time to be less aggressive.

fixes #67348
2018-08-30 12:03:05 -07:00
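
Taken together with the random-backoff change above (#68135), the retry logic amounts to an exponential envelope with jitter. A hedged sketch, with illustrative constants rather than the allocator's actual values:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// nodeUpdateRetryDelay grows the retry delay exponentially with the
// attempt count, capped at maxDelay, and adds random jitter so retries
// for many nodes do not synchronize.
func nodeUpdateRetryDelay(attempt int, baseDelay, maxDelay time.Duration) time.Duration {
	d := baseDelay
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	// Jitter in [d/2, d): a random backoff within the exponential envelope.
	return d/2 + time.Duration(rand.Int63n(int64(d/2)))
}

func main() {
	for attempt := 0; attempt < 5; attempt++ {
		fmt.Println(attempt, nodeUpdateRetryDelay(attempt, 100*time.Millisecond, 5*time.Second))
	}
}
```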
Kubernetes Submit Queue
e3969fed1d Merge pull request #67825 from nikopen/master
Automatic merge from submit-queue (batch tested with PRs 67745, 67432, 67569, 67825, 67943). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Fix VMWare VM freezing bug by reverting #51066

**What this PR does / why we need it**: kube-controller-manager, VSphere specific: When the controller tries to attach a Volume to Node A that is already attached to Node B, Node A freezes until the volume is attached.  Kubernetes continues to try to attach the volume as it thinks that it's 'multi-attachable' when it's not. #51066 is the culprit.


**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/vmware/kubernetes/issues/500 / https://github.com/vmware/kubernetes/issues/502 (same issue)

**Special notes for your reviewer**:

- Repro:

Vsphere installation, any k8s version from 1.8 and above, pod with attached PV/PVC/VMDK:

1. cordon the node which the pod is in
2. `kubectl delete po/[pod] --force --grace-period=0`
3. the pod is immediately rescheduled to a new node. Grab the new node from a `kubectl describe [pod]` and attempt to Ping it or SSH into it.
4. you can see that pings/ssh fail to reach the new node. `kubectl get node` shows it as 'NotReady'. New node is frozen until the volume is attached - usually 1 minute freeze for 1 volume in a low-load cluster, and many minutes more with higher loads and more volumes involved.

- Patch verification:

Tested a custom patched 1.9.10 kube-controller-manager with #51066 reverted and the above bug is resolved - can't repro it anymore. New node doesn't freeze at all, and attaching happens quite quickly, in a few seconds.


**Release note**:

``` 
Fix VSphere VM Freezing bug by reverting #51066 

```
2018-08-29 15:19:41 -07:00
lichuqiang
4c43d626f2 related test update 2018-08-29 10:30:16 +08:00
lichuqiang
b4a57f6855 combine feature gate VolumeScheduling and DynamicProvisioningScheduling into one 2018-08-29 10:30:08 +08:00
Kubernetes Submit Queue
42c6f1fb28 Merge pull request #67067 from moonek/master
Automatic merge from submit-queue (batch tested with PRs 67067, 67947). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Do not count soft-deleted pods for scaling purposes in HPA controller

**What this PR does / why we need it**:
The metrics of "soft-deleted" pods (pods marked for deletion) should probably not matter for scaling purposes, since they'll be gone "soon", whether they're NodeLost or just normally deleted.

As long as soft-deleted pods still exist, they prevent normal scale up.


**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/62845

**Special notes for your reviewer**:

**Release note**:

```release-note
Stop counting soft-deleted pods for scaling purposes in HPA controller to avoid soft-deleted pods incorrectly affecting scale up replica count calculation.
```
2018-08-28 15:08:01 -07:00
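
A minimal sketch of the filtering this PR describes, assuming the usual convention that a pod with a non-nil DeletionTimestamp is "soft-deleted":

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// filterScalablePods drops "soft-deleted" pods (those with a deletion
// timestamp) before the replica-count math, so pods that will be gone
// soon cannot hold back a scale up.
func filterScalablePods(pods []v1.Pod) []v1.Pod {
	out := make([]v1.Pod, 0, len(pods))
	for _, p := range pods {
		if p.DeletionTimestamp != nil {
			continue // being deleted; ignore its metrics
		}
		out = append(out, p)
	}
	return out
}

func main() {
	now := metav1.Now()
	pods := []v1.Pod{
		{ObjectMeta: metav1.ObjectMeta{Name: "live"}},
		{ObjectMeta: metav1.ObjectMeta{Name: "dying", DeletionTimestamp: &now}},
	}
	fmt.Println(len(filterScalablePods(pods))) // 1
}
```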
Kubernetes Submit Queue
b49e0b7f3a Merge pull request #67883 from krzysztof-jastrzebski/hpas
Automatic merge from submit-queue (batch tested with PRs 67938, 66719, 67883). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Remove incorrect glog error from Horizontal Pod Autoscaler Controller.

**What this PR does / why we need it**:
This PR removes an incorrect glog error from the Horizontal Pod Autoscaler Controller.

**Release note:**
```release-note
none
```
2018-08-28 10:02:08 -07:00
moonek
3fedbe48e3 Do not count soft-deleted pods for scaling purposes in HPA controller 2018-08-28 16:27:47 +00:00
Hemant Kumar
f665843934 Add gnufied as approver for attach/detach controller
Hopefully has reviewed and made enough fixes in this
area to understand the code thoroughly.
2018-08-28 12:03:20 -04:00
Kubernetes Submit Queue
2eb14e3007 Merge pull request #64973 from nokia/k8s-sctp
Automatic merge from submit-queue (batch tested with PRs 67694, 64973, 67902). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

SCTP support implementation for Kubernetes

**What this PR does / why we need it**: This PR adds SCTP support to Kubernetes, including Service, Endpoint, and NetworkPolicy.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #44485

**Special notes for your reviewer**:

**Release note**:

```release-note

SCTP is now supported as an additional protocol (alpha) alongside TCP and UDP in Pod, Service, Endpoint, and NetworkPolicy.

```
2018-08-28 07:21:18 -07:00
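
A simplified sketch of the protocol gating described in this PR; the real checks live in API validation code, and the feature-gate lookup is reduced here to a boolean:

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// validatePortProtocol mirrors the shape of the change: TCP and UDP are
// always accepted, SCTP only when the alpha SCTP support feature gate is
// enabled. (Illustrative stand-in for the real validation code.)
func validatePortProtocol(p v1.Protocol, sctpEnabled bool) error {
	switch p {
	case v1.ProtocolTCP, v1.ProtocolUDP:
		return nil
	case v1.ProtocolSCTP:
		if sctpEnabled {
			return nil
		}
		return fmt.Errorf("SCTP requires the SCTP support feature gate")
	default:
		return fmt.Errorf("unsupported protocol %q", p)
	}
}

func main() {
	fmt.Println(validatePortProtocol(v1.ProtocolSCTP, false)) // gate disabled: error
	fmt.Println(validatePortProtocol(v1.ProtocolSCTP, true))  // <nil>
}
```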
tanshanshan
a83c4dbd19 fix spelling mistakes 2018-08-28 17:12:36 +08:00
Krzysztof Jastrzebski
dfd88dbde0 Remove incorrect glog error from Horizontal Pod Autoscaler. 2018-08-28 09:18:25 +02:00
Klaus Ma
5713d96f36 Volunteer to be DaemonSet controller maintainer.
Signed-off-by: Klaus Ma <klaus1982.cn@gmail.com>
2018-08-28 14:20:12 +08:00
Kubernetes Submit Queue
0148f25fe7 Merge pull request #67734 from Huang-Wei/fix-nodelost-issue
Automatic merge from submit-queue (batch tested with PRs 64597, 67854, 67734, 67917, 67688). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

fix an issue that scheduling doesn't respect NodeLost status of a node

**What this PR does / why we need it**:

- if Node status is Unknown, apply the unreachable taint with NoSchedule effect
- some internal data structure refactoring
- update unit test

**Which issue(s) this PR fixes**:
Fixes #67733, and very likely #67536

**Special notes for your reviewer**:

See detailed reproducing steps in #67733.

**Release note**:
```release-note
Apply the unreachable taint to a node when it loses its network connection.
```
2018-08-27 22:18:12 -07:00
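
A sketch of the decision this PR adds, assuming the standard node.kubernetes.io/unreachable taint key; the real controller patches the Node object rather than returning a value:

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// unreachableNoScheduleTaint is the taint applied when a node's Ready
// condition is Unknown (NodeLost), so the scheduler stops placing pods
// there even before eviction logic kicks in.
var unreachableNoScheduleTaint = v1.Taint{
	Key:    "node.kubernetes.io/unreachable",
	Effect: v1.TaintEffectNoSchedule,
}

// taintForNodeLost returns the taint to apply, if any, given the node's
// Ready condition status. (Sketch of the decision, not the controller's
// full update path.)
func taintForNodeLost(ready v1.ConditionStatus) *v1.Taint {
	if ready == v1.ConditionUnknown {
		t := unreachableNoScheduleTaint
		return &t
	}
	return nil
}

func main() {
	if t := taintForNodeLost(v1.ConditionUnknown); t != nil {
		fmt.Println("apply taint:", t.Key, t.Effect)
	}
}
```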
Kubernetes Submit Queue
d744c6ea61 Merge pull request #66085 from liggitt/updatejob
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

fix updateJob scheduling of resync

fixes #66071 

```release-note
NONE
```
2018-08-27 17:40:54 -07:00
Wei Huang
8f93a93281 fix a comment inconsistency in Daemonset Controller 2018-08-27 16:28:25 -07:00
Wei Huang
7c024273a4 fix an issue that scheduling doesn't respect NodeLost status of a node
- if Node status is Unknown, apply the unreachable taint with NoSchedule effect
- some internal data structure refactoring
- update unit test
2018-08-27 11:46:15 -07:00
Mike Dame
dd7e81a8cd Add dry run test for hpa v2beta2 2018-08-27 11:37:22 -04:00
Mike Dame
77d7f9cfa2 Generate files and modifications for autoscaling/v2beta2 and custom_metrics/v1beta2 2018-08-27 11:07:53 -04:00
Mike Dame
c7102ee5dc Implement autoscaling/v2beta2 features in HPA controller 2018-08-27 11:07:52 -04:00
yue9944882
d11ee913a1 prune flipping int/ext conversion for quota controller 2018-08-27 21:49:26 +08:00
Laszlo Janosi
a6da2b1472 K8s SCTP support implementation for the first pull request
The requested Service Protocol is checked against the supported protocols of GCE Internal LB. The supported protocols are TCP and UDP.

SCTP is not supported by OpenStack LBaaS. If SCTP is requested in a Service with type=LoadBalancer, the request is rejected. Comment style is also corrected.

SCTP is not allowed for LoadBalancer Service and for HostPort. Kube-proxy can be configured not to start listening on the host port for SCTP: see the new SCTPUserSpaceNode parameter

Changed the vendored github.com/nokia/sctp to github.com/ishidawataru/sctp, i.e. from now on we use the upstream version.

Fixed netexec.go compilation and various test cases.

SCTP-related conformance tests removed. Netexec's pod definition and Dockerfile are updated to expose the new SCTP port (8082).

SCTP-related e2e test cases are removed, as the e2e test systems do not support SCTP.

SCTP-related firewall config is removed from cluster/gce/util.sh. The variable name sctp_addr is corrected to sctpAddr in pkg/proxy/ipvs/proxier.go.

cluster/gce/util.sh is copied from master.
2018-08-27 05:56:27 +00:00
Kubernetes Submit Queue
b02261a140 Merge pull request #67826 from deads2k/controller-03-missingisgone
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

controller expectations for deletion can be met by 404

A controller asks pod control to delete a pod because it wants the pod to be gone.  It doesn't really care if the imperative delete action itself succeeds.  When the pod is already gone (404), then the desire of the controller is met.

Since the pods themselves are cache driven, you can hit this condition more than you may like. See https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/replicaset/replica_set.go#L582 as an example.

@kubernetes/sig-apps-bugs 
/assign @janetkuo @tnozicka 


```release-note
latent controller caches no longer cause repeating deletion messages for deleted pods
```
2018-08-26 11:56:23 -07:00
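
A hedged sketch of the idea: treat a 404 (NotFound) from the delete call as the controller's desire being met, so the deletion expectation is still lowered rather than retried and re-logged. deletePod and lowerExpectation are hypothetical stand-ins:

```go
package main

import (
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// deletePod is a hypothetical stand-in for pod control's delete call.
func deletePod(name string) error {
	// Simulate the cache-driven race: the pod is already gone.
	return apierrors.NewNotFound(schema.GroupResource{Resource: "pods"}, name)
}

// deleteAndObserve captures the commit's point: the controller wants the
// pod gone, so a 404 means the expectation is met and must be lowered.
func deleteAndObserve(name string, lowerExpectation func()) error {
	err := deletePod(name)
	if err != nil && !apierrors.IsNotFound(err) {
		return err // real failure: keep the expectation, retry later
	}
	lowerExpectation() // success or already gone: desire satisfied
	return nil
}

func main() {
	_ = deleteAndObserve("doomed-pod", func() { fmt.Println("deletion observed") })
}
```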
Kubernetes Submit Queue
663551bebd Merge pull request #67252 from jbartosik/metric-sanitization
Automatic merge from submit-queue (batch tested with PRs 66916, 67252, 67794, 67619, 67328). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Fix HPA sample sanitization

**What this PR does / why we need it**: @mwielgus pointed out a case when HPA fails as a result of my changes to HPA algorithm:
- Have pods that use a lot of CPU during initialization and become ready right after they initialize,
- Trigger a scale up,
- When new pods become ready we will count their usage (even though it's not related to any work that needs doing),
- This triggers another scale up, even though existing pods can handle the work without a problem.

The fix is:
- Use all samples for non-cpu metrics.
- Only use CPU samples if:
  - Pod is ready and was started more than 2 minutes ago, or
  - Pod is unready and last readiness change happened more than 10s after it was started.

Reasoning behind this in: https://docs.google.com/document/d/1UdtYedhmCxjaJIQi6hwJMY0eHQQKxlVD8lSHZC1BPOA/edit

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:

**Special notes for your reviewer**:

**Release note**:
```release-note
Replace scale up forbidden window with disregarding CPU samples collected when pod was initializing.
```
2018-08-24 15:25:07 -07:00
David Eads
a2ee93b531 controller expectations for deletion can be met by 404 2018-08-24 09:03:51 -04:00
nikopen
6f2a45aefe Fix VMWare VM freezing bug by reverting #51066 2018-08-24 14:28:44 +02:00
Joachim Bartosik
4fd6a1684d Make HPA more configurable
The duration of the CPU initialization taint and the window of initial
readiness are now controlled by flags.

Adding API violation exceptions following example of e50340ee23
2018-08-24 13:13:02 +02:00
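
A sketch of exposing those two durations as flags, using the stdlib flag package for brevity; the flag names follow the kube-controller-manager convention (--horizontal-pod-autoscaler-...), though the real wiring goes through the controller manager's options package rather than this direct registration.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// Illustrative flag registrations for the two durations this commit
// makes configurable; defaults here are assumptions, not authoritative.
var (
	cpuInitializationPeriod = flag.Duration(
		"horizontal-pod-autoscaler-cpu-initialization-period", 5*time.Minute,
		"period after pod start during which CPU samples may be skipped")
	initialReadinessDelay = flag.Duration(
		"horizontal-pod-autoscaler-initial-readiness-delay", 30*time.Second,
		"period after pod start during which readiness changes count as initial readiness")
)

func main() {
	flag.Parse()
	fmt.Println("cpu init period:", *cpuInitializationPeriod,
		"initial readiness delay:", *initialReadinessDelay)
}
```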
Kubernetes Submit Queue
c4f355a2ad Merge pull request #66971 from tnozicka/informer-watcher
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

#50102 Task 2: Add UntilWithSync

**What this PR does / why we need it**:
This is a split off from https://github.com/kubernetes/kubernetes/pull/50102 to go in smaller pieces.

Introduces UntilWithSync based on informer.

**Needs https://github.com/kubernetes/kubernetes/pull/66906 first**
/hold

**Release note**:
```release-note
NONE
```

/priority important-soon
/kind bug
(bug fix after the main PR that this is split from)
2018-08-23 07:26:25 -07:00
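
A hedged usage sketch of the new helper: watchtools.UntilWithSync (k8s.io/client-go/tools/watch) lists first via the ListerWatcher, replays the initial state as events, and then watches until a condition returns true, so a state reached before the watch started is not missed. The pod-readiness condition and names here are illustrative.

```go
package watchexample

import (
	"context"
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	watchtools "k8s.io/client-go/tools/watch"
)

// waitForPodRunning waits until the named pod is Running, using the
// informer-backed UntilWithSync introduced by this PR.
func waitForPodRunning(client kubernetes.Interface, ns, name string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	// Watch only the single pod we care about.
	lw := cache.NewListWatchFromClient(
		client.CoreV1().RESTClient(), "pods", ns,
		fields.OneTermEqualSelector("metadata.name", name))

	// nil precondition: no requirement on the initially synced state.
	_, err := watchtools.UntilWithSync(ctx, lw, &v1.Pod{}, nil,
		func(ev watch.Event) (bool, error) {
			pod, ok := ev.Object.(*v1.Pod)
			if !ok {
				return false, fmt.Errorf("unexpected object type %T", ev.Object)
			}
			return pod.Status.Phase == v1.PodRunning, nil
		})
	return err
}
```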