Commit Graph

4953 Commits

Author SHA1 Message Date
Kubernetes Submit Queue
26d7ee0447 Merge pull request #44774 from kargakis/uniquifier
Automatic merge from submit-queue

Switch Deployments to new hashing algo w/ collision avoidance mechanism

Implements https://github.com/kubernetes/community/pull/477

@kubernetes/sig-apps-api-reviews @kubernetes/sig-apps-pr-reviews 

Fixes https://github.com/kubernetes/kubernetes/issues/29735
Fixes https://github.com/kubernetes/kubernetes/issues/43948

```release-note
Deployments are updated to use (1) a more stable hashing algorithm (fnv) than the previous one (adler) and (2) a hashing collision avoidance mechanism that will ensure new rollouts will not block on hashing collisions anymore.
```
2017-05-25 06:09:58 -07:00
Michail Kargakis
9190a47c37 Generated changes for collision count
Signed-off-by: Michail Kargakis <mkargaki@redhat.com>
2017-05-25 12:23:17 +02:00
Kubernetes Submit Queue
9c1480bb61 Merge pull request #46366 from nicksardo/gce-subnetwork-url
Automatic merge from submit-queue (batch tested with PRs 45573, 46354, 46376, 46162, 46366)

GCE - Retrieve subnetwork name/url from gce.conf 

**What this PR does / why we need it**:
Features like ILB require specifying the subnetwork if the network is type manual.

**Notes:**
The network URL can be [constructed](68e7e18698/pkg/cloudprovider/providers/gce/gce.go (L211-L217)) by fetching instance metadata; however, the subnetwork is not provided through this feature. Users must specify the subnetwork name/url through the gce.conf.

Although multiple subnets can exist in the same region for a network, the cloud provider will only use one subnet url for creating LBs. 


**Release note**:
```release-note
NONE
```
2017-05-25 03:14:05 -07:00
Kubernetes Submit Queue
23348ceedc Merge pull request #46354 from smarterclayton/metrics_subresource
Automatic merge from submit-queue (batch tested with PRs 45573, 46354, 46376, 46162, 46366)

Subresources are not included in apiserver prometheus metrics

Subresources are very often completely different code paths and errors
generated on those code paths are important to distinguish.

@kubernetes/sig-api-machinery-pr-reviews

```release-note
The Prometheus metrics for the kube-apiserver for tracking incoming API requests and latencies now return the `subresource` label for correctly attributing the type of API call.
```
2017-05-25 03:13:59 -07:00
Michail Kargakis
4a2c5eae92 Implement hash collision avoidance mechanism
Signed-off-by: Michail Kargakis <mkargaki@redhat.com>
2017-05-25 11:17:45 +02:00
Kubernetes Submit Queue
d84f3f4b7e Merge pull request #46363 from MrHohn/fix-CheckPodsCondition
Automatic merge from submit-queue (batch tested with PRs 45913, 46065, 46352, 46363, 46373)

Fix CheckPodsCondition to print out the correct podName

From a couple CIs (https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-serial/1114, https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-gci-qa-serial-master/2246, https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gke-pre-release/2187), all indicate we print out the wrong pod name in CheckPodsCondition for _"Pod XXX failed to be running and ready, or succeeded."_:
```
I0524 02:09:50.173] May 24 02:09:50.173: INFO: Waiting for pod heapster-v1.3.0-3806988011-kzkg6 in namespace 'kube-system' status to be 'running and ready, or succeeded'(found phase: "Running", readiness: false) (4m55.033881993s elapsed)
I0524 02:09:52.178] May 24 02:09:52.178: INFO: Waiting for pod heapster-v1.3.0-3806988011-kzkg6 in namespace 'kube-system' status to be 'running and ready, or succeeded'(found phase: "Running", readiness: false) (4m57.03848264s elapsed)
I0524 02:09:54.183] May 24 02:09:54.182: INFO: Waiting for pod heapster-v1.3.0-3806988011-kzkg6 in namespace 'kube-system' status to be 'running and ready, or succeeded'(found phase: "Running", readiness: false) (4m59.043463323s elapsed)
I0524 02:09:56.183] May 24 02:09:56.183: INFO: Pod fluentd-gcp-v2.0-6wf67 failed to be running and ready, or succeeded.
I0524 02:09:56.184] May 24 02:09:56.183: INFO: Wanted all 23 pods to be running and ready, or succeeded. Result: false. Pods: [heapster-v1.3.0-3806988011-kzkg6 kube-proxy-bootstrap-e2e-minion-group-bbwn rescheduler-v0.3.0-bootstrap-e2e-master monitoring-influxdb-grafana-v4-1q59k l7-default-backend-1044750973-zgxsc etcd-server-events-bootstrap-e2e-master kube-apiserver-bootstrap-e2e-master kube-proxy-bootstrap-e2e-minion-group-6nqb kube-proxy-bootstrap-e2e-minion-group-mzbz fluentd-gcp-v2.0-chd2x kube-dns-806549836-f8p46 fluentd-gcp-v2.0-44x97 kube-dns-autoscaler-2528518105-vlg8t fluentd-gcp-v2.0-p1h4b kube-controller-manager-bootstrap-e2e-master l7-lb-controller-v0.9.3-bootstrap-e2e-master kubernetes-dashboard-2917854236-tn3nx kube-dns-806549836-fq2fp kube-scheduler-bootstrap-e2e-master etcd-empty-dir-cleanup-bootstrap-e2e-master kube-addon-manager-bootstrap-e2e-master etcd-server-bootstrap-e2e-master fluentd-gcp-v2.0-6wf67]
I0524 02:09:56.184] May 24 02:09:56.183: INFO: At least one pod wasn't running and ready or succeeded at test start.
I0524 02:09:56.184] [AfterEach] [k8s.io] Restart [Disruptive]
```

Check the codes and found we always print out the last pod name, which is random. Pass the pod name into channel to fix.

**Release note**:

```release-note
NONE
```
2017-05-25 00:11:05 -07:00
Clayton Coleman
ad431c454c Subresources are not included in apiserver prometheus metrics
Subresources are very often completely different code paths and errors
generated on those code paths are important to distinguish.
2017-05-24 16:23:50 -04:00
Nick Sardo
e7ee3913d7 Add subnetworkUrl param to e2e 2017-05-24 10:54:51 -07:00
Zihong Zheng
03d08623e8 Fix CheckPodsCondition to print out the correct podName 2017-05-24 10:20:57 -07:00
Kubernetes Submit Queue
dae6955555 Merge pull request #46293 from nicksardo/chaosmonkey-defer-stop
Automatic merge from submit-queue (batch tested with PRs 46149, 45897, 46293, 46296, 46194)

Chaosmonkey - Signal stop to tests and wait for done when disruption fails

**What this PR does / why we need it**:
Prevents tests from leaking resources because their Teardown was never called when test disruption fails.   

**Which issue this PR fixes**
First problem of #45842 

**Release note**:
```release-note
NONE
```
2017-05-23 15:48:59 -07:00
Kubernetes Submit Queue
1e2105808b Merge pull request #45136 from vishh/cos-nvidia-driver-install
Automatic merge from submit-queue

Enable "kick the tires" support for Nvidia GPUs in COS

This PR provides an installation daemonset that will install Nvidia CUDA drivers on Google Container Optimized OS (COS).
User space libraries and debug utilities from the Nvidia driver installation are made available on the host in a special directory on the host -
* `/home/kubernetes/bin/nvidia/lib` for libraries
*  `/home/kubernetes/bin/nvidia/bin` for debug utilities

Containers that run CUDA applications on COS are expected to consume the libraries and debug utilities (if necessary) from the host directories using `HostPath` volumes.

Note: This solution requires updating Pod Spec across distros. This is a known issue and will be addressed in the future. Until then CUDA workloads will not be portable.

This PR updates the COS base image version to m59. This is coupled with this PR for the following reasons:
1. Driver installation requires disabling a kernel feature in COS. 
2. The kernel API for disabling this interface changed across COS versions
3. If the COS image update is not handled in this PR, then a subsequent COS image update will break GPU integration and will require an update to the installation scripts in this PR.
4. Instead of having to post `3` PRs, one each for adding the basic installer, updating COS to m59, and then updating the installer again, this PR combines all the changes to reduce review overhead and latency, and additional noise that will be created when GPU tests break.

**Try out this PR**
1. Get Quota for GPUs in any region
2. `export `KUBE_GCE_ZONE=<zone-with-gpus>` KUBE_NODE_OS_DISTRIBUTION=gci`
3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1" cluster/kube-up.sh`
4. `kubectl create -f cluster/gce/gci/nvidia-gpus/cos-installer-daemonset.yaml`
5. Run your CUDA app in a pod.

**Another option is to run a e2e manually to try out this PR**
1. Get Quota for GPUs in any region
2. export `KUBE_GCE_ZONE=<zone-with-gpus>` KUBE_NODE_OS_DISTRIBUTION=gci
3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1"`
4. `go run hack/e2e.go -- --up` 
5. `hack/ginkgo-e2e.sh --ginkgo.focus="\[Feature:GPU\]"`
The e2e will install the drivers automatically using the daemonset and then run test workloads to validate driver integration.

TODO:
- [x] Update COS image version to m59 release.
- [x] Remove sleep from the install script and add it to the daemonset
- [x] Add an e2e that will run the daemonset and run a sample CUDA app on COS clusters.
- [x] Setup a test project with necessary quota to run GPU tests against HEAD to start with https://github.com/kubernetes/test-infra/pull/2759
- [x] Update node e2e serial configs to install nvidia drivers on COS by default
2017-05-23 10:46:10 -07:00
Nick Sardo
f40f45abc1 Defer test stop & cleanup 2017-05-23 10:11:46 -07:00
Anirudh
63e51dc66e PDB MaxUnavailable: e2e tests 2017-05-23 07:18:44 -07:00
Kubernetes Submit Queue
c2c5051adf Merge pull request #44899 from smarterclayton/burst
Automatic merge from submit-queue (batch tested with PRs 38990, 45781, 46225, 44899, 43663)

Support parallel scaling on StatefulSets

Fixes #41255

```release-note
StatefulSets now include an alpha scaling feature accessible by setting the `spec.podManagementPolicy` field to `Parallel`.  The controller will not wait for pods to be ready before adding the other pods, and will replace deleted pods as needed.  Since parallel scaling creates pods out of order, you cannot depend on predictable membership changes within your set.
```
2017-05-22 19:07:09 -07:00
Kubernetes Submit Queue
0329e3fdaf Merge pull request #46211 from gmarek/panic
Automatic merge from submit-queue (batch tested with PRs 46133, 46211, 46224, 46205, 45910)

Add more logs to kubelet_stats

Ref. #46198
2017-05-22 15:50:00 -07:00
Mik Vyatskov
f605040165 Make Stackdriver Logging e2e tests less restrictive 2017-05-22 18:14:20 +02:00
gmarek
38981e9fd4 Add more logs to kubelet_stats 2017-05-22 15:49:57 +02:00
Clayton Coleman
e40648de68 E2E test for statefulset burst 2017-05-21 01:14:31 -04:00
Vishnu kannan
1e77594958 Adding an installer script that installs Nvidia drivers in Container Optimized OS
Packaged the script as a docker container stored in gcr.io/google-containers
A daemonset deployment is included to make it easy to consume the installer
A cluster e2e has been added to test the installation daemonset along with verifying installation
by using a sample CUDA application.
Node e2e for GPUs updated to avoid running on nodes without GPU devices.

Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-05-20 21:17:19 -07:00
Kubernetes Submit Queue
112ed869c7 Merge pull request #46053 from dashpole/test_eviction_metrics
Automatic merge from submit-queue (batch tested with PRs 46033, 46122, 46053, 46018, 45981)

Log age of stats used for evictions during eviction tests

I recently added prometheus metrics for the age of the metrics used for evictions #43031.  It would be nice to surface these during eviction tests, so I can better assess how old stats are, and whether or not the age of stats causes extra evictions.

This isnt super-high priority, and can be done after code-freeze, since it is a testing improvement.  Feel free to take a look whenever either of you has time.

/assign @mtaufen 
/assign @Random-Liu
2017-05-19 23:29:28 -07:00
Kubernetes Submit Queue
73e7ef1f8c Merge pull request #46011 from MrHohn/e2e-fix-return-podnames
Automatic merge from submit-queue (batch tested with PRs 45996, 46121, 45707, 46011, 45564)

Fix waitForNPods in restart.go

From https://github.com/kubernetes/kubernetes/issues/45991#issuecomment-302292404.

Don't redefine `pods` so we can return real pod names instead of empty array.

/assign @dchen1107 @bowei 

**Release note**:

```release-note
NONE
```
2017-05-19 18:57:36 -07:00
David Ashpole
0bd0d705e3 log age of stats used for evictions during eviction tests 2017-05-18 13:51:23 -07:00
Zihong Zheng
2095e0bee6 Fix waitForNPods in restart.go 2017-05-17 20:47:11 -07:00
Kubernetes Submit Queue
7df0178076 Merge pull request #42975 from smarterclayton/time_namespace
Automatic merge from submit-queue (batch tested with PRs 40234, 45885, 42975)

Log how much time it takes e2e tests to clean up the namespace
2017-05-17 20:27:52 -07:00
Kubernetes Submit Queue
747b706c4b Merge pull request #45678 from a-robinson/1.0
Automatic merge from submit-queue (batch tested with PRs 45990, 45544, 45745, 45742, 45678)

Add explicit image tag to cockroachdb example and test

@gyliu513 

```release-note
NONE
```
2017-05-17 18:40:59 -07:00
Clayton Coleman
4f35b31fc7 Try speeding up ConfigMap e2e namespace deletion 2017-05-17 17:45:05 -04:00
Clayton Coleman
fcab3a442d Log how much time it takes e2e tests to clean up the namespace
Will get a better handle on deletion test wasted time
2017-05-17 17:45:04 -04:00
Janet Kuo
3f2d8ae682 Extract common code in deployment e2e and integration test 2017-05-17 14:41:59 -07:00
Janet Kuo
282c90bc1a Remove e2e test for creating a new deployment 2017-05-17 14:41:59 -07:00
Kubernetes Submit Queue
581ebf46a9 Merge pull request #45852 from wongma7/subpath-e2e
Automatic merge from submit-queue

Add subPath e2e: test permissions and when subPath pre-exists

These tests cover issues https://github.com/kubernetes/kubernetes/issues/45613 and ~~https://github.com/kubernetes/kubernetes/pull/43775~~ https://github.com/kubernetes/kubernetes/issues/41638
-->
```release-note
NONE
```
2017-05-17 12:21:36 -07:00
Matthew Wong
036b64d54e Add subPath e2e: test permissions and when subPath pre-exists 2017-05-17 09:14:29 -04:00
Zihong Zheng
5992425588 Autogenerated files 2017-05-16 21:55:51 -07:00
Zihong Zheng
c0920f75cf Move API annotations into annotation_key_constants and remove api/annotations package 2017-05-16 21:55:23 -07:00
Kubernetes Submit Queue
1e6061b9ec Merge pull request #45763 from piosz/es-owners
Automatic merge from submit-queue

Added coffeepac to ElasticSearch owners

@coffeepac

@fgrzadkowski, could you please add @coffeepac to Kubernetes org?
2017-05-16 12:22:59 -07:00
Wojciech Tyczynski
7809e583e8 Parallelize creation/deletion of services in load test 2017-05-16 13:00:16 +02:00
Kubernetes Submit Queue
3386425475 Merge pull request #45831 from MrHohn/esipp-panic-fix
Automatic merge from submit-queue

Check endpoint subsets length before asserting addresses.

Fix #45824.

Panics were caused by [WaitForEndpointOnNode()](3227f44157/test/e2e/framework/service_util.go (L329)). Check subsets length ahead to prevent panicing.

/assign @freehan

cc @wojtek-t 

**Release note**:

```release-note
NONE
```
2017-05-16 00:32:58 -07:00
Zihong Zheng
6797f2a7a9 Check endpoint subsets length before asserting addresses. 2017-05-15 11:12:18 -07:00
Dmitry Shulyak
2612e0c78a Move client/unversioned/remotecommand to client-go
Module remotecommand originally part of kubernetes/pkg/client/unversioned was moved
to client-go/tools, and will be used as authoritative in kubectl, e2e and other places.

Module remotecommand relies on util/exec module which will be copied to client-go/pkg/util
2017-05-15 16:28:56 +03:00
Piotr Szczesniak
da8f82cbd0 Added coffeepac to ElasticSearch owners 2017-05-13 07:48:09 +02:00
Kubernetes Submit Queue
3619c33350 Merge pull request #42759 from mtaufen/kubelet-apis-reorg
Automatic merge from submit-queue

Reorganize kubelet tree so apis can be independently versioned

@yujuhong @lavalamp @thockin @bgrant0607 
This is an example of how we might reorganize `pkg/kubelet` so the apis it exposes can be independently versioned. This would also provide a logical place to put the `KubeletConfiguration` type, which currently lives in `pkg/apis/componentconfig`; it could live in e.g. `pkg/kubelet/apis/config` instead.

Take a look when you have a chance and let me know what you think. The most significant change in this PR is reorganizing `pkg/kubelet/api` to `pkg/kubelet/apis`, the rest is pretty much updating import paths and `BUILD` files.
2017-05-12 17:43:22 -07:00
Kubernetes Submit Queue
35eba22cc7 Merge pull request #41162 from MrHohn/esipp-ga
Automatic merge from submit-queue (batch tested with PRs 45623, 45241, 45460, 41162)

Promotes Source IP preservation for Virtual IPs from Beta to GA

Fixes #33625. Feature issue: kubernetes/features#27.

Bullet points:
- Declare 2 fields (ExternalTraffic and HealthCheckNodePort) that mirror the ESIPP annotations.
- ESIPP alpha annotations will be ignored.
- Existing ESIPP beta annotations will still be fully supported.
- Allow promoting beta annotations to first class fields or reversely.
- Disallow setting invalid ExternalTraffic and HealthCheckNodePort on services. Default ExternalTraffic field for nodePort or loadBalancer type service to "Global" if not set.

**Release note**:

```release-note
Promotes Source IP preservation for Virtual IPs to GA.

Two api fields are defined correspondingly:
- Service.Spec.ExternalTrafficPolicy <- 'service.beta.kubernetes.io/external-traffic' annotation.
- Service.Spec.HealthCheckNodePort <- 'service.beta.kubernetes.io/healthcheck-nodeport' annotation.
```
2017-05-12 15:00:46 -07:00
Kubernetes Submit Queue
f440e190bc Merge pull request #45241 from copejon/revert-pr-45101
Automatic merge from submit-queue (batch tested with PRs 45623, 45241, 45460, 41162)

Revert #45101 Mark PersistentVolumes as [Feature:Volumes]

**What this PR does / why we need it**:
Reverts #45101 

`Feature` tag should only be used when a test/suite has dependencies not met by core CI.  That is not the case for NFS backed PV tests.

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-05-12 15:00:41 -07:00
Kubernetes Submit Queue
7a9c2f7f01 Merge pull request #45733 from danwinship/network-test-timeout
Automatic merge from submit-queue (batch tested with PRs 45685, 45572, 45624, 45723, 45733)

Remove a test utility function that is redundant and kinda broken

Framework.WaitForAnEndpoint() has no timeout, so if something goes wrong and the endpoint doesn't get created, the test will hang forever. (This is happening for some reason sometimes in OpenShift right now, and when the CI system eventually times out and kills the VM, it loses the logs that would explain what failed.)

There's already another nearly-identical WaitForEndpoint() method that *does* take a timeout, so people can just use that instead.

```release-note
NONE
```
2017-05-12 14:01:00 -07:00
Kubernetes Submit Queue
5cce4e8583 Merge pull request #45719 from shyamjvs/podstartuplatency
Automatic merge from submit-queue (batch tested with PRs 45653, 45719, 45729, 45730, 44250)

Print pod startup latency metric as perfdata

Follows #45657 
This should print pod startup latency in same format as api calls latencies.

cc @wojtek-t @gmarek
2017-05-12 12:12:44 -07:00
Zihong Zheng
12b6c2b879 Autogenerated files 2017-05-12 10:59:00 -07:00
Zihong Zheng
7ed716a997 Change to use ESIPP first class fields and update comments 2017-05-12 10:59:00 -07:00
Michael Taufen
cbad320205 Reorganize kubelet tree so apis can be independently versioned 2017-05-12 10:02:33 -07:00
Dan Winship
35bb7825fe Remove one slightly-broken wait-for-endpoints test util and fix another 2017-05-12 12:31:42 -04:00
Shyam Jeedigunta
48688fa70d Print pod startup latency metric as perfdata 2017-05-12 14:31:18 +02:00
Kubernetes Submit Queue
5c23dc7897 Merge pull request #45423 from jeffvance/e2e-nodeExec
Automatic merge from submit-queue

move  from daemon_restart.go to framework/util.go

**What this PR does / why we need it**:
Moves the func `nodeExec` from daemon_restart.go to framework/util.go. This is the correct file for this func and is a more intuitive pkg for other callers to use. This is a small step of the larger effort of restructuring e2e tests to be more logically structured and easier for newcomers to understand.

```release-note
NONE
```
cc @timothysc @copejon
2017-05-12 05:26:12 -07:00