kubernetes

Author	SHA1	Message	Date
Kubernetes Submit Queue	40dcbc4eb3	Merge pull request #46461 from ncdc/e2e-suite-metrics Automatic merge from submit-queue Support grabbing test suite metrics What this PR does / why we need it: Add support for grabbing metrics that cover the entire test suite's execution. Update the "interesting" controller-manager metrics to match the current names for the garbage collector, and add namespace controller metrics to the list. If you enable `--gather-suite-metrics-at-teardown`, the metrics file is written to a file with a name such as `MetricsForE2ESuite_2017-05-25T20:25:57Z.json` in the `--report-dir`. If you don't specify `--report-dir`, the metrics are written to the test log output. I'd like to enable this for some of the `pull-` CI jobs, which will require a separate PR to test-infra. Which issue this PR fixes* (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged): fixes # Special notes for your reviewer: Release note: ```release-note NONE ``` @kubernetes/sig-testing-pr-reviews @smarterclayton @wojtek-t @gmarek @derekwaynecarr @timothysc	2017-05-30 16:41:49 -07:00
Kubernetes Submit Queue	38b26db33a	Merge pull request #46613 from FengyunPan/fix-e2e-service Automatic merge from submit-queue (batch tested with PRs 45534, 37212, 46613, 46350) [e2e]Fix define redundant parameter When timeout to reach HTTP service, redundant parameter make the error is nil.	2017-05-30 04:46:04 -07:00
gmarek	0cc1999e16	Make log-monitor give up on trying to ssh to a dead node after some time	2017-05-30 10:27:10 +02:00
FengyunPan	38e8c32a26	[e2e]Fix define redundant parameter When timeout to reach HTTP service, redundant parameter make the error is nil.	2017-05-30 16:09:33 +08:00
Kubernetes Submit Queue	755d368c4a	Merge pull request #45782 from mtaufen/no-snat-test Automatic merge from submit-queue no-snat test This test checks that Pods can communicate with each other in the same cluster without SNAT. I intend to create a job that runs this in small clusters (\~3 nodes) at a low frequency (\~once per day) so that we have a signal as we work on allowing multiple non-masquerade CIDRs to be configured (see [kubernetes-incubator/ip-masq-agent](https://github.com/kubernetes-incubator/ip-masq-agent), for example). /cc @dnardo	2017-05-29 16:19:46 -07:00
Kubernetes Submit Queue	d9f3ea5191	Merge pull request #46593 from shyamjvs/fix-perfdata-subresource Automatic merge from submit-queue Fix minor bugs in setting API call metrics with subresource Based on changes from https://github.com/kubernetes/kubernetes/pull/46354 /cc @wojtek-t @smarterclayton	2017-05-29 08:45:02 -07:00
Shyam Jeedigunta	e897b21506	Fix minor bugs in setting API call metrics with subresource	2017-05-29 15:04:52 +02:00
Wojciech Tyczynski	1583912dd0	Fix panics in load test	2017-05-29 13:09:53 +02:00
Kubernetes Submit Queue	451d0a436c	Merge pull request #46509 from k82cn/add_k82cn_as_approver Automatic merge from submit-queue Added k82cn as one of scheduler approver. According to the requirement of Approver at [community-membership.md](https://github.com/kubernetes/community/blob/master/community-membership.md), I meet the requirements as follow; so I'd like to add myself as an approver of scheduler. * Reviewer of the codebase for at least 3 months [k82cn]: [~3 months](`6cc40678b6` ) * Primary reviewer for at least 10 substantial PRs to the codebase [k82cn] Reviewed [40 PRs](https://github.com/issues?q=assignee%3Ak82cn+is%3Aclosed) * Reviewed or merged at least 30 PRs to the codebase [k82cn]: 71 merged PRs in kubernetes/kubernetes, and ~100 PRs in kuberentes at https://goo.gl/j2D1fR As an approver, * I agree to only approve familiar PRs * I agree to be responsive to review/approve requests as per community expectations * I agree to continue my reviewer work as per community expectations * I agree to continue my contribution, e.g. PRs, mentor contributors	2017-05-28 22:01:32 -07:00
Kubernetes Submit Queue	1444d252e1	Merge pull request #46457 from nicksardo/gce-api-refactor Automatic merge from submit-queue (batch tested with PRs 46407, 46457) GCE - Refactor API for firewall and backend service creation What this PR does / why we need it: - Currently, firewall creation function actually instantiates the firewall object; this is inconsistent with the rest of GCE api calls. The API normally gets passed in an existing object. - Necessary information for firewall creation, (`computeHostTags`,`nodeTags`,`networkURL`,`subnetworkURL`,`region`) were private to within the package. These now have public getters. - Consumers might need to know whether the cluster is running on a cross-project network. A new `OnXPN` func will make that information available. - Backend services for regions have been added. Global ones have been renamed to specify global. - NamedPort management of instance groups has been changed from an `AddPortsToInstanceGroup` func (and missing complementary `Remove...`) to a single, simple `SetNamedPortsOfInstanceGroup` - Addressed nitpick review comments of #45524 ILB needs the regional backend services and firewall refactor. The ingress controller needs the new `OnXPN` func to decide whether to create a firewall. Release note: ```release-note NONE ```	2017-05-28 13:16:58 -07:00
Kubernetes Submit Queue	f219f3c153	Merge pull request #46558 from MrHohn/esipp-endpoint-waittime Automatic merge from submit-queue Apply KubeProxyEndpointLagTimeout to ESIPP tests Fixes #46533. The previous construction of ESIPP tests is weird, so I redo it a bit. A 30 seconds `KubeProxyEndpointLagTimeout` is introduced, as these tests ain't verifying performance, may be better to not make it too tight. /assign @thockin Release note: ```release-note NONE ```	2017-05-27 11:17:51 -07:00
Nick Sardo	9063526dfb	GCE: Refactor firewalls/backendservices api; other small changes	2017-05-27 10:25:03 -07:00
Kubernetes Submit Queue	daee6d4826	Merge pull request #45524 from MrHohn/l4-lb-healthcheck Automatic merge from submit-queue (batch tested with PRs 46252, 45524, 46236, 46277, 46522) Make GCE load-balancers create health checks for nodes From #14661. Proposal on kubernetes/community#552. Fixes #46313. Bullet points: - Create nodes health check and firewall (for health checking) for non-OnlyLocal service. - Create local traffic health check and firewall (for health checking) for OnlyLocal service. - Version skew: - Don't create nodes health check if any nodes has version < 1.7.0. - Don't backfill nodes health check on existing LBs unless users explicitly trigger it. Release note: ```release-note GCE Cloud Provider: New created LoadBalancer type Service now have health checks for nodes by default. An existing LoadBalancer will have health check attached to it when: - Change Service.Spec.Type from LoadBalancer to others and flip it back. - Any effective change on Service.Spec.ExternalTrafficPolicy. ```	2017-05-26 19:47:57 -07:00
Zihong Zheng	e332828690	Apply KubeProxyEndpointLagTimeout to ESIPP tests	2017-05-26 18:14:20 -07:00
Kubernetes Submit Queue	2b084af6dd	Merge pull request #46484 from guoyunxian/remove Automatic merge from submit-queue (batch tested with PRs 45809, 46515, 46484, 46516, 45614) Remove the reduplicated case judement This patch remove the reduplicated case judgement	2017-05-26 16:59:04 -07:00
Michael Taufen	a653603e13	no-snat test Test checks that Pods can communicate with each other in the same cluster without SNAT.	2017-05-26 13:45:10 -07:00
Zihong Zheng	897da549bc	Autogenerated files	2017-05-26 13:19:14 -07:00
Zihong Zheng	a61cc7f477	Update firewall e2e test for LB healthcheck firewall	2017-05-26 13:18:50 -07:00
Andy Goldstein	ab76f7320a	Fix incorrect printf format	2017-05-26 11:36:52 -04:00
Andy Goldstein	41345418cb	Support grabbing test suite metrics Update the "interesting" controller-manager metrics to match the current names for the garbage collector, and add namespace controller metrics to the list.	2017-05-26 11:21:27 -04:00
Klaus Ma	68a34c1baf	Added k82cn as kube-scheduler approver.	2017-05-26 22:26:20 +08:00
guoyunxian	0bf96a3ca4	Remove the same case judement This patch remove the same case judement	2017-05-26 17:28:53 +08:00
Kubernetes Submit Queue	b8dc4915f7	Merge pull request #46423 from gmarek/fix_perf Automatic merge from submit-queue (batch tested with PRs 45949, 46009, 46320, 46423, 46437) Fix performance test issues Fix #46198	2017-05-25 19:41:04 -07:00
Kubernetes Submit Queue	b9416c2c91	Merge pull request #46320 from vmware/e2evSphereStoragePolicySupport Automatic merge from submit-queue (batch tested with PRs 45949, 46009, 46320, 46423, 46437) e2e tests for storage policy support in Kubernetes This PR covers e2e test cases for vSphere storage policy support in Kubernetes - #46176. The following test scenario have been implemented. - Specify only SPBM storage policy name. - Verify if the disk is provisioned on a compatible datastore with max free space. - Specify a storage policy name which is not defined on VC. - Verify if PVC create errors out that no pbm profile with this policy is found. - Specify both SPBM storage policy name and VSAN capabilities together. - Verify if PVC create errors out that you can't use both SPBM policy name with VSAN capabilities. You can only specify one. - Specify SPBM storage policy name with user specified datastore which is non-compatible. - Verify if PVC create errors out that it can't provision a disk on a non-compatible datastore. @jeffvance @divyenpatel Release note: ```release-note None ```	2017-05-25 19:41:02 -07:00
Kubernetes Submit Queue	470a6a45d5	Merge pull request #45949 from NickrenREN/kubelet-metric Automatic merge from submit-queue (batch tested with PRs 45949, 46009, 46320, 46423, 46437) Unregister some metrics delete some registered metrics since they are not observed Release note: ```release-note NONE ```	2017-05-25 19:40:58 -07:00
Kubernetes Submit Queue	4a58809d88	Merge pull request #46219 from aleksandra-malinowska/stackdriver-performance-test-2 Automatic merge from submit-queue (batch tested with PRs 45269, 46219, 45966) Add overriding Stackdriver API endpoint Allow using Stackdriver test endpoint.	2017-05-25 07:21:01 -07:00
Kubernetes Submit Queue	26d7ee0447	Merge pull request #44774 from kargakis/uniquifier Automatic merge from submit-queue Switch Deployments to new hashing algo w/ collision avoidance mechanism Implements https://github.com/kubernetes/community/pull/477 @kubernetes/sig-apps-api-reviews @kubernetes/sig-apps-pr-reviews Fixes https://github.com/kubernetes/kubernetes/issues/29735 Fixes https://github.com/kubernetes/kubernetes/issues/43948 ```release-note Deployments are updated to use (1) a more stable hashing algorithm (fnv) than the previous one (adler) and (2) a hashing collision avoidance mechanism that will ensure new rollouts will not block on hashing collisions anymore. ```	2017-05-25 06:09:58 -07:00
Michail Kargakis	9190a47c37	Generated changes for collision count Signed-off-by: Michail Kargakis <mkargaki@redhat.com>	2017-05-25 12:23:17 +02:00
Kubernetes Submit Queue	9c1480bb61	Merge pull request #46366 from nicksardo/gce-subnetwork-url Automatic merge from submit-queue (batch tested with PRs 45573, 46354, 46376, 46162, 46366) GCE - Retrieve subnetwork name/url from gce.conf What this PR does / why we need it: Features like ILB require specifying the subnetwork if the network is type manual. Notes: The network URL can be [constructed](`68e7e18698/pkg/cloudprovider/providers/gce/gce.go (L211-L217)`) by fetching instance metadata; however, the subnetwork is not provided through this feature. Users must specify the subnetwork name/url through the gce.conf. Although multiple subnets can exist in the same region for a network, the cloud provider will only use one subnet url for creating LBs. Release note: ```release-note NONE ```	2017-05-25 03:14:05 -07:00
Kubernetes Submit Queue	23348ceedc	Merge pull request #46354 from smarterclayton/metrics_subresource Automatic merge from submit-queue (batch tested with PRs 45573, 46354, 46376, 46162, 46366) Subresources are not included in apiserver prometheus metrics Subresources are very often completely different code paths and errors generated on those code paths are important to distinguish. @kubernetes/sig-api-machinery-pr-reviews ```release-note The Prometheus metrics for the kube-apiserver for tracking incoming API requests and latencies now return the `subresource` label for correctly attributing the type of API call. ```	2017-05-25 03:13:59 -07:00
gmarek	02951f182e	Correctly handle nil resource usage in performance e2e tests	2017-05-25 11:44:03 +02:00
gmarek	ded8e03fc3	Reduce service creation/deletion parallelism in the load test	2017-05-25 11:44:03 +02:00
Michail Kargakis	4a2c5eae92	Implement hash collision avoidance mechanism Signed-off-by: Michail Kargakis <mkargaki@redhat.com>	2017-05-25 11:17:45 +02:00
Kubernetes Submit Queue	d84f3f4b7e	Merge pull request #46363 from MrHohn/fix-CheckPodsCondition Automatic merge from submit-queue (batch tested with PRs 45913, 46065, 46352, 46363, 46373) Fix CheckPodsCondition to print out the correct podName From a couple CIs (https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-serial/1114, https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-gci-qa-serial-master/2246, https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gke-pre-release/2187), all indicate we print out the wrong pod name in CheckPodsCondition for _"Pod XXX failed to be running and ready, or succeeded."_: ``` I0524 02:09:50.173] May 24 02:09:50.173: INFO: Waiting for pod heapster-v1.3.0-3806988011-kzkg6 in namespace 'kube-system' status to be 'running and ready, or succeeded'(found phase: "Running", readiness: false) (4m55.033881993s elapsed) I0524 02:09:52.178] May 24 02:09:52.178: INFO: Waiting for pod heapster-v1.3.0-3806988011-kzkg6 in namespace 'kube-system' status to be 'running and ready, or succeeded'(found phase: "Running", readiness: false) (4m57.03848264s elapsed) I0524 02:09:54.183] May 24 02:09:54.182: INFO: Waiting for pod heapster-v1.3.0-3806988011-kzkg6 in namespace 'kube-system' status to be 'running and ready, or succeeded'(found phase: "Running", readiness: false) (4m59.043463323s elapsed) I0524 02:09:56.183] May 24 02:09:56.183: INFO: Pod fluentd-gcp-v2.0-6wf67 failed to be running and ready, or succeeded. I0524 02:09:56.184] May 24 02:09:56.183: INFO: Wanted all 23 pods to be running and ready, or succeeded. Result: false. Pods: [heapster-v1.3.0-3806988011-kzkg6 kube-proxy-bootstrap-e2e-minion-group-bbwn rescheduler-v0.3.0-bootstrap-e2e-master monitoring-influxdb-grafana-v4-1q59k l7-default-backend-1044750973-zgxsc etcd-server-events-bootstrap-e2e-master kube-apiserver-bootstrap-e2e-master kube-proxy-bootstrap-e2e-minion-group-6nqb kube-proxy-bootstrap-e2e-minion-group-mzbz fluentd-gcp-v2.0-chd2x kube-dns-806549836-f8p46 fluentd-gcp-v2.0-44x97 kube-dns-autoscaler-2528518105-vlg8t fluentd-gcp-v2.0-p1h4b kube-controller-manager-bootstrap-e2e-master l7-lb-controller-v0.9.3-bootstrap-e2e-master kubernetes-dashboard-2917854236-tn3nx kube-dns-806549836-fq2fp kube-scheduler-bootstrap-e2e-master etcd-empty-dir-cleanup-bootstrap-e2e-master kube-addon-manager-bootstrap-e2e-master etcd-server-bootstrap-e2e-master fluentd-gcp-v2.0-6wf67] I0524 02:09:56.184] May 24 02:09:56.183: INFO: At least one pod wasn't running and ready or succeeded at test start. I0524 02:09:56.184] [AfterEach] [k8s.io] Restart [Disruptive] ``` Check the codes and found we always print out the last pod name, which is random. Pass the pod name into channel to fix. Release note: ```release-note NONE ```	2017-05-25 00:11:05 -07:00
System Administrator	9c8e92b8ff	e2e tests for storage policy support in Kubernetes	2017-05-24 16:39:00 -07:00
Clayton Coleman	ad431c454c	Subresources are not included in apiserver prometheus metrics Subresources are very often completely different code paths and errors generated on those code paths are important to distinguish.	2017-05-24 16:23:50 -04:00
Nick Sardo	e7ee3913d7	Add subnetworkUrl param to e2e	2017-05-24 10:54:51 -07:00
Zihong Zheng	03d08623e8	Fix CheckPodsCondition to print out the correct podName	2017-05-24 10:20:57 -07:00
Kubernetes Submit Queue	dae6955555	Merge pull request #46293 from nicksardo/chaosmonkey-defer-stop Automatic merge from submit-queue (batch tested with PRs 46149, 45897, 46293, 46296, 46194) Chaosmonkey - Signal stop to tests and wait for done when disruption fails What this PR does / why we need it: Prevents tests from leaking resources because their Teardown was never called when test disruption fails. Which issue this PR fixes First problem of #45842 Release note: ```release-note NONE ```	2017-05-23 15:48:59 -07:00
Kubernetes Submit Queue	1e2105808b	Merge pull request #45136 from vishh/cos-nvidia-driver-install Automatic merge from submit-queue Enable "kick the tires" support for Nvidia GPUs in COS This PR provides an installation daemonset that will install Nvidia CUDA drivers on Google Container Optimized OS (COS). User space libraries and debug utilities from the Nvidia driver installation are made available on the host in a special directory on the host - * `/home/kubernetes/bin/nvidia/lib` for libraries * `/home/kubernetes/bin/nvidia/bin` for debug utilities Containers that run CUDA applications on COS are expected to consume the libraries and debug utilities (if necessary) from the host directories using `HostPath` volumes. Note: This solution requires updating Pod Spec across distros. This is a known issue and will be addressed in the future. Until then CUDA workloads will not be portable. This PR updates the COS base image version to m59. This is coupled with this PR for the following reasons: 1. Driver installation requires disabling a kernel feature in COS. 2. The kernel API for disabling this interface changed across COS versions 3. If the COS image update is not handled in this PR, then a subsequent COS image update will break GPU integration and will require an update to the installation scripts in this PR. 4. Instead of having to post `3` PRs, one each for adding the basic installer, updating COS to m59, and then updating the installer again, this PR combines all the changes to reduce review overhead and latency, and additional noise that will be created when GPU tests break. Try out this PR 1. Get Quota for GPUs in any region 2. `export `KUBE_GCE_ZONE=<zone-with-gpus>` KUBE_NODE_OS_DISTRIBUTION=gci` 3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1" cluster/kube-up.sh` 4. `kubectl create -f cluster/gce/gci/nvidia-gpus/cos-installer-daemonset.yaml` 5. Run your CUDA app in a pod. Another option is to run a e2e manually to try out this PR 1. Get Quota for GPUs in any region 2. export `KUBE_GCE_ZONE=<zone-with-gpus>` KUBE_NODE_OS_DISTRIBUTION=gci 3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1"` 4. `go run hack/e2e.go -- --up` 5. `hack/ginkgo-e2e.sh --ginkgo.focus="\[Feature:GPU\]"` The e2e will install the drivers automatically using the daemonset and then run test workloads to validate driver integration. TODO: - [x] Update COS image version to m59 release. - [x] Remove sleep from the install script and add it to the daemonset - [x] Add an e2e that will run the daemonset and run a sample CUDA app on COS clusters. - [x] Setup a test project with necessary quota to run GPU tests against HEAD to start with https://github.com/kubernetes/test-infra/pull/2759 - [x] Update node e2e serial configs to install nvidia drivers on COS by default	2017-05-23 10:46:10 -07:00
Nick Sardo	f40f45abc1	Defer test stop & cleanup	2017-05-23 10:11:46 -07:00
Anirudh	63e51dc66e	PDB MaxUnavailable: e2e tests	2017-05-23 07:18:44 -07:00
Kubernetes Submit Queue	c2c5051adf	Merge pull request #44899 from smarterclayton/burst Automatic merge from submit-queue (batch tested with PRs 38990, 45781, 46225, 44899, 43663) Support parallel scaling on StatefulSets Fixes #41255 ```release-note StatefulSets now include an alpha scaling feature accessible by setting the `spec.podManagementPolicy` field to `Parallel`. The controller will not wait for pods to be ready before adding the other pods, and will replace deleted pods as needed. Since parallel scaling creates pods out of order, you cannot depend on predictable membership changes within your set. ```	2017-05-22 19:07:09 -07:00
Kubernetes Submit Queue	0329e3fdaf	Merge pull request #46211 from gmarek/panic Automatic merge from submit-queue (batch tested with PRs 46133, 46211, 46224, 46205, 45910) Add more logs to kubelet_stats Ref. #46198	2017-05-22 15:50:00 -07:00
Mik Vyatskov	f605040165	Make Stackdriver Logging e2e tests less restrictive	2017-05-22 18:14:20 +02:00
gmarek	38981e9fd4	Add more logs to kubelet_stats	2017-05-22 15:49:57 +02:00
Aleksandra Malinowska	0e5051a84c	Add overriding Stackdriver API endpoint	2017-05-22 15:47:39 +02:00
Clayton Coleman	e40648de68	E2E test for statefulset burst	2017-05-21 01:14:31 -04:00
Vishnu kannan	1e77594958	Adding an installer script that installs Nvidia drivers in Container Optimized OS Packaged the script as a docker container stored in gcr.io/google-containers A daemonset deployment is included to make it easy to consume the installer A cluster e2e has been added to test the installation daemonset along with verifying installation by using a sample CUDA application. Node e2e for GPUs updated to avoid running on nodes without GPU devices. Signed-off-by: Vishnu kannan <vishnuk@google.com>	2017-05-20 21:17:19 -07:00
Kubernetes Submit Queue	112ed869c7	Merge pull request #46053 from dashpole/test_eviction_metrics Automatic merge from submit-queue (batch tested with PRs 46033, 46122, 46053, 46018, 45981) Log age of stats used for evictions during eviction tests I recently added prometheus metrics for the age of the metrics used for evictions #43031. It would be nice to surface these during eviction tests, so I can better assess how old stats are, and whether or not the age of stats causes extra evictions. This isnt super-high priority, and can be done after code-freeze, since it is a testing improvement. Feel free to take a look whenever either of you has time. /assign @mtaufen /assign @Random-Liu	2017-05-19 23:29:28 -07:00

4984 Commits