Commit Graph

8572 Commits

Author SHA1 Message Date
David Ashpole
1a6572fc6c summary test now tests a pod that has containers that have restarted 2017-05-24 13:27:57 -07:00
Clayton Coleman
ad431c454c Subresources are not included in apiserver prometheus metrics
Subresources are very often completely different code paths and errors
generated on those code paths are important to distinguish.
2017-05-24 16:23:50 -04:00
Nick Sardo
e7ee3913d7 Add subnetworkUrl param to e2e 2017-05-24 10:54:51 -07:00
Zihong Zheng
03d08623e8 Fix CheckPodsCondition to print out the correct podName 2017-05-24 10:20:57 -07:00
jayunit100
41d7655e11 Add ownership for the future of scheduler_perf and kubemark 2017-05-24 10:17:50 -04:00
Kubernetes Submit Queue
d4ff0f2a0e Merge pull request #46312 from dashpole/remove_memcg_jenkins_properties
Automatic merge from submit-queue (batch tested with PRs 42042, 46139, 46126, 46258, 46312)

Remove unused test properties

Issue:  #42676
A separate serial memcg suite was created for the initial stages of re-enabling memcg notifications.  Now that all e2e tests have memcg notifications enabled, this suite is no longer needed.
2017-05-23 19:43:07 -07:00
Kubernetes Submit Queue
dae6955555 Merge pull request #46293 from nicksardo/chaosmonkey-defer-stop
Automatic merge from submit-queue (batch tested with PRs 46149, 45897, 46293, 46296, 46194)

Chaosmonkey - Signal stop to tests and wait for done when disruption fails

**What this PR does / why we need it**:
Prevents tests from leaking resources because their Teardown was never called when test disruption fails.   

**Which issue this PR fixes**
First problem of #45842 

**Release note**:
```release-note
NONE
```
2017-05-23 15:48:59 -07:00
Kubernetes Submit Queue
45b275d52c Merge pull request #45897 from ncdc/gc-require-list-watch
Automatic merge from submit-queue (batch tested with PRs 46149, 45897, 46293, 46296, 46194)

GC: update required verbs for deletable resources, allow list of ignored resources to be customized

The garbage collector controller currently needs to list, watch, get,
patch, update, and delete resources. Update the criteria for
deletable resources to reflect this.

Also allow the list of resources the garbage collector controller should
ignore to be customizable, so downstream integrators can add their own
resources to the list, if necessary.

cc @caesarxuchao @deads2k @smarterclayton @mfojtik @liggitt @sttts @kubernetes/sig-api-machinery-pr-reviews
2017-05-23 15:48:57 -07:00
Random-Liu
82f588b483 Fix cos image project to cos-cloud. 2017-05-23 15:12:03 -07:00
David Ashpole
8341d544f3 remove unused test properties 2017-05-23 14:39:18 -07:00
David Ashpole
20eb016597 dont attach a GPU to ubuntu machines 2017-05-23 14:34:18 -07:00
Random-Liu
dc023144a3 Move docker validation test to separate project. 2017-05-23 14:07:15 -07:00
Kubernetes Submit Queue
1e2105808b Merge pull request #45136 from vishh/cos-nvidia-driver-install
Automatic merge from submit-queue

Enable "kick the tires" support for Nvidia GPUs in COS

This PR provides an installation daemonset that will install Nvidia CUDA drivers on Google Container Optimized OS (COS).
User space libraries and debug utilities from the Nvidia driver installation are made available on the host in a special directory on the host -
* `/home/kubernetes/bin/nvidia/lib` for libraries
*  `/home/kubernetes/bin/nvidia/bin` for debug utilities

Containers that run CUDA applications on COS are expected to consume the libraries and debug utilities (if necessary) from the host directories using `HostPath` volumes.

Note: This solution requires updating Pod Spec across distros. This is a known issue and will be addressed in the future. Until then CUDA workloads will not be portable.

This PR updates the COS base image version to m59. This is coupled with this PR for the following reasons:
1. Driver installation requires disabling a kernel feature in COS. 
2. The kernel API for disabling this interface changed across COS versions
3. If the COS image update is not handled in this PR, then a subsequent COS image update will break GPU integration and will require an update to the installation scripts in this PR.
4. Instead of having to post `3` PRs, one each for adding the basic installer, updating COS to m59, and then updating the installer again, this PR combines all the changes to reduce review overhead and latency, and additional noise that will be created when GPU tests break.

**Try out this PR**
1. Get Quota for GPUs in any region
2. `export `KUBE_GCE_ZONE=<zone-with-gpus>` KUBE_NODE_OS_DISTRIBUTION=gci`
3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1" cluster/kube-up.sh`
4. `kubectl create -f cluster/gce/gci/nvidia-gpus/cos-installer-daemonset.yaml`
5. Run your CUDA app in a pod.

**Another option is to run a e2e manually to try out this PR**
1. Get Quota for GPUs in any region
2. export `KUBE_GCE_ZONE=<zone-with-gpus>` KUBE_NODE_OS_DISTRIBUTION=gci
3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1"`
4. `go run hack/e2e.go -- --up` 
5. `hack/ginkgo-e2e.sh --ginkgo.focus="\[Feature:GPU\]"`
The e2e will install the drivers automatically using the daemonset and then run test workloads to validate driver integration.

TODO:
- [x] Update COS image version to m59 release.
- [x] Remove sleep from the install script and add it to the daemonset
- [x] Add an e2e that will run the daemonset and run a sample CUDA app on COS clusters.
- [x] Setup a test project with necessary quota to run GPU tests against HEAD to start with https://github.com/kubernetes/test-infra/pull/2759
- [x] Update node e2e serial configs to install nvidia drivers on COS by default
2017-05-23 10:46:10 -07:00
Kubernetes Submit Queue
1602e2a338 Merge pull request #45587 from foxish/pdb-maxunavailab
Automatic merge from submit-queue (batch tested with PRs 45587, 46286)

PDB Max Unavailable Field

Completes https://github.com/kubernetes/features/issues/285

```release-note
Adds a MaxUnavailable field to PodDisruptionBudget
```


Individual commits are self-contained; Last commit can be ignored because it is autogenerated code.
cc @kubernetes/sig-apps-api-reviews @kubernetes/sig-apps-pr-reviews
2017-05-23 10:29:56 -07:00
Nick Sardo
f40f45abc1 Defer test stop & cleanup 2017-05-23 10:11:46 -07:00
Andy Goldstein
d1a0384678 GC: allow ignored resources to be customized
Allow the list of resources the garbage collector controller should
ignore to be customizable, so downstream integrators can add their own
resources to the list, if necessary.
2017-05-23 12:05:09 -04:00
Kubernetes Submit Queue
8e07e61a43 Merge pull request #46223 from smarterclayton/scheduler_max
Automatic merge from submit-queue (batch tested with PRs 45766, 46223)

Scheduler should use a shared informer, and fix broken watch behavior for cached watches

Can be used either from a true shared informer or a local shared
informer created just for the scheduler.

Fixes a bug in the cache watcher where we were returning the "current" object from a watch event, not the historic event.  This means that we broke behavior when introducing the watch cache.  This may have API implications for filtering watch consumers - but on the other hand, it prevents clients filtering from seeing objects outside of their watch correctly, which can lead to other subtle bugs.

```release-note
The behavior of some watch calls to the server when filtering on fields was incorrect.  If watching objects with a filter, when an update was made that no longer matched the filter a DELETE event was correctly sent.  However, the object that was returned by that delete was not the (correct) version before the update, but instead, the newer version.  That meant the new object was not matched by the filter.  This was a regression from behavior between cached watches on the server side and uncached watches, and thus broke downstream API clients.
```
2017-05-23 07:42:00 -07:00
Anirudh
63e51dc66e PDB MaxUnavailable: e2e tests 2017-05-23 07:18:44 -07:00
Kubernetes Submit Queue
cc6e51c6e8 Merge pull request #45427 from ncdc/gc-shared-informers
Automatic merge from submit-queue (batch tested with PRs 46201, 45952, 45427, 46247, 46062)

Use shared informers in gc controller if possible

Modify the garbage collector controller to try to use shared informers for resources, if possible, to reduce the number of unique reflectors listing and watching the same thing.

cc @kubernetes/sig-api-machinery-pr-reviews @caesarxuchao @deads2k @liggitt @sttts @smarterclayton @timothysc @soltysh @kargakis @kubernetes/rh-cluster-infra @derekwaynecarr @wojtek-t @gmarek
2017-05-22 20:58:03 -07:00
Kubernetes Submit Queue
bb56937b92 Merge pull request #46055 from deads2k/crd-01-embed
Automatic merge from submit-queue (batch tested with PRs 46022, 46055, 45308, 46209, 43590)

embed kube-apiextensions inside of kube-apiserver

To reduce operation complexity, we decided to include the kube-apiextensions-server inside of kube-apiserver (https://github.com/kubernetes/community/blob/master/sig-api-machinery/api-extensions-position-statement.md#q-should-kube-aggregator-be-a-separate-binaryprocess-than-kube-apiserver).  With the API reasonably well established and a finalizer about merge, I think its time to add ourselves.

This pull wires kube-apiextensions-server ahead of the TPRs so that one will replace the other if both are added by accident (CRDs should have priority) and wires a controller for automatic aggregation.

WIP because I still need tests: unit test for controller, test-cmd test to mirror the TPR test.


```release-note
Adds the `CustomResourceDefinition` (crd) types to the `kube-apiserver`.  These are the successors to `ThirdPartyResource`.  See https://github.com/kubernetes/community/blob/master/contributors/design-proposals/thirdpartyresources.md for more details.
```
2017-05-22 19:59:57 -07:00
Kubernetes Submit Queue
c2c5051adf Merge pull request #44899 from smarterclayton/burst
Automatic merge from submit-queue (batch tested with PRs 38990, 45781, 46225, 44899, 43663)

Support parallel scaling on StatefulSets

Fixes #41255

```release-note
StatefulSets now include an alpha scaling feature accessible by setting the `spec.podManagementPolicy` field to `Parallel`.  The controller will not wait for pods to be ready before adding the other pods, and will replace deleted pods as needed.  Since parallel scaling creates pods out of order, you cannot depend on predictable membership changes within your set.
```
2017-05-22 19:07:09 -07:00
Kubernetes Submit Queue
a572f10387 Merge pull request #46205 from billy2180/bump-network-tester-json-image-version-to-1.9
Automatic merge from submit-queue (batch tested with PRs 46133, 46211, 46224, 46205, 45910)

test/images/network-tester:bump rc/pod image version to 1.9

Current image version is 1.9,update the image version of the associated json file to 1.9
```release-note
NONE
```
2017-05-22 15:50:05 -07:00
Kubernetes Submit Queue
03ba1324cf Merge pull request #46224 from gmarek/kubemark_heapster
Automatic merge from submit-queue (batch tested with PRs 46133, 46211, 46224, 46205, 45910)

Make CPU request for heapster in kubemark scale with the number of Nodes
2017-05-22 15:50:03 -07:00
Kubernetes Submit Queue
0329e3fdaf Merge pull request #46211 from gmarek/panic
Automatic merge from submit-queue (batch tested with PRs 46133, 46211, 46224, 46205, 45910)

Add more logs to kubelet_stats

Ref. #46198
2017-05-22 15:50:00 -07:00
Michelle Au
1a280993a9 Local persistent volume basic e2e 2017-05-22 14:46:03 -07:00
Clayton Coleman
8cd95c78c4 Scheduler should use a shared informer
Can be used either from a true shared informer or a local shared
informer created just for the scheduler.
2017-05-22 13:50:14 -04:00
Monis Khan
cbfe566e49 Detect cohabitating resources in etcd storage test
This change updates the etcd storage path test to detect cohabitating
resources by looking at their expected location in etcd.  This was not
detected in the past because the GVK check did not span across groups.

To limit noise from failures caused by multiple objects at the same
location in etcd, the test now fails when different GVRs share the same
expected path.  Thus every object is expected to have a unique path.

Signed-off-by: Monis Khan <mkhan@redhat.com>
2017-05-22 13:48:18 -04:00
Andy Goldstein
2480f2ceb6 Use shared informers in gc controller if possible 2017-05-22 12:51:37 -04:00
Mik Vyatskov
f605040165 Make Stackdriver Logging e2e tests less restrictive 2017-05-22 18:14:20 +02:00
Kubernetes Submit Queue
b00c1b66f4 Merge pull request #46164 from shyamjvs/master-log-kubemark
Automatic merge from submit-queue

Add script to dump kubemark master logs

First step towards solving the issue https://github.com/kubernetes/kubernetes/issues/46109.

cc @kubernetes/test-infra-maintainers  @wojtek-t @gmarek
2017-05-22 09:01:36 -07:00
gmarek
27fc7be396 Make CPU request for heapster in kubemark scale with the number of Nodes 2017-05-22 16:20:27 +02:00
FengyunPan
287f703d3a Close file after os.Open() 2017-05-22 21:51:11 +08:00
gmarek
38981e9fd4 Add more logs to kubelet_stats 2017-05-22 15:49:57 +02:00
Aleksandra Malinowska
0e5051a84c Add overriding Stackdriver API endpoint 2017-05-22 15:47:39 +02:00
deads2k
446e959bf7 make CRD apiservice controller 2017-05-22 08:54:14 -04:00
billy2180
952ad3f4a7 test/images/network-tester:bump rc/pod image verison to 1.9 2017-05-22 17:11:23 +08:00
Wojciech Tyczynski
8de8446840 Revert "Scheduler should use shared informer for pods"
This reverts commit 479f01d340.
2017-05-22 09:03:35 +02:00
Kubernetes Submit Queue
06c12e717a Merge pull request #46071 from emaildanwilson/fedClusterSelectorIntegration
Automatic merge from submit-queue

[Federation] ClusterSelector Integration Testing

This pull request adds integration testing for the federated ClusterSelector ref: design #29887 merged pull #40234

cc: @nikhiljindal @marun
2017-05-21 23:18:44 -07:00
Kubernetes Submit Queue
ba4c8b8db2 Merge pull request #46025 from billy2180/bump-netexec-pod-xml-to-1.7
Automatic merge from submit-queue

Bump e2e netexec pod.xml image version to 1.7

Changing the image version from 1.5 to 1.7
2017-05-21 02:27:05 -07:00
Clayton Coleman
e40648de68 E2E test for statefulset burst 2017-05-21 01:14:31 -04:00
Vishnu kannan
86b5edb79a Update COS version to m59
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-05-20 21:17:19 -07:00
Vishnu kannan
1e77594958 Adding an installer script that installs Nvidia drivers in Container Optimized OS
Packaged the script as a docker container stored in gcr.io/google-containers
A daemonset deployment is included to make it easy to consume the installer
A cluster e2e has been added to test the installation daemonset along with verifying installation
by using a sample CUDA application.
Node e2e for GPUs updated to avoid running on nodes without GPU devices.

Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-05-20 21:17:19 -07:00
Clayton Coleman
479f01d340 Scheduler should use shared informer for pods
Previously, the scheduler created two separate list watchers. This
changes the scheduler to be able to leverage a shared informer, whether
passed in externally or spawned using the new in place method. This
removes the last use of a "special" informer in the codebase.

Allows someone wrapping the scheduler to use a shared informer if they
have more information avaliable.
2017-05-20 14:19:49 -04:00
Clayton Coleman
784e3ae5fa Switch the tokens controller to use shared informers
Tokens controller previously needed a bit of extra help in order to be
safe for concurrent use. The new MutationCache allows it to keep a local
cache and still use a shared informer. The filtering event handler lets
it only see changes to secrets it cares about.
2017-05-20 14:19:49 -04:00
Shyam Jeedigunta
360054a75f Add script to dump kubemark master logs 2017-05-20 13:12:38 +02:00
Kubernetes Submit Queue
03495f12c3 Merge pull request #46152 from shashidharatd/federation
Automatic merge from submit-queue (batch tested with PRs 46014, 46152)

Updated test/test_owners.csv for federation test cases

To the best of my knowledge have updated the test owners for federation e2e test cases. PTAL and comment if any concern.

**Release note**:
```release-note
NONE
```
cc @kubernetes/sig-federation-pr-reviews @fejta 
/assign @madhusudancs
2017-05-20 00:59:27 -07:00
Kubernetes Submit Queue
8fe818b2a1 Merge pull request #45981 from fabianofranz/kubectl_plugins_v1_part1
Automatic merge from submit-queue (batch tested with PRs 46033, 46122, 46053, 46018, 45981)

Command tree and exported env in kubectl plugins

This is part of `kubectl` plugins V1:
- Adds support to several env vars passing context information to the plugin. Plugins can make use of them to connect to the REST API, access global flags, get the path of the plugin caller (so that `kubectl` can be invoked) and so on. Exported env vars include
  - `KUBECTL_PLUGINS_DESCRIPTOR_*`: the plugin descriptor fields
  - `KUBECTL_PLUGINS_GLOBAL_FLAG_*`: one for each global flag, useful to access namespace, context, etc
  - ~`KUBECTL_PLUGINS_REST_CLIENT_CONFIG_*`: one for most fields in `rest.Config` so that a REST client can be built.~
  - `KUBECTL_PLUGINS_CALLER`: path to `kubectl`
  - `KUBECTL_PLUGINS_CURRENT_NAMESPACE`: namespace in use
- Adds support for plugins as child of other plugins so that a tree of commands can be built (e.g. `kubectl myplugin list`, `kubectl myplugin add`, etc)

**Release note**:

```release-note
Added support to a hierarchy of kubectl plugins (a tree of plugins as children of other plugins).

Added exported env vars to kubectl plugins so that plugin developers have access to global flags, namespace, the plugin descriptor and the full path to the caller binary.
```
@kubernetes/sig-cli-pr-reviews
2017-05-19 23:29:32 -07:00
Kubernetes Submit Queue
112ed869c7 Merge pull request #46053 from dashpole/test_eviction_metrics
Automatic merge from submit-queue (batch tested with PRs 46033, 46122, 46053, 46018, 45981)

Log age of stats used for evictions during eviction tests

I recently added prometheus metrics for the age of the metrics used for evictions #43031.  It would be nice to surface these during eviction tests, so I can better assess how old stats are, and whether or not the age of stats causes extra evictions.

This isnt super-high priority, and can be done after code-freeze, since it is a testing improvement.  Feel free to take a look whenever either of you has time.

/assign @mtaufen 
/assign @Random-Liu
2017-05-19 23:29:28 -07:00
Kubernetes Submit Queue
4f55f49035 Merge pull request #46042 from derekwaynecarr/quota-admission-registry
Automatic merge from submit-queue (batch tested with PRs 45346, 45903, 45958, 46042, 45975)

ResourceQuota admission control injects registry

**What this PR does / why we need it**:
The `ResourceQuota` admission controller works with a registry that maps a GroupKind to an Evaluator.  The registry used in the existing plug-in is not injectable, which makes usage of the ResourceQuota plug-in in other API server contexts difficult.  This PR updates the code to support late injection of the registry via a plug-in initializer.
2017-05-19 22:29:34 -07:00
shashidharatd
b4792014d1 Updated test/test_owners.csv for federation test cases 2017-05-20 08:43:30 +05:30