Commit Graph

6789 Commits

Author SHA1 Message Date
Kubernetes Submit Queue
586fd3374f Merge pull request #43090 from foxish/fix-network-partition-flake
Automatic merge from submit-queue (batch tested with PRs 42854, 43105, 43090)

Add a timeout to allow replacement pod to become ready

Hopefully fixes https://github.com/kubernetes/kubernetes/issues/37259

```
I0314 04:26:02.562] Mar 14 04:26:02.562: INFO: Pod my-hostname-net-1bgrj still exists
I0314 04:26:22.491] Mar 14 04:26:22.491: INFO: Waiting for pod my-hostname-net-1bgrj to disappear
I0314 04:26:22.496] Mar 14 04:26:22.495: INFO: Pod my-hostname-net-1bgrj no longer exists
I0314 04:26:22.496] STEP: verifying whether the pod from the unreachable node is recreated
I0314 04:26:22.498] Mar 14 04:26:22.498: INFO: Pod name my-hostname-net: Found 3 pods out of 3
I0314 04:26:22.499] STEP: ensuring each pod is running
I0314 04:26:22.499] STEP: trying to dial each unique pod
I0314 04:26:22.579] Mar 14 04:26:22.579: INFO: Controller my-hostname-net: Got expected result from replica 1 [my-hostname-net-5jrdb]: "my-hostname-net-5jrdb", 1 of 3 required successes so far
I0314 04:26:22.642] Mar 14 04:26:22.642: INFO: Controller my-hostname-net: Got expected result from replica 2 [my-hostname-net-mjf3c]: "my-hostname-net-mjf3c", 2 of 3 required successes so far
I0314 04:31:22.645] Mar 14 04:31:22.644: INFO: Controller my-hostname-net: Failed to Get from replica 3 [my-hostname-net-rf46s]: Get https://35.184.87.178/api/v1/namespaces/e2e-tests-network-partition-s5gqt/pods/my-hostname-net-rf46s/proxy/: context deadline exceeded
```

The issue appears to be that we have a race between the pod being "running + ready" and being accessible via the APIServer proxy.


cc @kow3ns @bowei @davidopp
2017-03-14 18:44:22 -07:00
Kubernetes Submit Queue
8f9cba87a9 Merge pull request #43105 from intelsdi-x/reuse-sched-event-predicates
Automatic merge from submit-queue (batch tested with PRs 42854, 43105, 43090)

Move e2e sched event predicates to new file.

**What this PR does / why we need it**:

Small e2e test refactor for scheduler. Moves scheduler event predicates out of opaque_resource.go for reuse elsewhere.

**Release note**:

```release-note
NONE
```

cc @kubernetes/sig-scheduling-pr-reviews @timothysc @bsalamat
2017-03-14 18:44:20 -07:00
Anirudh
7698196fa5 Add a timeout to allow replacement pod to become ready 2017-03-14 17:09:39 -07:00
Kubernetes Submit Queue
1221779b18 Merge pull request #43018 from nicksardo/ingress-upgrade-cleanup-flake
Automatic merge from submit-queue (batch tested with PRs 43018, 42713)

Log instead of fail on GLBCs tendency to leak resources

**What this PR does / why we need it**:
Stops upgrade tests from flaking when the GLBC fails to clean up all resources because of a race condition.

**Which issue this PR fixes**: fixes #38569

**Special notes for your reviewer**:
To be reviewed by @mml 

```release-note
NONE
```
2017-03-14 15:59:18 -07:00
Connor Doyle
4f847cb440 Move e2e sched event predicates to new file. 2017-03-14 15:20:27 -07:00
Kubernetes Submit Queue
442e920085 Merge pull request #43029 from janetkuo/deployment-controllerRef-test
Automatic merge from submit-queue (batch tested with PRs 42775, 42991, 42968, 43029)

Add e2e test for Deployment controllerRef orphaning and adoption

Follow up #42908 

@enisoc @kubernetes/sig-apps-bugs @kargakis
2017-03-14 13:52:46 -07:00
Kubernetes Submit Queue
42cdb052b6 Merge pull request #42968 from timothysc/sched_e2e_breakout
Automatic merge from submit-queue (batch tested with PRs 42775, 42991, 42968, 43029)

Initial breakout of scheduling e2es to help assist in assignment and refactoring

**What this PR does / why we need it**:
This PR segregates the scheduling-specific e2es to isolate the library, which will assist both in refactoring and in auto-assignment of issues.

**Which issue this PR fixes** 
xref: https://github.com/kubernetes/kubernetes/issues/42691#issuecomment-285563265

**Special notes for your reviewer**:
All this change does is shuffle code around and quarantine. Behavioral and other cleanup changes will be in follow-on PRs. As of today, the e2es are a monolith and there is massive symbol pollution; this first step allows us to segregate the e2es and tease apart the dependency mess.

**Release note**:

```
NONE
```

/cc @kubernetes/sig-scheduling-pr-reviews @kubernetes/sig-testing-pr-reviews @marun @skriss 

/cc @gmarek - same trick for load + density, etc.
2017-03-14 13:52:43 -07:00
Kubernetes Submit Queue
0ea3e9a2c1 Merge pull request #43066 from foxish/fix-statefulset-apps
Automatic merge from submit-queue (batch tested with PRs 43034, 43066)

Fix StatefulSet apps e2e tests

Fixes https://github.com/kubernetes/kubernetes/issues/42490

```release-note
NONE
```

cc @kubernetes/sig-apps-bugs
2017-03-14 11:44:39 -07:00
Kubernetes Submit Queue
dc2b0ee2cf Merge pull request #43034 from enisoc/statefulset-patch
Automatic merge from submit-queue (batch tested with PRs 43034, 43066)

Allow StatefulSet controller to PATCH Pods.

**What this PR does / why we need it**:

StatefulSet now needs the PATCH permission on Pods since it calls into ControllerRefManager to adopt and release. This adds the permission and the missing e2e test that should have caught this.

**Which issue this PR fixes**:

**Special notes for your reviewer**:

This is based on #42925.

**Release note**:
```release-note
```
cc @kubernetes/sig-apps-pr-reviews
2017-03-14 11:44:37 -07:00
Kubernetes Submit Queue
f53ba5581b Merge pull request #43080 from foxish/foxish-patch-2
Automatic merge from submit-queue

Add rest of workloads team to test/OWNERS

```release-note
NONE
```

cc @kubernetes/sig-apps-misc
2017-03-14 10:19:39 -07:00
Kubernetes Submit Queue
6de28fab7d Merge pull request #42942 from vishh/gpu-cont-fix
Automatic merge from submit-queue (batch tested with PRs 42942, 42935)

[Bug] Handle container restarts and avoid using runtime pod cache while allocating GPUs

Fixes #42412

**Background**
Support for multiple GPUs is an experimental feature in v1.6.
Container restarts were handled incorrectly, which resulted in stranding of GPUs.
Kubelet is incorrectly using the runtime cache to track running pods, which can result in race conditions (as it did in other parts of kubelet). This can result in the same GPU being assigned to multiple pods.

**What does this PR do**
This PR tracks assignment of GPUs to containers and returns pre-allocated GPUs instead of (incorrectly) allocating new GPUs.
GPU manager is updated to consume a list of active pods derived from apiserver cache instead of runtime cache.
Node e2e has been extended to validate this failure scenario.
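
As a rough illustration of the approach described above, the sketch below records which devices each container holds and hands the same devices back when the container asks again after a restart. The type `gpuAllocator`, its methods, and the container key format are assumptions made for this example; they are not the kubelet GPU manager's actual API.

```go
package main

import "fmt"

// gpuAllocator is a hypothetical stand-in for a GPU manager.
type gpuAllocator struct {
	free     []string            // devices not held by any container
	assigned map[string][]string // container key -> devices it already holds
}

func newGPUAllocator(devices []string) *gpuAllocator {
	return &gpuAllocator{
		free:     append([]string(nil), devices...),
		assigned: make(map[string][]string),
	}
}

// allocate returns the devices the container already holds, if any (the
// restart case); otherwise it takes n devices from the free list.
func (a *gpuAllocator) allocate(container string, n int) ([]string, error) {
	if devs, ok := a.assigned[container]; ok {
		return devs, nil
	}
	if n > len(a.free) {
		return nil, fmt.Errorf("requested %d GPUs, only %d free", n, len(a.free))
	}
	devs := append([]string(nil), a.free[:n]...)
	a.free = a.free[n:]
	a.assigned[container] = devs
	return devs, nil
}

// release frees the devices of every container that is not in the active set.
// The active set is assumed to come from an authoritative list of pods.
func (a *gpuAllocator) release(active map[string]bool) {
	for container, devs := range a.assigned {
		if !active[container] {
			a.free = append(a.free, devs...)
			delete(a.assigned, container)
		}
	}
}

func main() {
	a := newGPUAllocator([]string{"gpu0", "gpu1"})
	first, _ := a.allocate("pod-a/main", 1)
	restarted, _ := a.allocate("pod-a/main", 1) // same container after a restart
	fmt.Println(first, restarted, len(a.free))
}
```

Keying the map by container rather than by pod is a design choice of this sketch; it keeps a restarted container from consuming a second device while its first one is still recorded.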

**Risk**
Minimal/None since support for GPUs is an experimental feature that is turned off by default. The code is also isolated to GPU manager in kubelet.

**Workarounds**
In the absence of this PR, users can mitigate the original issue by setting `RestartPolicyNever` in their pods.
There is no workaround for the race condition caused by using the runtime cache though.
Hence it is worth including this fix in v1.6.0.

cc @jianzhangbjz @seelam @kubernetes/sig-node-pr-reviews 

Replaces #42560
2017-03-14 10:19:17 -07:00
Anthony Yeh
53a6f4402f Allow StatefulSet controller to PATCH Pods.
Also add an e2e test that should have caught this.
2017-03-14 09:27:33 -07:00
Anirudh Ramanathan
5267f05be7 Add people to test/OWNERS 2017-03-14 08:52:08 -07:00
Anirudh
bcc73dbe1a Fix StatefulSet apps flakes 2017-03-14 02:44:55 -07:00
Timothy St. Clair
6cc40678b6 Initial breakout of scheduling e2es to help assist in both assignment
and refactoring.
2017-03-13 22:34:57 -05:00
Janet Kuo
c97935533a Add e2e test for Deployment controllerRef orphaning and adoption 2017-03-13 18:43:09 -07:00
Nick Sardo
3e85c0f758 Log instead of fail on GLBCs tendency to leak resources 2017-03-13 15:31:03 -07:00
Kubernetes Submit Queue
5913c5a453 Merge pull request #42925 from janetkuo/ds-adopt-e2e
Automatic merge from submit-queue

Allow DaemonSet controller to PATCH pods, and add more steps and logs in DaemonSet pods adoption e2e test

DaemonSet pods adoption failed because the DS controller isn't allowed to patch pods when claiming pods.

[Edit] This PR fixes #42908 by modifying RBAC to allow DaemonSet controllers to patch pods, as well as adding more logs and steps to the original e2e test to make debugging easier. 

Tested locally with a local cluster and GCE cluster. 
@kargakis @lukaszo @kubernetes/sig-apps-pr-reviews
2017-03-13 14:06:03 -07:00
Kubernetes Submit Queue
19574a10f2 Merge pull request #42906 from intelsdi-x/reuse-observer-helpers
Automatic merge from submit-queue (batch tested with PRs 42940, 42906, 42970, 42848)

Move node and event observer helpers to e2e/common

**What this PR does / why we need it**:

Moves existing test helper functions in OIR e2e tests to `test/e2e/common`. These functions wrap informers to help test writers to observe events instead of long-polling for status updates.

For usage examples, see `test/e2e/opaque_resource.go`.

cc @kubernetes/sig-scheduling-misc

**Release note**:
```release-note
NONE
```
2017-03-13 13:22:12 -07:00
Kubernetes Submit Queue
d60d965f33 Merge pull request #42940 from caesarxuchao/fix-gc-orphan-rs
Automatic merge from submit-queue (batch tested with PRs 42940, 42906, 42970, 42848)

Increase timeout for the orphan e2e test

Fix #42086.

Analysis of the test logs is in https://github.com/kubernetes/kubernetes/issues/42086#issuecomment-285770868 and the following comments.

@deads2k PTAL, thanks!
2017-03-13 13:22:10 -07:00
Janet Kuo
287b962860 Add more steps and logs in DaemonSet pods adoption e2e test 2017-03-13 11:37:17 -07:00
Vishnu Kannan
8ed9bff073 handle container restarts for GPUs
Signed-off-by: Vishnu Kannan <vishnuk@google.com>
2017-03-13 10:58:26 -07:00
Kubernetes Submit Queue
ab9b299c30 Merge pull request #42915 from kubernetes/fabianofranz-test-approver
Automatic merge from submit-queue

Add fabianofranz as approver for test/e2e/kubectl.go

Adding myself as approver for `kubectl` end-to-end tests.

```release-note
NONE
```
2017-03-13 07:39:29 -07:00
Connor Doyle
ba9410621f Move node and event observer helpers to e2e/common 2017-03-12 19:35:26 -07:00
Kubernetes Submit Queue
81ba4741f3 Merge pull request #42901 from fabianofranz/issues_42697
Automatic merge from submit-queue (batch tested with PRs 41794, 42349, 42755, 42901, 42933)

Fixes kubectl skew test failure when using kubectl.sh

Fixes leftovers from https://github.com/kubernetes/kubernetes/pull/42737.

**Release note**:

```release-note
NONE
```
2017-03-10 22:02:20 -08:00
Kubernetes Submit Queue
8cb14a4f7f Merge pull request #42755 from aveshagarwal/master-fix-default-toleration-seconds
Automatic merge from submit-queue (batch tested with PRs 41794, 42349, 42755, 42901, 42933)

Fix DefaultTolerationSeconds admission plugin

DefaultTolerationSeconds is not working as expected. It is supposed to add default tolerations (for unreachable and notready conditions), but no pod was getting these tolerations, and the API server was throwing this error:

```
Mar 08 13:43:57 fedora25 hyperkube[32070]: E0308 13:43:57.769212   32070 admission.go:71] expected pod but got Pod
Mar 08 13:43:57 fedora25 hyperkube[32070]: E0308 13:43:57.789055   32070 admission.go:71] expected pod but got Pod
Mar 08 13:44:02 fedora25 hyperkube[32070]: E0308 13:44:02.006784   32070 admission.go:71] expected pod but got Pod
Mar 08 13:45:39 fedora25 hyperkube[32070]: E0308 13:45:39.754669   32070 admission.go:71] expected pod but got Pod
Mar 08 14:48:16 fedora25 hyperkube[32070]: E0308 14:48:16.673181   32070 admission.go:71] expected pod but got Pod
```

The reason for this error is that the input to admission plugins is internal API objects, not versioned objects, so expecting a versioned object is incorrect. Due to this, no pod got the desired tolerations and it always showed:

```
Tolerations: <none>
```

After this fix, the correct tolerations are being assigned to pods as follows:

```
Tolerations:	node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
		node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
```
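
The type mismatch behind the error above can be reproduced in miniature. In the sketch below, `internalPod` and `versionedPod` are hypothetical stand-ins for the internal and versioned API types, and the two `admit...` functions are illustrative, not the plugin's real code: a type assertion against the wrong one of two similarly named types fails even though both describe a pod.

```go
package main

import "fmt"

// Hypothetical stand-ins for the internal and versioned Pod types.
type internalPod struct{ Name string }
type versionedPod struct{ Name string }

// admitExpectingVersioned mimics the buggy check: it asserts a versioned
// object, but admission plugins are handed internal objects.
func admitExpectingVersioned(obj interface{}) error {
	if _, ok := obj.(*versionedPod); !ok {
		return fmt.Errorf("expected *versionedPod but got %T", obj)
	}
	return nil
}

// admitExpectingInternal mimics the fixed check.
func admitExpectingInternal(obj interface{}) error {
	if _, ok := obj.(*internalPod); !ok {
		return fmt.Errorf("expected *internalPod but got %T", obj)
	}
	return nil
}

func main() {
	var obj interface{} = &internalPod{Name: "web-0"}
	fmt.Println(admitExpectingVersioned(obj)) // assertion fails, error returned
	fmt.Println(admitExpectingInternal(obj))  // assertion succeeds
}
```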

@davidopp @kevin-wangzefeng @kubernetes/sig-scheduling-pr-reviews @kubernetes/sig-scheduling-bugs @derekwaynecarr 

Fixes https://github.com/kubernetes/kubernetes/issues/42716
2017-03-10 22:02:18 -08:00
Kubernetes Submit Queue
ca09352dd9 Merge pull request #42349 from timstclair/aa-upgrade
Automatic merge from submit-queue (batch tested with PRs 41794, 42349, 42755, 42901, 42933)

AppArmor cluster upgrade test

Add a cluster upgrade test for AppArmor. I still need to test this (having some trouble with the cluster-upgrade tests), but wanted to start the review process.

/cc @dchen1107 @roberthbailey
2017-03-10 22:02:16 -08:00
Kubernetes Submit Queue
328e555f72 Merge pull request #41794 from shashidharatd/federation-upgrade-tests-1
Automatic merge from submit-queue (batch tested with PRs 41794, 42349, 42755, 42901, 42933)

[Federation][e2e] Add framework for upgrade test in federation

Adding framework for federation upgrade tests. Please refer to #41791.

cc @madhusudancs @nikhiljindal @kubernetes/sig-federation-pr-reviews
2017-03-10 22:02:15 -08:00
Chao Xu
a3f4053cb3 increase timeout for orphan e2e test 2017-03-10 18:13:48 -08:00
Christian Bell
9a37fe6dff [Federation] Deployments unaware of ReadyReplicas
The Deployment controller was not propagating ReadyReplicas to underlying clusters causing these errors:
```
Error syncing cluster controller: Deployment.apps "federation-deployment" is invalid: status.availableReplicas: Invalid value: 5: cannot be greater than readyReplicas
```

This was caught in e2e testing and is a 1.6 regression for support that was added in #37959. Without this fix, users will be unable to scale up their deployments.
2017-03-10 15:00:02 -08:00
Fabiano Franz
224ee822d4 Add fabianofranz as approver for test/e2e/kubectl.go 2017-03-10 18:13:43 -03:00
Kubernetes Submit Queue
e2218290cf Merge pull request #42444 from jingxu97/Mar/deleteVolume
Automatic merge from submit-queue (batch tested with PRs 42608, 42444)

Return nil when deleting non-exist GCE PD

When the GCE cloud tries to delete a disk, if the disk cannot be found
in the zones, the function should return a nil error. This modified behavior is also consistent with AWS.
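
A minimal sketch of the idempotent-delete behavior described above, using a hypothetical in-memory `fakeCloud` rather than the real GCE provider code: "not found" is treated as success, while any other lookup error would still be returned.

```go
package main

import (
	"errors"
	"fmt"
)

var errDiskNotFound = errors.New("disk not found in any zone")

// fakeCloud is a hypothetical in-memory provider: zone -> set of disk names.
type fakeCloud map[string]map[string]bool

// findZone returns the zone holding the disk, or errDiskNotFound.
func (c fakeCloud) findZone(disk string) (string, error) {
	for zone, disks := range c {
		if disks[disk] {
			return zone, nil
		}
	}
	return "", errDiskNotFound
}

// deleteDisk removes the disk. A disk that is already gone is not an error,
// so callers can safely retry the deletion.
func (c fakeCloud) deleteDisk(disk string) error {
	zone, err := c.findZone(disk)
	if errors.Is(err, errDiskNotFound) {
		return nil
	}
	if err != nil {
		return err
	}
	delete(c[zone], disk)
	return nil
}

func main() {
	cloud := fakeCloud{"us-central1-a": {"pd-1": true}}
	fmt.Println(cloud.deleteDisk("pd-1")) // disk exists and is removed
	fmt.Println(cloud.deleteDisk("pd-1")) // already gone, still no error
}
```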
2017-03-10 12:50:24 -08:00
shashidharatd
a14f8dc346 auto generated bazel build files 2017-03-11 01:39:56 +05:30
shashidharatd
662f0ef531 Add framework for federation upgrade tests 2017-03-11 01:39:56 +05:30
shashidharatd
4443a1b40d Move few reusable functions to upgrade_utils.go 2017-03-11 01:39:56 +05:30
Fabiano Franz
adea540a5b Fixes kubectl skew test failure when using kubectl.sh 2017-03-10 15:25:38 -03:00
Kubernetes Submit Queue
f71492a9ac Merge pull request #42719 from gmarek/taint-test
Automatic merge from submit-queue (batch tested with PRs 36704, 42719)

Extend timeouts in taints test to account for slow Pod deletions

Fix #42685

Before merging this we need a consensus on what to do with slow Pod deletions.
2017-03-10 09:06:23 -08:00
Kubernetes Submit Queue
4ff0af821a Merge pull request #42879 from jsafrane/test-pod-logs
Automatic merge from submit-queue

e2e test: Log container output on TestContainerOutput error

When a pod started with TestContainerOutput or TestContainerOutputRegexp
fails for an unknown reason, we should log all output of all its containers
so we can analyze what went wrong.

This would help us see what went wrong in https://github.com/kubernetes/kubernetes/issues/40811, where a container runs for 3 minutes and then dies, and we want to see what it did during those 3 minutes.

```release-note
NONE
```
2017-03-10 06:13:44 -08:00
gmarek
4e5b4e7ee0 Extend timeouts in taints test to account for slow Pod deletions 2017-03-10 14:23:47 +01:00
Jan Safranek
bc06c636d1 e2e test: Log container output on TestContainerOutput error
When a pod started with TestContainerOutput or TestContainerOutputRegexp
fails for an unknown reason, we should log all output of all its containers
so we can analyze what went wrong.
2017-03-10 10:08:57 +01:00
Avesh Agarwal
9f533de80d Fix DefaultTolerationSeconds admission plugin. It was using
versioned object whereas admission plugins operate on internal objects.
2017-03-09 20:24:43 -05:00
Random-Liu
f81460e35d Change the junit file name format to junit_image-name_id.xml,
and make the gci image name shorter.
2017-03-09 16:47:48 -08:00
Kubernetes Submit Queue
4540674b04 Merge pull request #42758 from krousey/downgrades
Automatic merge from submit-queue (batch tested with PRs 42734, 42745, 42758, 42814, 42694)

Implement automated downgrade testing.

Node version cannot be higher than the master version, so we must
switch the node version first. Also, we must use the upgrade script
from the appropriate version for GCE.
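
The ordering constraint above can be sketched as a small helper. This is an illustration under simplifying assumptions: versions are plain integers, and `switchOrder` is a hypothetical name, not part of the upgrade scripts.

```go
package main

import "fmt"

// switchOrder returns the order in which to change component versions so
// that the node version is never higher than the master version.
// Assumption: versions compare as simple integers for illustration.
func switchOrder(current, target int) []string {
	if target < current {
		// Downgrade: lower the nodes first, then the master.
		return []string{"nodes", "master"}
	}
	// Upgrade: raise the master first, then the nodes.
	return []string{"master", "nodes"}
}

func main() {
	fmt.Println(switchOrder(6, 5)) // downgrade
	fmt.Println(switchOrder(5, 6)) // upgrade
}
```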
2017-03-09 15:06:56 -08:00
Kubernetes Submit Queue
7c08e817a5 Merge pull request #42734 from dashpole/deletion_timeout
Automatic merge from submit-queue (batch tested with PRs 42734, 42745, 42758, 42814, 42694)

Create DefaultPodDeletionTimeout for e2e tests

In our e2e and e2e_node tests, we had a number of different timeouts for deletion.
Recent changes to the way deletion works (#41644, #41456) have resulted in some timeouts in e2e tests.  #42661 was the most recent fix for this.
Most of these tests are not meant to test pod deletion latency, but rather just to clean up pods after a test is finished.
For this reason, we should change all these tests to use a standard, fairly high timeout for deletion.

cc @vishh @Random-Liu
2017-03-09 15:06:53 -08:00
Kubernetes Submit Queue
a22fac00dd Merge pull request #42833 from caesarxuchao/pod-deletion
Automatic merge from submit-queue

Don't wait for the final deletion of pod

The final deletion of the pod depends on kubelet and other components operating correctly. The purpose of this e2e test is to verify that the clientset can handle deleteOptions correctly, so waiting for the deletionTimestamp and deletionGraceperiod to be set is good enough.

In the long run, we should move this set of e2e tests to integration tests.

Fix #42724 #42646

cc @marun
2017-03-09 13:21:53 -08:00
Kris
cc84e0895a Implement automated downgrade testing.
Node version cannot be higher than the master version, so we must
switch the node version first. Also, we must use the upgrade script
from the appropriate version for GCE.
2017-03-09 12:45:20 -08:00
Chao Xu
130437b94e wait for the deletionTimestamp set instead of waiting for the final deletion 2017-03-09 11:35:51 -08:00
Kubernetes Submit Queue
7b4bec038c Merge pull request #42805 from deads2k/client-01-flake-debug
Automatic merge from submit-queue

add debugging to the client watch test

Adds debugging information for https://github.com/kubernetes/kubernetes/issues/42724. I suspect that the watch is closing early, but I'd like proof before I consider things like retrying the list and doing another watch to observe the delete. I'm not even sure that would satisfy the test.

It seems like a flaky way to build the test.  Why wouldn't we delete non-gracefully?

@kubernetes/sig-api-machinery-misc @caesarxuchao 
@wojtek-t saw you just hit this if you wanted to take a quick look at the debugging I added.
2017-03-09 08:20:45 -08:00
deads2k
ceb3e27fff add debugging to the client watch test 2017-03-09 09:27:41 -05:00
Kubernetes Submit Queue
cf732613e3 Merge pull request #42278 from marun/fed-api-fixture
Automatic merge from submit-queue (batch tested with PRs 42728, 42278)

[Federation] Create integration test fixture for api

This PR factors a reusable fixture for the federation api server out of the existing integration test.

Targets #40705

cc: @kubernetes/sig-federation-pr-reviews
2017-03-09 05:45:32 -08:00