Commit Graph

12002 Commits

Author SHA1 Message Date
Kubernetes Submit Queue
dbea6f6372 Merge pull request #61085 from hzxuzhonghu/unversioned-cleanup
Automatic merge from submit-queue (batch tested with PRs 60919, 60953, 61085, 61083, 60971). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

remove unused `pkg/api/unversioned`

**What this PR does / why we need it**:

clean code, see #61084 

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #61084

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-03-20 20:34:31 -07:00
Lantao Liu
9fc2795d55 Change pods memory boundary.
Signed-off-by: Lantao Liu <lantaol@google.com>
2018-03-20 23:24:16 +00:00
Kubernetes Submit Queue
14e3efe26a Merge pull request #58717 from resouer/extender-interface
Automatic merge from submit-queue (batch tested with PRs 60759, 60531, 60923, 60851, 58717). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Implement preemption for extender with a verb and new interface

**What this PR does / why we need it**:

This is an alternative way of implementing #51656

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #51656

**Special notes for your reviewer**:

We will also want to compare with #56296 to see which one is the best solution. See: https://github.com/kubernetes/kubernetes/pull/56296#discussion_r163381235

cc @ravigadde @bsalamat 

**Release note**:

```release-note
Implement preemption for extender with a verb and new interface
```
2018-03-20 15:34:41 -07:00
djkonro
23588c7c69 Grammar and spelling update 2018-03-20 22:03:33 +01:00
Kubernetes Submit Queue
c5d4a032d7 Merge pull request #60547 from brahmaroutu/conf_kubectl
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Adding details to Conformance Tests using RFC 2119 standards.

This PR is part of the conformance documentation. This is to provide more formal specification using RFC 2119 keywords to describe the test so that who ever is running conformance tests do not have to go through the code to understand why and what is tested.
The documentation information added here into each of the tests eventually result into a document which is currently checked in at location https://github.com/cncf/k8s-conformance/blob/master/docs/KubeConformance-1.9.md

I would like to have this PR reviewed for v1.10 as I consider it important to strengthen the conformance documents.
2018-03-20 11:35:23 -07:00
Anthony Yeh
c4c6e4bbb8 hack/update-bazel.sh 2018-03-20 11:15:36 -07:00
Anthony Yeh
f3ba27a500 ReplicaSet: Use apps/v1 RS in integration test. 2018-03-20 11:15:36 -07:00
Anthony Yeh
b27836032a ReplicaSet: Use apps/v1 RS in e2e test. 2018-03-20 09:10:40 -07:00
Kubernetes Submit Queue
7ab554ce43 Merge pull request #60666 from immutableT/kms_mock_flake_issue
Automatic merge from submit-queue (batch tested with PRs 60574, 60666, 60831, 60877, 60357). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Remove potential sources of flakes for kms_transformation_test.go.

**What this PR does / why we need it**:
Remove potential sources for flakes in TestKMSPlugin test.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
#60614
**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-03-20 08:34:35 -07:00
Karol Wychowaniec
71f14cf335 Add e2e test for Custom Metrics API with new Stackdriver resource model
and External Metrics API.
2018-03-20 15:51:58 +01:00
Kubernetes Submit Queue
2cb4297c96 Merge pull request #59637 from hanxiaoshuai/bugfix0209
Automatic merge from submit-queue (batch tested with PRs 59637, 60611, 60788, 60489, 60687). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

fix todo: use a better way to keep this label unique in the test

**What this PR does / why we need it**:
fix todo: use a better way to keep this label unique in the test in test/e2e/apimachinery/garbage_collector.go
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-03-20 04:34:31 -07:00
Kubernetes Submit Queue
c0db49c2cb Merge pull request #60331 from jennybuckley/watch-e2e-test
Automatic merge from submit-queue (batch tested with PRs 60457, 60331, 54970, 58731, 60562). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add e2e test for watch

**What this PR does / why we need it**:
Currently watch is only tested by kubectl tests, but all clients of kubernetes should be able to reliably watch resources for changes. This test should be able to accommodate testing watch for custom resources without many changes.

/sig api-machinery

```release-note
Added e2e test for watch
```
2018-03-19 23:42:11 -07:00
Kubernetes Submit Queue
8c3b5541e5 Merge pull request #60457 from sjenning/fix-websocket-e2e-test
Automatic merge from submit-queue (batch tested with PRs 60457, 60331, 54970, 58731, 60562). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

tests: e2e: empty msg from channel other than stdout should be non-fatal

Currently, if the exec websocket encounters a message that is not in the stdout stream, it immediately fails.  However it also currently requests the stderr steam in the query params.  There doesn't seem to be any guarantee that we don't get an empty message on the stderr stream.

Requesting the stderr stream in the query is desirable if, for some reason, something in the container fails and writes to stderr.

However, we do not need fail the test if we get an empty message on another stream.  If the message is not empty, then that _does_ indicate and error and we should fail.

This is the situation we are currently observing with docker 1.13 in the origin CI https://github.com/openshift/origin/issues/18726

@derekwaynecarr @smarterclayton @gabemontero @liggitt @deads2k 

/sig node
2018-03-19 23:42:07 -07:00
Kubernetes Submit Queue
9df57d2812 Merge pull request #60086 from dougm/vsphere-finder
Automatic merge from submit-queue (batch tested with PRs 59740, 59728, 60080, 60086, 58714). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

vSphere: Minimize property collection via Finder

The 'All' parameter of the 'NewFinder' function controls property collection while searching the inventory.
When 'All' is set to 'false', Finder collects the minimal set of object properties required to search inventory.
When 'All' is set to 'true', Finder collects *all* object properties, which are *not* required to search inventory.
Setting 'All' to 'true' is only useful when inspecting all properties of an object,
such as by certain govc commands when the '-json' or '-dump' flags are specified.

Changing All=false in VCP minimizes the SOAP payload size and marshalling required on both sides, without impacting any functionality.



**What this PR does / why we need it**:

Changing All=false in VCP minimizes the SOAP payload size and marshalling required on both sides, without impacting any functionality.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-03-19 21:34:36 -07:00
Kubernetes Submit Queue
c64f19dd1b Merge pull request #59728 from wgliang/master.append
Automatic merge from submit-queue (batch tested with PRs 59740, 59728, 60080, 60086, 58714). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

more concise to merge the slice

**What this PR does / why we need it**:
more concise to merge the slice

**Special notes for your reviewer**:
2018-03-19 21:34:30 -07:00
Kubernetes Submit Queue
37b2edd855 Merge pull request #54300 from jsafrane/fix-test-images
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix test images

These commits fix volume_io tests for iSCSI and Ceph RBD. Both server images were quite old (Fedora 22), so I updated them to ~~something more stable (CentOS 7) and to newer Ceph (Jewel, 10.2.7).~~ something newer (Fedora 26).

The most important fix is that the test volumes have 120 MB so volume_io test can actually run - the tests put 100MB file to the volume to check its persistence.

When mount containers in #53440 are merged I'll try to run the tests regularly with every PR (or merge) so we catch regressions quickly.

```release-note
NONE
```

/sig testing
/sig storage

/assign @jeffvance 

Fixes: #56725
2018-03-19 20:34:22 -07:00
Shyam Jeedigunta
e5dc6c88eb Wait for only enough no. of RC replicas to be running in testutil 2018-03-19 14:22:18 +01:00
Cao Shufeng
440aab5b86 add e2e test 2018-03-19 08:57:39 +08:00
Kubernetes Submit Queue
ebae09e741 Merge pull request #61234 from nikhiljindal/kubemciTest
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fail the ingress test if it timesout getting address

Updating the test to fail if it timesout getting IP address for the ingress rather than silently ignoring that error.
Also improved some logging to print more information.

This is to help in debugging tests added in https://github.com/kubernetes/kubernetes/pull/59234

cc @madhusudancs @MrHohn @nicksardo 

Ref https://github.com/GoogleCloudPlatform/k8s-multicluster-ingress/issues/131

```release-note
NONE
```
2018-03-18 15:33:43 -07:00
Kubernetes Submit Queue
f125152212 Merge pull request #61284 from jsafrane/fix-fsgroup-subpath
Automatic merge from submit-queue (batch tested with PRs 61284, 61119, 61201). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix creation of subpath with SUID/SGID directories.

SafeMakeDir() should apply SUID/SGID/sticky bits to the directory it creates.

Fixes #61283 

**Release note**:

```release-note
NONE
```
2018-03-16 16:55:57 -07:00
Hemant Kumar
0600f7ee22 Fix e2e tests for emptydir 2018-03-16 15:14:42 -04:00
Kubernetes Submit Queue
ca02c11887 Merge pull request #61161 from k82cn/k8s_59194_4
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Added unschedulable taint

Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>

**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
part of #59194; fixes #61050

**Release note**:

```release-note
When `TaintNodesByCondition` enabled, added `node.kubernetes.io/unschedulable:NoSchedule`
 taint to the node if `spec.Unschedulable` is true.

When `ScheduleDaemonSetPods` enabled, `node.kubernetes.io/unschedulable:NoSchedule` 
toleration is added automatically to DaemonSet Pods; so the `unschedulable` field of 
a node is not respected by the DaemonSet controller.
```
2018-03-16 11:22:05 -07:00
Hemant Kumar
b85c4fc57a Refactor disruptive tests to use more volume types 2018-03-16 11:49:22 -04:00
zhengjiajin
69b079af76 remove DNS service from kubectl comformance test 2018-03-16 10:01:35 +08:00
Da K. Ma
b23db30765 Added unscheduable taint.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
2018-03-16 09:13:08 +08:00
Cheng Xing
fe76c9f779 Fixes 'Zone is empty' errors in PD upgrade tests; skips pd tests with inline volume in multizone clusters 2018-03-15 15:00:13 -07:00
nikhiljindal
cdfbb54db2 Fail the ingress test if it timesout getting address for IP address 2018-03-15 14:46:17 -07:00
immutablet
04a6613fb5 Instrument transformer.go with latency metrics. 2018-03-15 14:13:24 -07:00
Jun Xiang Tee
92070eba3d add rolling update daemonset existing pod adoption integration test 2018-03-14 14:00:38 -07:00
jennybuckley
5f518e6d4c Fix error handling in gc e2e test 2018-03-14 10:27:44 -07:00
Kubernetes Submit Queue
05ec0a77b4 Merge pull request #61118 from shyamjvs/bump-apiserver-mem-threshold
Automatic merge from submit-queue (batch tested with PRs 61118, 60579). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Increase apiserver mem-threshold in density test

Ref: https://github.com/kubernetes/kubernetes/issues/60500#issuecomment-372682659 (fixes part of that issue)

/sig scalability
/kind bug
/priority important-soon
/cc @wojtek-t
/cc @crassirostris (for the release-note)

```release-note
Audit logging with buffering enabled can increase apiserver memory usage (e.g. up to 200MB in 100-node cluster). The increase is bounded by the buffer size (configurable). Ref: issue #60500
```
2018-03-14 09:49:48 -07:00
muhongwei
e153a0d9cb Correct spelling 2018-03-14 18:03:42 +08:00
Kubernetes Submit Queue
32343b7f3d Merge pull request #61111 from jsafrane/fix-subpath-multizone
Automatic merge from submit-queue (batch tested with PRs 61111, 61069). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix subpath e2e tests on multizone cluster.

Use dynamically provisioned PV to run GCE PD tests. This will make sure that the pod is scheduled to the right zone and GCE PD can be attached to a node.

**Which issue(s) this PR fixes**:
Fixes #61101 


**Release note**:

```release-note
NONE
```
/sig storage
@msau42 @verult
2018-03-13 14:06:47 -07:00
Kubernetes Submit Queue
ae990bb5a9 Merge pull request #60968 from loburm/fix_gke_logging_test
Automatic merge from submit-queue (batch tested with PRs 60737, 60739, 61080, 60968, 60951). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix broken gke regional logging test.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #60882

```release-note
NONE
```
2018-03-13 12:27:04 -07:00
Kubernetes Submit Queue
8313fc0dac Merge pull request #61080 from liggitt/subpath-test
Automatic merge from submit-queue (batch tested with PRs 60737, 60739, 61080, 60968, 60951). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Detect backsteps correctly in base path detection

Avoids false positives with atomic writer `..<timestamp>` directories

Fixes #61076

/assign @msau42 @jsafrane

```release-note
Fix a regression that prevented using `subPath` volume mounts with secret, configMap, projected, and downwardAPI volumes
```
2018-03-13 12:27:00 -07:00
Kubernetes Submit Queue
b651ed5ea7 Merge pull request #60998 from jpbetz/etcd-3.1.12
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Bump to etcd 3.1.12 to pick up critical fix

etcd [3.1.12](https://github.com/coreos/etcd/releases/tag/v3.1.12) (as well as 3.2.17 and 3.3.2) was released yesterday to fix a bug critical to kubernetes:

Fix [mvcc "unsynced" watcher restore operation](https://github.com/coreos/etcd/pull/9297).
- "unsynced" watcher is watcher that needs to be in sync with events that have happened.
- That is, "unsynced" watcher is the slow watcher that was requested on old revision.
- "unsynced" watcher restore operation was not correctly populating its underlying watcher group.
- Which possibly causes [missing events from "unsynced" watchers](https://github.com/coreos/etcd/issues/9086).

This will be backported to 1.9 as well.

Release note:
```release-note
Upgrade the default etcd server version to 3.1.12 to pick up critical etcd "mvcc "unsynced" watcher restore operation" fix.
```

cc @gyuho @wojtek-t @shyamjvs @timothysc @jdumars
2018-03-13 09:11:10 -07:00
Shyam JVS
b43b621690 Increase apiserver mem-threshold in density test 2018-03-13 16:47:14 +01:00
Jan Safranek
c44e135442 Fix subpath e2e tests on multizone cluster.
Use dynamically provisioned PV to run GCE PD tests. This will make sure
that the pod is scheduled to the right zone and GCE PD can be attached
to a node.
2018-03-13 14:26:37 +01:00
Jordan Liggitt
806f6772c6 Add atomic writer subpath e2e tests 2018-03-13 08:53:50 -04:00
xuzhonghu
70d5af6e7b stop using AlwaysAdmit admission 2018-03-13 20:02:56 +08:00
hzxuzhonghu
f12647e16d pkg/api/unversioned related cleanup 2018-03-13 17:20:16 +08:00
Kubernetes Submit Queue
3d1331f297 Merge pull request #61044 from liggitt/subpath-master
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

subpath fixes

fixes #60813 for master / 1.10

```release-note
Fixes CVE-2017-1002101 - See https://issue.k8s.io/60813 for details
```
2018-03-12 11:51:59 -07:00
jennybuckley
3b2472a305 Add e2e test for watch 2018-03-12 10:48:43 -07:00
Kubernetes Submit Queue
a3f40dd8df Merge pull request #60856 from jiayingz/race-fix
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fixes the races around devicemanager Allocate() and endpoint deletion.

There is a race in predicateAdmitHandler Admit() that getNodeAnyWayFunc()
could get Node with non-zero deviceplugin resource allocatable for a
non-existing endpoint. That race can happen when a device plugin fails,
but is more likely when kubelet restarts as with the current registration
model, there is a time gap between kubelet restart and device plugin
re-registration. During this time window, even though devicemanager could
have removed the resource initially during GetCapacity() call, Kubelet
may overwrite the device plugin resource capacity/allocatable with the
old value when node update from the API server comes in later. This
could cause a pod to be started without proper device runtime config set.

To solve this problem, introduce endpointStopGracePeriod. When a device
plugin fails, don't immediately remove the endpoint but set stopTime in
its endpoint. During kubelet restart, create endpoints with stopTime set
for any checkpointed registered resource. The endpoint is considered to be
in stopGracePeriod if its stoptime is set. This allows us to track what
resources should be handled by devicemanager during the time gap.
When an endpoint's stopGracePeriod expires, we remove the endpoint and
its resource. This allows the resource to be exported through other channels
(e.g., by directly updating node status through API server) if there is such
use case. Currently endpointStopGracePeriod is set as 5 minutes.

Given that an endpoint is no longer immediately removed upon disconnection,
mark all its devices unhealthy so that we can signal the resource allocatable
change to the scheduler to avoid scheduling more pods to the node.
When a device plugin endpoint is in stopGracePeriod, pods requesting the
corresponding resource will fail admission handler.

Tested:
Ran GPUDevicePlugin e2e_node test 100 times and all passed now.



**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/60176

**Special notes for your reviewer**:

**Release note**:

```release-note
Fixes the races around devicemanager Allocate() and endpoint deletion.
```
2018-03-12 02:50:13 -07:00
Kubernetes Submit Queue
36058cb0c3 Merge pull request #60997 from MrHohn/e2e-fix-cleanup-svc-regional
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

[e2e service] Fix CleanupGCEResources for regional test

*What this PR does / why we need it**:
From https://k8s-testgrid.appspot.com/google-gke-staging#gke-staging-1-8-1-9-upgrade-regional-cluster&width=20, regional cluster test is failing because the GCE resource cleanup function attempts to parse region from `--gcp-zone` while regional cluster only set `--gcp-region`.

This PR pipes region into the cleanup function as well. This will need to be cherrypicked to 1.8 and 1.9.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #NONE 

**Special notes for your reviewer**:
/assign @bowei @wojtek-t 
cc @nikhiljindal to see if there is anything should be fixed for federation.

**Release note**:

```release-note
NONE
```
2018-03-12 00:03:38 -07:00
Ian Chakeres
06462e91c8 Added e2e test for local volume provisioner discovery of new mountpoints while running. 2018-03-11 16:52:33 -07:00
Kubernetes Submit Queue
9ad5ea2d61 Merge pull request #60993 from MrHohn/e2e-restart-apiserver-refine-followup
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

[e2e service] Move apiserver restart validation logic into util

**What this PR does / why we need it**:
Follow up of #60906, on GKE apiserver pod is invisible on k8s, hence test is failing.

This PR bakes the restart validation logic into the util function instead so it could be env-awared.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #60761

**Special notes for your reviewer**:
Sorry for the noise.
/assign @rramkumar1 @bowei 
cc @krzyzacy 

**Release note**:

```release-note
NONE
```
2018-03-09 18:33:05 -08:00
Jiaying Zhang
5514a1f4dd Fixes the races around devicemanager Allocate() and endpoint deletion.
There is a race in predicateAdmitHandler Admit() that getNodeAnyWayFunc()
could get Node with non-zero deviceplugin resource allocatable for a
non-existing endpoint. That race can happen when a device plugin fails,
but is more likely when kubelet restarts as with the current registration
model, there is a time gap between kubelet restart and device plugin
re-registration. During this time window, even though devicemanager could
have removed the resource initially during GetCapacity() call, Kubelet
may overwrite the device plugin resource capacity/allocatable with the
old value when node update from the API server comes in later. This
could cause a pod to be started without proper device runtime config set.

To solve this problem, introduce endpointStopGracePeriod. When a device
plugin fails, don't immediately remove the endpoint but set stopTime in
its endpoint. During kubelet restart, create endpoints with stopTime set
for any checkpointed registered resource. The endpoint is considered to be
in stopGracePeriod if its stoptime is set. This allows us to track what
resources should be handled by devicemanager during the time gap.
When an endpoint's stopGracePeriod expires, we remove the endpoint and
its resource. This allows the resource to be exported through other channels
(e.g., by directly updating node status through API server) if there is such
use case. Currently endpointStopGracePeriod is set as 5 minutes.

Given that an endpoint is no longer immediately removed upon disconnection,
mark all its devices unhealthy so that we can signal the resource allocatable
change to the scheduler to avoid scheduling more pods to the node.
When a device plugin endpoint is in stopGracePeriod, pods requesting the
corresponding resource will fail admission handler.
2018-03-09 17:00:57 -08:00
Kubernetes Submit Queue
36fd62eed8 Merge pull request #60972 from wojtek-t/fix_upgrade_test
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix upgrade tests for GKE Regional Clusters
2018-03-09 15:44:46 -08:00
Joe Betz
e2a25f9b54 Bump to etcd 3.1.12 to pick up critical fix 2018-03-09 14:28:23 -08:00