kubernetes/pkg/kubelet
Kubernetes Submit Queue a3f40dd8df
Merge pull request #60856 from jiayingz/race-fix
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fixes the races around devicemanager Allocate() and endpoint deletion.

There is a race in predicateAdmitHandler Admit(): getNodeAnyWayFunc()
can return a Node with non-zero device plugin resource allocatable for a
non-existent endpoint. The race can happen when a device plugin fails,
but is more likely when kubelet restarts because, with the current
registration model, there is a time gap between kubelet restart and
device plugin re-registration. During this window, even though
devicemanager may have initially removed the resource during the
GetCapacity() call, kubelet may overwrite the device plugin resource
capacity/allocatable with the old value when a node update from the API
server comes in later. This could cause a pod to be started without the
proper device runtime config set.

To solve this problem, introduce endpointStopGracePeriod. When a device
plugin fails, don't immediately remove the endpoint; instead, set stopTime
on it. During kubelet restart, create endpoints with stopTime set
for every checkpointed registered resource. An endpoint is considered to be
in stopGracePeriod if its stopTime is set. This allows us to track which
resources should be handled by devicemanager during the time gap.
When an endpoint's stopGracePeriod expires, we remove the endpoint and
its resource. This allows the resource to be exported through other channels
(e.g., by directly updating node status through the API server) if there is
such a use case. Currently endpointStopGracePeriod is set to 5 minutes.

Given that an endpoint is no longer removed immediately upon disconnection,
mark all its devices unhealthy so that the change in resource allocatable is
signaled to the scheduler, which avoids scheduling more pods to the node.
While a device plugin endpoint is in stopGracePeriod, pods requesting the
corresponding resource will fail the admission handler.

Tested:
Ran the GPUDevicePlugin e2e_node test 100 times; all runs now pass.



**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/60176

**Special notes for your reviewer**:

**Release note**:

```release-note
Fixes the races around devicemanager Allocate() and endpoint deletion.
```
2018-03-12 02:50:13 -07:00
apis API Changes for RunAsGroup and Implementation and e2e 2018-02-28 22:09:56 -08:00
cadvisor Merge pull request #59743 from feiskyer/stats 2018-02-23 20:09:32 -08:00
certificate Merge pull request #59316 from smarterclayton/terminate_early 2018-02-21 15:40:41 -08:00
checkpoint Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
client Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
cm Merge pull request #60856 from jiayingz/race-fix 2018-03-12 02:50:13 -07:00
config Merge pull request #59849 from yue9944882/forcibly-lower-staticpod-name 2018-02-25 18:29:51 -08:00
configmap Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
container Merge pull request #59842 from ixdy/update-rules_go-02-2018 2018-02-19 22:23:05 -08:00
custommetrics Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
dockershim API Changes for RunAsGroup and Implementation and e2e 2018-02-28 22:09:56 -08:00
envvars Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
events Improve messaging on resize 2018-01-29 15:07:51 -05:00
eviction fix running with no eviction thresholds 2018-02-20 13:49:14 -08:00
gpu Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
images Promote LocalStorageCapacityIsolation feature to beta 2018-03-02 15:10:08 -08:00
kubeletconfig expunge the word 'manifest' from Kubelet's config API 2018-02-23 11:44:06 -08:00
kuberuntime Merge pull request #59333 from feiskyer/win 2018-02-27 20:34:13 -08:00
leaky update BUILD files 2017-10-15 18:18:13 -07:00
lifecycle Support cluster-level extended resources in kubelet and kube-scheduler 2018-02-27 17:25:30 -08:00
logs Generated code 2018-02-23 01:42:35 +00:00
metrics Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
mountpod Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
network Auto-updated BUILD files 2018-02-27 11:18:11 -08:00
pleg Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
pod Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
preemption Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
prober Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
qos Make a few code paths compile cleanly with 32-bit Go. 2018-02-27 13:53:32 -08:00
remote Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
rkt Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
secret Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
server Merge pull request #59842 from ixdy/update-rules_go-02-2018 2018-02-19 22:23:05 -08:00
stats Add CPU/Memory pod stats for CRI stats. 2018-02-26 19:29:47 +00:00
status Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
sysctl clean up sysctl code 2018-02-23 16:41:53 +08:00
types Merge pull request #58835 from ravisantoshgudimetla/critical-pod-with-priority 2018-02-23 11:22:31 -08:00
util Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
volumemanager Merge pull request #60505 from jsafrane/fix-logf 2018-02-28 06:07:24 -08:00
winstats fix "make test" 2018-02-24 17:39:21 +08:00
active_deadline_test.go run hack/update-all 2017-06-22 11:31:03 -07:00
active_deadline.go run hack/update-all 2017-06-22 11:31:03 -07:00
BUILD update bazel 2018-02-27 20:23:36 +08:00
doc.go
kubelet_getters_test.go kubelet: remove code for handling old pod/containers paths. 2017-07-20 13:10:15 +02:00
kubelet_getters.go Promote LocalStorageCapacityIsolation feature to beta 2018-03-02 15:10:08 -08:00
kubelet_network_test.go Move DNS related kubelet codes into its own package 2017-11-15 10:56:44 -08:00
kubelet_network.go fix all the typos across the project 2018-02-11 11:04:14 +08:00
kubelet_node_status_test.go Promote LocalStorageCapacityIsolation feature to beta 2018-03-02 15:10:08 -08:00
kubelet_node_status.go Promote LocalStorageCapacityIsolation feature to beta 2018-03-02 15:10:08 -08:00
kubelet_pods_test.go Increment CRI version from v1alpha1 to v1alpha2 2018-02-07 09:06:26 +01:00
kubelet_pods_windows_test.go run root-rewrite-v1-..., compile 2017-06-22 10:25:57 -07:00
kubelet_pods.go update import 2018-02-27 20:23:35 +08:00
kubelet_resources_test.go Added golint check for pkg/kubelet. 2017-07-19 11:33:06 +08:00
kubelet_resources.go apimachinery: remove Scheme.DeepCopy 2017-10-06 14:59:17 +02:00
kubelet_test.go Promote LocalStorageCapacityIsolation feature to beta 2018-03-02 15:10:08 -08:00
kubelet_volumes_test.go update import 2018-02-27 20:23:35 +08:00
kubelet_volumes.go use GetFileType per mount.Interface to check hostpath type 2017-09-26 09:57:06 +08:00
kubelet.go Promote LocalStorageCapacityIsolation feature to beta 2018-03-02 15:10:08 -08:00
oom_watcher_test.go run root-rewrite-import-client-go-api-types 2017-06-22 11:30:59 -07:00
oom_watcher.go Added golint check for pkg/kubelet. 2017-07-19 11:33:06 +08:00
OWNERS Name change: s/timstclair/tallclair/ 2017-07-10 14:05:46 -07:00
pod_container_deletor_test.go Added golint check for pkg/kubelet. 2017-07-19 11:33:06 +08:00
pod_container_deletor.go Added golint check for pkg/kubelet. 2017-07-19 11:33:06 +08:00
pod_workers_test.go Correct TestUpdatePod comment 2017-10-20 09:41:18 +08:00
pod_workers.go kubelet syncPod throws specific events 2017-10-13 10:24:09 -04:00
reason_cache_test.go Added golint check for pkg/kubelet. 2017-07-19 11:33:06 +08:00
reason_cache.go Added golint check for pkg/kubelet. 2017-07-19 11:33:06 +08:00
runonce_test.go Share /var/lib/kubernetes on startup 2017-08-30 16:45:04 +02:00
runonce.go Added golint check for pkg/kubelet. 2017-07-19 11:33:06 +08:00
runtime.go Remove setInitError. 2018-01-29 21:44:54 -08:00
util.go Fix comments and typo in the error message. 2017-07-14 19:17:12 +02:00
volume_host.go update import 2018-02-27 20:23:35 +08:00