Merge pull request #52176 from liggitt/heartbeat-timeout

Automatic merge from submit-queue (batch tested with PRs 52176, 43152). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..

Eliminate hangs/throttling of node heartbeat

Fixes https://github.com/kubernetes/kubernetes/issues/48638
Fixes #50304

Stops kubelet from wedging when updating node status if unable to establish tcp connection.

 Notes that this only affects the node status loop. The pod sync loop would still hang until the dead TCP connections timed out,  so more work is needed to keep the sync loop responsive in the face of network issues, but this change lets existing pods coast without the node controller trying to evict them

```release-note
kubelet to master communication when doing node status updates now has a timeout to prevent indefinite hangs
```
This commit is contained in:
Kubernetes Submit Queue
2017-09-16 09:45:29 -07:00
committed by GitHub
10 changed files with 102 additions and 16 deletions

View File

@@ -183,7 +183,7 @@ func (cnc *CloudNodeController) updateNodeAddress(node *v1.Node, instances cloud
if !nodeAddressesChangeDetected(node.Status.Addresses, newNode.Status.Addresses) {
return
}
_, err = nodeutil.PatchNodeStatus(cnc.kubeClient, types.NodeName(node.Name), node, newNode)
_, err = nodeutil.PatchNodeStatus(cnc.kubeClient.CoreV1(), types.NodeName(node.Name), node, newNode)
if err != nil {
glog.Errorf("Error patching node with cloud ip addresses = [%v]", err)
}