kubernetes/pkg/cloudprovider
Kubernetes Submit Queue 00ee67bdc8 Merge pull request #52575 from vmware/vSphereInstanceNotFoundOnPowerOff
Automatic merge from submit-queue (batch tested with PRs 51311, 52575, 53169). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Unable to detach the vSphere volume from Powered off node

With the existing implementation, when a vSphere node is powered off, the node is not deleted by the node controller and remains in the "NotReady" state, following an approach similar to GCE as mentioned here - https://github.com/kubernetes/kubernetes/issues/46442.

I observe the following issues:
- The pods on the powered-off node are not **instantaneously** created on another available node. Only after a 5-minute timeout are the pods created on other available nodes with the volume attached. This means an application downtime of around 5 minutes, which is not good at all.
- The volumes on the powered-off node are not detached at all, even after the pods using them have been moved to another available node. Hence any attempt to restart the powered-off node will fail, as volumes still attached to it are now also attached to another node. (Please note that volumes are not automatically detached from a powered-off node in vSphere, as opposed to GCE and AWS, where volumes are automatically detached when the node is powered off.)

So in order to resolve this problem, we have decided to go back to the approach where the powered-off node is removed by the node controller. The above two problems are then resolved as follows:
- Since the node is deleted, the pods on the powered-off node are instantaneously rescheduled onto other available nodes with the volumes attached to the new nodes. Hence there is no application downtime at all.
- After a period of 6 minutes (the timeout period), the volumes are automatically detached from the powered-off node. Hence any restart of the powered-off node after 6 minutes would work and not cause any problems, as the volumes are already detached.

For now, we want to go ahead with deleting the node from the node controller when it is powered off in vCenter, until we have a better approach. I think the best possible solution would be to introduce a power-status handler in the volume controller that checks whether the node is powered off before taking the appropriate attach/detach action.
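
To make the mechanism concrete, here is a minimal standalone sketch of the idea, not the actual vSphere cloud provider code: `instanceExists`, `vmPowerState`, and `errVMNotFound` are hypothetical stand-ins (in-tree providers signal this condition with `cloudprovider.InstanceNotFound`). When the VM backing a node is powered off, the instance lookup reports it as not found, so the node controller deletes the Node object and the volumes can be detached after the timeout described above.

```go
package main

import (
	"errors"
	"fmt"
)

// errVMNotFound mirrors the sentinel error a cloud provider returns when an
// instance cannot be found (cloudprovider.InstanceNotFound in-tree).
var errVMNotFound = errors.New("instance not found")

type vmPowerState string

const poweredOff vmPowerState = "poweredOff"

// instanceExists mimics the instance lookup the node controller performs via
// the cloud provider: a powered-off VM is reported the same way as a deleted
// one, so the node controller removes the corresponding Node object.
func instanceExists(state vmPowerState) error {
	if state == poweredOff {
		return errVMNotFound
	}
	return nil
}

func main() {
	if err := instanceExists(poweredOff); errors.Is(err, errVMNotFound) {
		fmt.Println("powered-off VM reported as not found; node controller will delete the Node")
	}
}
```

Once the Node object is gone, the existing detach-on-node-delete path frees the volumes after the 6-minute timeout, so a later restart of the powered-off node does not conflict with attachments elsewhere.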

```release-note
None
```

@jingxu97 @saad-ali @divyenpatel @luomiao @rohitjogvmw
2017-09-28 23:18:19 -07:00

Deprecation Notice: This directory has entered maintenance mode and will not be accepting new providers. Cloud Providers in this directory will continue to be actively developed or maintained and supported at their current level of support as a longer-term solution evolves.

Overview:

The mechanism for supporting cloud providers is currently in transition: the original method of implementing cloud provider-specific functionality within the main kubernetes tree (here) is no longer advised; however, the proposed solution is still in development.

Guidance for potential cloud providers:

  • Support for cloud providers is currently in a state of flux. Background information on the motivation and the proposal for improvement is in the GitHub proposal.
  • In support of this plan, a new cloud-controller-manager binary was added in 1.6. This was the first of several steps (see the proposal for more information).
  • Attempts to contribute new cloud providers or (to a lesser extent) persistent volumes to the core repo will likely meet with some pushback from reviewers/approvers.
  • It is understood that this is an unfortunate situation in which 'the old way is no longer supported but the new way is not ready yet', but the initial path is unsustainable, and contributors are encouraged to participate in the implementation of the proposed long-term solution, as there is risk that PRs for new cloud providers here will not be approved.
  • Though the fully productized support envisioned in the proposal is still 2 - 3 releases out, the foundational work is underway, and a motivated cloud provider could accomplish the work in a forward-looking way. Contributors are encouraged to assist with the implementation of the design outlined in the proposal.

Some additional context on status / direction:

  • 1.6 added a new cloud-controller-manager binary that may be used for testing the new out-of-core cloudprovider flow (a simplified registration sketch follows this list).
  • Setting cloud-provider=external allows for creation of a separate controller-manager binary
  • 1.7 adds extensible admission control, further enabling topology customization.
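
To illustrate what a provider plugs into, here is a small self-contained sketch of the factory-registration pattern used by the cloudprovider package; `CloudProvider`, `Factory`, and `RegisterCloudProvider` below are simplified stand-ins for the real `cloudprovider.Interface` and registration entry points, not the actual Kubernetes API.

```go
package main

import (
	"fmt"
	"io"
)

// CloudProvider is a simplified stand-in for cloudprovider.Interface.
type CloudProvider interface {
	ProviderName() string
}

// Factory builds a provider, optionally from a cloud-config stream.
type Factory func(config io.Reader) (CloudProvider, error)

// providers maps a provider name to its factory.
var providers = map[string]Factory{}

// RegisterCloudProvider records a factory under a provider name, mirroring
// how in-tree providers register themselves at init time.
func RegisterCloudProvider(name string, factory Factory) {
	providers[name] = factory
}

type exampleCloud struct{}

func (exampleCloud) ProviderName() string { return "example" }

func main() {
	RegisterCloudProvider("example", func(io.Reader) (CloudProvider, error) {
		return exampleCloud{}, nil
	})

	// The controller-manager would look up the provider named by its
	// --cloud-provider flag; with --cloud-provider=external it skips the
	// in-tree lookup and defers to an external cloud-controller-manager.
	cloud, err := providers["example"](nil)
	if err != nil {
		panic(err)
	}
	fmt.Println("initialized provider:", cloud.ProviderName())
}
```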