Copy edits for spelling errors and typos

Signed-off-by: Ed Costello <epc@epcostello.com>
This commit is contained in:
Ed Costello
2015-06-11 01:11:44 -04:00
parent 3ce7fe8310
commit 05714d416b
23 changed files with 31 additions and 31 deletions

View File

@@ -4,9 +4,9 @@ This document serves as a proposal for high availability of the scheduler and co
## Design Options
For complete reference see [this](https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_availability_cold_warm_hot?lang=en)
1. Hot Standby: In this scenario, data and state are shared between the two components such that an immediate failure in one component causes the the standby deamon to take over exactly where the failed component had left off. This would be an ideal solution for kubernetes, however it poses a series of challenges in the case of controllers where component-state is cached locally and not persisted in a transactional way to a storage facility. This would also introduce additional load on the apiserver, which is not desirable. As a result, we are **NOT** planning on this approach at this time.
1. Hot Standby: In this scenario, data and state are shared between the two components such that an immediate failure in one component causes the the standby daemon to take over exactly where the failed component had left off. This would be an ideal solution for kubernetes, however it poses a series of challenges in the case of controllers where component-state is cached locally and not persisted in a transactional way to a storage facility. This would also introduce additional load on the apiserver, which is not desirable. As a result, we are **NOT** planning on this approach at this time.
2. **Warm Standby**: In this scenario there is only one active component acting as the master and additional components running but not providing service or responding to requests. Data and state are not shared between the active and standby components. When a failure occurs, the standby component that becomes the master must determine the current state of the system before resuming functionality. This is the apprach that this proposal will leverage.
2. **Warm Standby**: In this scenario there is only one active component acting as the master and additional components running but not providing service or responding to requests. Data and state are not shared between the active and standby components. When a failure occurs, the standby component that becomes the master must determine the current state of the system before resuming functionality. This is the approach that this proposal will leverage.
3. Active-Active (Load Balanced): Clients can simply load-balance across any number of servers that are currently running. Their general availability can be continuously updated, or published, such that load balancing only occurs across active participants. This aspect of HA is outside of the scope of *this* proposal because there is already a partial implementation in the apiserver.
@@ -16,7 +16,7 @@ Implementation References:
* [etcd](https://groups.google.com/forum/#!topic/etcd-dev/EbAa4fjypb4)
* [initialPOC](https://github.com/rrati/etcd-ha)
In HA, the apiserver will provide an api for sets of replicated clients to do master election: acquire the lease, renew the lease, and release the lease. This api is component agnostic, so a client will need to provide the component type and the lease duration when attemping to become master. The lease duration should be tuned per component. The apiserver will attempt to create a key in etcd based on the component type that contains the client's hostname/ip and port information. This key will be created with a ttl from the lease duration provided in the request. Failure to create this key means there is already a master of that component type, and the error from etcd will propigate to the client. Successfully creating the key means the client making the request is the master. Only the current master can renew the lease. When renewing the lease, the apiserver will update the existing key with a new ttl. The location in etcd for the HA keys is TBD.
In HA, the apiserver will provide an api for sets of replicated clients to do master election: acquire the lease, renew the lease, and release the lease. This api is component agnostic, so a client will need to provide the component type and the lease duration when attempting to become master. The lease duration should be tuned per component. The apiserver will attempt to create a key in etcd based on the component type that contains the client's hostname/ip and port information. This key will be created with a ttl from the lease duration provided in the request. Failure to create this key means there is already a master of that component type, and the error from etcd will propagate to the client. Successfully creating the key means the client making the request is the master. Only the current master can renew the lease. When renewing the lease, the apiserver will update the existing key with a new ttl. The location in etcd for the HA keys is TBD.
The first component to request leadership will become the master. All other components of that type will fail until the current leader releases the lease, or fails to renew the lease within the expiration time. On startup, all components should attempt to become master. The component that succeeds becomes the master, and should perform all functions of that component. The components that fail to become the master should not perform any tasks and sleep for their lease duration and then attempt to become the master again. A clean shutdown of the leader will cause a release of the lease and a new master will be elected.