Deprecate the term "Ubernetes" in favor of "Cluster Federation" and "Multi-AZ Clusters"
@@ -34,25 +34,25 @@ Documentation for other releases can be found at
 # Kubernetes Multi-AZ Clusters
 
-## (a.k.a. "Ubernetes-Lite")
+## (previously nicknamed "Ubernetes-Lite")
 
 ## Introduction
 
-Full Ubernetes will offer sophisticated federation between multiple kubernetes
+Full Cluster Federation will offer sophisticated federation between multiple kubernetes
 clusters, offering true high-availability, multiple provider support &
 cloud-bursting, multiple region support etc. However, many users have
 expressed a desire for a "reasonably" highly available cluster that runs in
 multiple zones on GCE or availability zones in AWS, and can tolerate the failure
 of a single zone without the complexity of running multiple clusters.
 
-Ubernetes-Lite aims to deliver exactly that functionality: to run a single
+Multi-AZ Clusters aim to deliver exactly that functionality: to run a single
 Kubernetes cluster in multiple zones. It will attempt to make reasonable
 scheduling decisions, in particular so that a replication controller's pods are
 spread across zones, and it will try to be aware of constraints - for example
 that a volume cannot be mounted on a node in a different zone.
 
-Ubernetes-Lite is deliberately limited in scope; for many advanced functions
-the answer will be "use Ubernetes (full)". For example, multiple-region
+Multi-AZ Clusters are deliberately limited in scope; for many advanced functions
+the answer will be "use full Cluster Federation". For example, multiple-region
 support is not in scope. Routing affinity (e.g. so that a webserver will
 prefer to talk to a backend service in the same zone) is similarly not in
 scope.
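The hunk above says the scheduler should spread a replication controller's pods across zones. As a rough illustration of that spreading rule (not the actual Kubernetes scheduler code; the function and zone names are hypothetical), a zone that already hosts fewer of the controller's pods would score higher:

```go
package main

import "fmt"

// zoneSpreadScores sketches the zone-spreading idea from the hunk above:
// zones that already run fewer of a replication controller's pods get higher
// scores, so new pods tend to land in under-populated zones. This is an
// illustration only, not the real scheduler priority function.
func zoneSpreadScores(podsPerZone map[string]int) map[string]int {
	maxCount := 0
	for _, n := range podsPerZone {
		if n > maxCount {
			maxCount = n
		}
	}
	scores := make(map[string]int, len(podsPerZone))
	for zone, n := range podsPerZone {
		if maxCount == 0 {
			scores[zone] = 10 // no pods anywhere yet: every zone is equally good
			continue
		}
		scores[zone] = 10 * (maxCount - n) / maxCount // 0..10, higher means emptier
	}
	return scores
}

func main() {
	// Hypothetical pod counts for one controller, keyed by zone.
	counts := map[string]int{"us-central1-a": 3, "us-central1-b": 1, "us-central1-f": 0}
	fmt.Println(zoneSpreadScores(counts)) // map[us-central1-a:0 us-central1-b:6 us-central1-f:10]
}
```

The volume constraint mentioned in the same paragraph would act as a hard filter before any such scoring: a node in a different zone from the volume is simply excluded.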
@@ -122,7 +122,7 @@ zones (in the same region). For both clouds, the behaviour of the native cloud
 load-balancer is reasonable in the face of failures (indeed, this is why clouds
 provide load-balancing as a primitive).
 
-For Ubernetes-Lite we will therefore simply rely on the native cloud provider
+For multi-AZ clusters we will therefore simply rely on the native cloud provider
 load balancer behaviour, and we do not anticipate substantial code changes.
 
 One notable shortcoming here is that load-balanced traffic still goes through
@@ -130,8 +130,8 @@ kube-proxy controlled routing, and kube-proxy does not (currently) favor
 targeting a pod running on the same instance or even the same zone. This will
 likely produce a lot of unnecessary cross-zone traffic (which is likely slower
 and more expensive). This might be sufficiently low-hanging fruit that we
-choose to address it in kube-proxy / Ubernetes-Lite, but this can be addressed
-after the initial Ubernetes-Lite implementation.
+choose to address it in kube-proxy / multi-AZ clusters, but this can be addressed
+after the initial implementation.
 
 
 ## Implementation
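The two hunks above note that kube-proxy does not currently favour endpoints in the caller's zone, which produces cross-zone traffic. A minimal sketch of the kind of preference that could address it (not kube-proxy's actual code; the types and addresses are made up):

```go
package main

import "fmt"

// endpoint pairs a backend address with the zone its backing pod runs in.
type endpoint struct {
	Addr string
	Zone string
}

// preferSameZone returns only the endpoints in the caller's zone when any
// exist, and falls back to the full list otherwise. This is a sketch of the
// "favour the local zone" behaviour discussed above, not actual kube-proxy code.
func preferSameZone(localZone string, eps []endpoint) []endpoint {
	var local []endpoint
	for _, ep := range eps {
		if ep.Zone == localZone {
			local = append(local, ep)
		}
	}
	if len(local) > 0 {
		return local
	}
	return eps
}

func main() {
	eps := []endpoint{
		{Addr: "10.0.1.5:8080", Zone: "us-east-1a"},
		{Addr: "10.0.2.7:8080", Zone: "us-east-1b"},
	}
	fmt.Println(preferSameZone("us-east-1b", eps)) // [{10.0.2.7:8080 us-east-1b}]
}
```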
@@ -182,8 +182,8 @@ region-wide, meaning that a single call will find instances and volumes in all
 zones. In addition, instance ids and volume ids are unique per-region (and
 hence also per-zone). I believe they are actually globally unique, but I do
 not know if this is guaranteed; in any case we only need global uniqueness if
-we are to span regions, which will not be supported by Ubernetes-Lite (to do
-that correctly requires an Ubernetes-Full type approach).
+we are to span regions, which will not be supported by multi-AZ clusters (to do
+that correctly requires a full Cluster Federation type approach).
 
 ## GCE Specific Considerations
 
@@ -197,20 +197,20 @@ combine results from calls in all relevant zones.
 A further complexity is that GCE volume names are scoped per-zone, not
 per-region. Thus it is permitted to have two volumes both named `myvolume` in
 two different GCE zones. (Instance names are currently unique per-region, and
-thus are not a problem for Ubernetes-Lite).
+thus are not a problem for multi-AZ clusters).
 
-The volume scoping leads to a (small) behavioural change for Ubernetes-Lite on
+The volume scoping leads to a (small) behavioural change for multi-AZ clusters on
 GCE. If you had two volumes both named `myvolume` in two different GCE zones,
 this would not be ambiguous when Kubernetes is operating only in a single zone.
-But, if Ubernetes-Lite is operating in multiple zones, `myvolume` is no longer
+But, when operating a cluster across multiple zones, `myvolume` is no longer
 sufficient to specify a volume uniquely. Worse, the fact that a volume happens
 to be unambiguous at a particular time is no guarantee that it will continue to
 be unambiguous in future, because a volume with the same name could
 subsequently be created in a second zone. While perhaps unlikely in practice,
-we cannot automatically enable Ubernetes-Lite for GCE users if this then causes
+we cannot automatically enable multi-AZ clusters for GCE users if this then causes
 volume mounts to stop working.
 
-This suggests that (at least on GCE), Ubernetes-Lite must be optional (i.e.
+This suggests that (at least on GCE), multi-AZ clusters must be optional (i.e.
 there must be a feature-flag). It may be that we can make this feature
 semi-automatic in future, by detecting whether nodes are running in multiple
 zones, but it seems likely that kube-up could instead simply set this flag.
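The hunk above mentions possibly detecting, rather than flagging, whether nodes run in multiple zones. A sketch of such a check, assuming nodes carry a zone label (the label key and the plain-map node representation are assumptions here, not the real Kubernetes API types):

```go
package main

import "fmt"

// zonesInUse returns the distinct zones found across a set of node labels.
// It sketches the "detect whether nodes are running in multiple zones" idea
// from the hunk above; the label key and node representation are assumptions.
func zonesInUse(nodeLabels []map[string]string) []string {
	const zoneLabel = "failure-domain.beta.kubernetes.io/zone" // assumed label key
	seen := map[string]bool{}
	var zones []string
	for _, labels := range nodeLabels {
		if z, ok := labels[zoneLabel]; ok && !seen[z] {
			seen[z] = true
			zones = append(zones, z)
		}
	}
	return zones
}

func main() {
	nodes := []map[string]string{
		{"failure-domain.beta.kubernetes.io/zone": "us-central1-a"},
		{"failure-domain.beta.kubernetes.io/zone": "us-central1-b"},
	}
	zones := zonesInUse(nodes)
	fmt.Printf("multi-zone: %v (zones: %v)\n", len(zones) > 1, zones)
}
```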
@@ -218,14 +218,14 @@ zones, but it seems likely that kube-up could instead simply set this flag.
 For the initial implementation, creating volumes with identical names will
 yield undefined results. Later, we may add some way to specify the zone for a
 volume (and possibly require that volumes have their zone specified when
-running with Ubernetes-Lite). We could add a new `zone` field to the
+running in multi-AZ cluster mode). We could add a new `zone` field to the
 PersistentVolume type for GCE PD volumes, or we could use a DNS-style dotted
 name for the volume name (<name>.<zone>).
 
 Initially therefore, the GCE changes will be to:
 
 1. change kube-up to support creation of a cluster in multiple zones
-1. pass a flag enabling Ubernetes-Lite with kube-up
+1. pass a flag enabling multi-AZ clusters with kube-up
 1. change the kubernetes cloud provider to iterate through relevant zones when resolving items
 1. tag GCE PD volumes with the appropriate zone information
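One option floated in the hunk above is a DNS-style dotted volume name, <name>.<zone>, to disambiguate GCE volumes across zones. A sketch of how such a name might be split; this is one possible convention, not an implemented format:

```go
package main

import (
	"fmt"
	"strings"
)

// splitVolumeName interprets a DNS-style dotted volume name of the form
// <name>.<zone>, as floated in the hunk above. A name without a dot is
// treated as unqualified (zone unknown). This is a sketch of one possible
// convention, not an implemented Kubernetes format.
func splitVolumeName(qualified string) (name, zone string) {
	i := strings.LastIndex(qualified, ".")
	if i < 0 {
		return qualified, ""
	}
	return qualified[:i], qualified[i+1:]
}

func main() {
	name, zone := splitVolumeName("myvolume.us-central1-b")
	fmt.Println(name, zone) // myvolume us-central1-b
}
```

Splitting on the last dot leaves room for volume names that happen to contain dots themselves.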
@@ -34,7 +34,7 @@ Documentation for other releases can be found at
 # Kubernetes Cluster Federation
 
-## (a.k.a. "Ubernetes")
+## (previously nicknamed "Ubernetes")
 
 ## Requirements Analysis and Product Proposal
 
@@ -413,7 +413,7 @@ detail to be added here, but feel free to shoot down the basic DNS
 idea in the meantime. In addition, some applications rely on private
 networking between clusters for security (e.g. AWS VPC or more
 generally VPN). It should not be necessary to forsake this in
-order to use Ubernetes, for example by being forced to use public
+order to use Cluster Federation, for example by being forced to use public
 connectivity between clusters.
 
 ## Cross-cluster Scheduling
@@ -546,7 +546,7 @@ prefers the Decoupled Hierarchical model for the reasons stated below).
 here, as each underlying Kubernetes cluster can be scaled
 completely independently w.r.t. scheduling, node state management,
 monitoring, network connectivity etc. It is even potentially
-feasible to stack "Ubernetes" federated clusters (i.e. create
+feasible to stack federations of clusters (i.e. create
 federations of federations) should scalability of the independent
 Federation Control Plane become an issue (although the author does
 not envision this being a problem worth solving in the short
@@ -595,7 +595,7 @@ prefers the Decoupled Hierarchical model for the reasons stated below).
 
 
 
-## Ubernetes API
+## Cluster Federation API
 
 It is proposed that this look a lot like the existing Kubernetes API
 but be explicitly multi-cluster.
@@ -603,7 +603,8 @@ but be explicitly multi-cluster.
 + Clusters become first class objects, which can be registered,
   listed, described, deregistered etc via the API.
 + Compute resources can be explicitly requested in specific clusters,
-  or automatically scheduled to the "best" cluster by Ubernetes (by a
+  or automatically scheduled to the "best" cluster by the Cluster
+  Federation control system (by a
   pluggable Policy Engine).
 + There is a federated equivalent of a replication controller type (or
   perhaps a [deployment](deployment.md)),
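The hunk above makes clusters first-class objects that can be registered, listed, described and deregistered via the API. A hedged Go sketch of what such an object and registry could look like; every type, field and method name here is illustrative, not the eventual federation API:

```go
package main

import "fmt"

// Cluster sketches a first-class "cluster" object of the kind described
// above. Field names are illustrative only.
type Cluster struct {
	Name     string            // unique name the cluster is registered under
	Endpoint string            // API server address of the underlying cluster
	Labels   map[string]string // e.g. region/zone hints for the policy engine
}

// ClusterRegistry sketches the register/list/deregister operations mentioned above.
type ClusterRegistry interface {
	Register(c Cluster) error
	List() ([]Cluster, error)
	Deregister(name string) error
}

// memoryRegistry is a toy in-memory implementation, for illustration only.
type memoryRegistry struct{ clusters map[string]Cluster }

func (r *memoryRegistry) Register(c Cluster) error {
	if r.clusters == nil {
		r.clusters = map[string]Cluster{}
	}
	r.clusters[c.Name] = c
	return nil
}

func (r *memoryRegistry) List() ([]Cluster, error) {
	out := make([]Cluster, 0, len(r.clusters))
	for _, c := range r.clusters {
		out = append(out, c)
	}
	return out, nil
}

func (r *memoryRegistry) Deregister(name string) error {
	delete(r.clusters, name)
	return nil
}

func main() {
	var reg ClusterRegistry = &memoryRegistry{}
	reg.Register(Cluster{Name: "us-east", Endpoint: "https://10.1.0.1", Labels: map[string]string{"region": "us-east-1"}})
	list, _ := reg.List()
	fmt.Println(len(list), "cluster(s) registered")
}
```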
@@ -627,14 +628,15 @@ Controllers and related Services accordingly).
 This should ideally be delegated to some external auth system, shared
 by the underlying clusters, to avoid duplication and inconsistency.
 Either that, or we end up with multilevel auth. Local readonly
-eventually consistent auth slaves in each cluster and in Ubernetes
+eventually consistent auth slaves in each cluster and in the Cluster
+Federation control system
 could potentially cache auth, to mitigate an SPOF auth system.
 
 ## Data consistency, failure and availability characteristics
 
-The services comprising the Ubernetes Control Plane have to run
+The services comprising the Cluster Federation control plane have to run
 somewhere. Several options exist here:
-* For high availability Ubernetes deployments, these
+* For high availability Cluster Federation deployments, these
   services may run in either:
   * a dedicated Kubernetes cluster, not co-located in the same
     availability zone with any of the federated clusters (for fault
@@ -672,7 +674,7 @@ does the zookeeper config look like for N=3 across 3 AZs -- and how
 does each replica find the other replicas and how do clients find
 their primary zookeeper replica? And now how do I do a shared, highly
 available redis database? Use a few common specific use cases like
-this to flesh out the detailed API and semantics of Ubernetes.
+this to flesh out the detailed API and semantics of Cluster Federation.
 
 
 <!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
@@ -79,10 +79,11 @@ The design of the pipeline for collecting application level metrics should
 be revisited and it's not clear whether application level metrics should be
 available in the API server, so the use case initially won't be supported.
 
-#### Ubernetes
+#### Cluster Federation
 
-Ubernetes might want to consider cluster-level usage (in addition to cluster-level request)
-of running pods when choosing where to schedule new pods. Although Ubernetes is still in design,
+The Cluster Federation control system might want to consider cluster-level usage (in addition to cluster-level request)
+of running pods when choosing where to schedule new pods. Although
+Cluster Federation is still in design,
 we expect the metrics API described here to be sufficient. Cluster-level usage can be
 obtained by summing over usage of all nodes in the cluster.
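The closing sentence above states that cluster-level usage can be obtained by summing usage over all nodes. A minimal sketch of that aggregation; the resource fields and units are illustrative, not the metrics API types:

```go
package main

import "fmt"

// nodeUsage holds per-node usage in the units a metrics pipeline might report
// (illustrative: millicores of CPU and bytes of memory).
type nodeUsage struct {
	CPUMillicores int64
	MemoryBytes   int64
}

// clusterUsage sums node-level usage into a cluster-level figure, as described
// in the hunk above. The types here are illustrative, not the actual metrics API.
func clusterUsage(nodes []nodeUsage) nodeUsage {
	var total nodeUsage
	for _, n := range nodes {
		total.CPUMillicores += n.CPUMillicores
		total.MemoryBytes += n.MemoryBytes
	}
	return total
}

func main() {
	nodes := []nodeUsage{
		{CPUMillicores: 1500, MemoryBytes: 4 << 30},
		{CPUMillicores: 900, MemoryBytes: 2 << 30},
	}
	fmt.Printf("%+v\n", clusterUsage(nodes)) // {CPUMillicores:2400 MemoryBytes:6442450944}
}
```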