diff --git a/docs/design/control-plane-resilience.md b/docs/design/control-plane-resilience.md
new file mode 100644
index 00000000000..8becccec19b
--- /dev/null
+++ b/docs/design/control-plane-resilience.md
@@ -0,0 +1,269 @@
+# Control Plane Resilience
+
+<table>
+<tr>
+<th>Control Plane Component</th>
+<th>Resilience Plan</th>
+<th>Current Status</th>
+</tr>
+<tr>
+<td>API Server</td>
+<td>
+Multiple stateless, self-hosted, self-healing API servers behind an HA
+load balancer, built out by the default "kube-up" automation on GCE,
+AWS and basic bare metal (BBM). Note that the single-host approach of
+having etcd listen only on localhost, to ensure that only the API
+server can connect to it, will no longer work, so alternative security
+will be needed in this regard (firewall rules, SSL certs, or something
+else). All necessary flags are already supported to enable SSL between
+the API server and etcd (OpenShift runs like this out of the box), but
+this needs to be woven into the "kube-up" and related scripts; a
+sketch of a TLS-secured etcd client appears below this table. Detailed
+design of self-hosting and the related bootstrapping and catastrophic
+failure recovery will be covered in a separate design doc.
+</td>
+<td>
+No scripted self-healing or HA on GCE, AWS or basic bare metal
+currently exists in the OSS distro. To be clear, "no self-healing"
+means that even if multiple API servers are provisioned for HA
+purposes, nothing replaces them when they fail, so eventually the
+system will fail. Self-healing and HA can be set up manually by
+following documented instructions, but this is not currently an
+automated process, and it is not tested as part of continuous
+integration. So it is probably safest to assume that it does not
+actually work in practice.
+</td>
+</tr>
+<tr>
+<td>Controller manager and scheduler</td>
+<td>
+Multiple self-hosted, self-healing, warm-standby, stateless controller
+managers and schedulers with leader election and automatic failover of
+API server clients, automatically installed by the default "kube-up"
+automation (see the leader-election sketch below this table).
+</td>
+<td>As above.</td>
+</tr>
+<tr>
+<td>etcd</td>
+<td>
+Multiple (3-5) etcd quorum members behind a load balancer with session
+affinity (to prevent clients from being bounced from one to another).
+
+Regarding self-healing, if a node running etcd goes down, it is always
+necessary to do three things:
+<ol>
+<li>allocate a replacement node,</li>
+<li>start an etcd replica on that node, and</li>
+<li>have the new replica recover the etcd state.</li>
+</ol>
+In the Persistent Disk (PD) case, the state can be recovered by
+re-attaching the disk holding the etcd data to the replacement node.
+</td>
+<td>
+Somewhat vague instructions exist on how to set some of this up
+manually in a self-hosted configuration. But automatic bootstrapping
+and self-healing are not described (and are not implemented for the
+non-PD cases). This all still needs to be automated and continuously
+tested.
+</td>
+</tr>
+</table>
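+
+The "warm standby with leader election" plan for the controller manager and
+scheduler is the same pattern later exposed by client-go. The sketch below is
+illustrative only, not what "kube-up" installs: it assumes a modern client-go
+with the `leaderelection` package and a Lease lock named
+`example-controller` (a hypothetical name). Standby replicas block until they
+win the lease; only the leader runs the control loops.
+
+```go
+package main
+
+import (
+	"context"
+	"os"
+	"time"
+
+	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+	"k8s.io/client-go/kubernetes"
+	"k8s.io/client-go/rest"
+	"k8s.io/client-go/tools/leaderelection"
+	"k8s.io/client-go/tools/leaderelection/resourcelock"
+)
+
+func main() {
+	cfg, err := rest.InClusterConfig()
+	if err != nil {
+		panic(err)
+	}
+	client := kubernetes.NewForConfigOrDie(cfg)
+	id, _ := os.Hostname()
+
+	// A Lease object in kube-system acts as the election lock.
+	lock := &resourcelock.LeaseLock{
+		LeaseMeta:  metav1.ObjectMeta{Name: "example-controller", Namespace: "kube-system"},
+		Client:     client.CoordinationV1(),
+		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
+	}
+
+	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
+		Lock:          lock,
+		LeaseDuration: 15 * time.Second,
+		RenewDeadline: 10 * time.Second,
+		RetryPeriod:   2 * time.Second,
+		Callbacks: leaderelection.LeaderCallbacks{
+			OnStartedLeading: func(ctx context.Context) {
+				// Only the elected leader runs the control loops;
+				// block here until leadership is lost.
+				<-ctx.Done()
+			},
+			OnStoppedLeading: func() {
+				// Lost the lease; exit so the process restarts and
+				// rejoins the election as a warm standby.
+				os.Exit(1)
+			},
+		},
+	})
+}
+```
+
+Similarly, "SSL between the API server and etcd" amounts to giving the etcd
+client a CA bundle plus a client cert/key pair, and listing all quorum
+members as endpoints. A minimal sketch using the etcd `clientv3` library
+follows; the certificate paths and endpoint names are assumptions for
+illustration.
+
+```go
+package main
+
+import (
+	"crypto/tls"
+	"crypto/x509"
+	"os"
+	"time"
+
+	clientv3 "go.etcd.io/etcd/client/v3"
+)
+
+func main() {
+	// Load the CA bundle and a client certificate so that only holders
+	// of a valid cert (e.g. the API servers) can talk to etcd.
+	caPEM, err := os.ReadFile("/etc/etcd/ca.crt")
+	if err != nil {
+		panic(err)
+	}
+	pool := x509.NewCertPool()
+	pool.AppendCertsFromPEM(caPEM)
+	cert, err := tls.LoadX509KeyPair("/etc/etcd/client.crt", "/etc/etcd/client.key")
+	if err != nil {
+		panic(err)
+	}
+
+	cli, err := clientv3.New(clientv3.Config{
+		// All quorum members are listed; the client balances across them.
+		Endpoints:   []string{"https://etcd-0:2379", "https://etcd-1:2379", "https://etcd-2:2379"},
+		DialTimeout: 5 * time.Second,
+		TLS:         &tls.Config{RootCAs: pool, Certificates: []tls.Certificate{cert}},
+	})
+	if err != nil {
+		panic(err)
+	}
+	defer cli.Close()
+}
+```
+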
+| Name | Description | Required | Schema | Default |
+| ---- | ----------- | -------- | ------ | ------- |
+| Address | The address of the cluster. | yes | address | |
+| Credential | The type (e.g. bearer token, client certificate, etc.) and data of the credential used to access the cluster. It is used for system routines, not on behalf of users. | yes | string | |
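+
+The table above reads like the user-populated spec of a Cluster API object:
+how to reach the cluster and how system routines authenticate to it. A
+minimal Go sketch is below; the type and field names (`ClusterSpec`,
+`Address`, `Credential`) and the use of plain strings are assumptions for
+illustration, not the actual API.
+
+```go
+package cluster
+
+// ClusterSpec is a hypothetical rendering of the spec fields in the
+// table above; names and shapes are illustrative only.
+type ClusterSpec struct {
+	// Address is the address of the cluster (required). The table gives
+	// its schema as "address", so it may well be a structured type
+	// rather than a bare string.
+	Address string `json:"address"`
+
+	// Credential carries the type (e.g. bearer token, client
+	// certificate) and data of the credential used by system routines,
+	// not on behalf of users, to access the cluster (required).
+	Credential string `json:"credential"`
+}
+```
+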
+| Name | Description | Required | Schema | Default |
+| ---- | ----------- | -------- | ------ | ------- |
+| Phase | The recently observed lifecycle phase of the cluster. | yes | enum | |
+| Capacity | Represents the available resources of the cluster. | yes | any | |
+| ClusterMeta | Other cluster metadata, like the version. | yes | ClusterMeta | |
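+
+This second field table reads like the system-populated status of the same
+Cluster object. The Go sketch below is again hedged: the phase values
+(Pending/Running/Terminated) and the modeling of Capacity as a map of
+resource names to quantities are assumptions, not the actual API.
+
+```go
+package cluster
+
+import "k8s.io/apimachinery/pkg/api/resource"
+
+// ClusterPhase is the recently observed lifecycle phase of the cluster.
+// The concrete values are hypothetical.
+type ClusterPhase string
+
+const (
+	ClusterPending    ClusterPhase = "Pending"
+	ClusterRunning    ClusterPhase = "Running"
+	ClusterTerminated ClusterPhase = "Terminated"
+)
+
+// ClusterMeta holds other cluster metadata, such as the version.
+type ClusterMeta struct {
+	Version string `json:"version,omitempty"`
+}
+
+// ClusterStatus is a hypothetical rendering of the status fields in the
+// table above; names and shapes are illustrative only.
+type ClusterStatus struct {
+	// Phase is the recently observed lifecycle phase of the cluster.
+	Phase ClusterPhase `json:"phase"`
+
+	// Capacity represents the available resources of the cluster,
+	// modeled here as a map of resource name to quantity.
+	Capacity map[string]resource.Quantity `json:"capacity"`
+
+	// ClusterMeta carries other cluster metadata, like the version.
+	ClusterMeta ClusterMeta `json:"clusterMeta"`
+}
+```
+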