Merge pull request #39123 from michelleN/docs-proposals-stubs

replace contents of docs/proposals with stubs
This commit is contained in:
Brian Grant 2016-12-21 21:31:55 -08:00 committed by GitHub
commit 41e6357a07
68 changed files with 68 additions and 16836 deletions


@ -1,119 +1 @@
# Supporting multiple API groups

This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/api-group.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/api-group.md)
## Goal
1. Breaking the monolithic v1 API into modular groups and allowing groups to be enabled/disabled individually. This allows us to break the monolithic API server to smaller components in the future.
2. Supporting different versions in different groups. This allows different groups to evolve at different speeds.
3. Supporting identically named kinds to exist in different groups. This is useful when we experiment with new features of an API in the experimental group while supporting the stable API in the original group at the same time.
4. Exposing the API groups and versions supported by the server. This is required to develop a dynamic client.
5. Laying the basis for [API Plugin](../../docs/design/extending-api.md).
6. Keeping the user interaction easy. For example, we should allow users to omit group name when using kubectl if there is no ambiguity.
## Bookkeeping for groups
1. No changes to TypeMeta:
Currently many internal structures, such as RESTMapper and Scheme, are indexed and retrieved by APIVersion. For a fast implementation targeting the v1.1 deadline, we will concatenate group with version, in the form of "group/version", and use it where a version string is expected, so that much of the existing code can be reused. This implies we will not add a new field to TypeMeta; we will use TypeMeta.APIVersion to hold "group/version".
For backward compatibility, v1 objects belong to the group with an empty name, so existing v1 config files will remain valid.
2. /pkg/conversion#Scheme:
The key of /pkg/conversion#Scheme.versionMap for versioned types will be "group/version". For now, the internal version types of all groups will be registered to versionMap[""], as we don't have any identically named kinds in different groups yet. In the near future, internal version types will be registered to versionMap["group/"], and pkg/conversion#Scheme.InternalVersion will have type []string.
We will need a mechanism to express if two kinds in different groups (e.g., compute/pods and experimental/pods) are convertible, and auto-generate the conversions if they are.
3. meta.RESTMapper:
Each group will have its own RESTMapper (of type DefaultRESTMapper), and these mappers will be registered to pkg/api#RESTMapper (of type MultiRESTMapper).
To support identically named kinds in different groups, we need to expand the input of RESTMapper.VersionAndKindForResource from (resource string) to (group, resource string). If group is not specified and there is ambiguity (i.e., the resource exists in multiple groups), an error should be returned to force the user to specify the group.
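A minimal sketch of what such a group-aware lookup could look like follows; the type and method names below are illustrative assumptions, not the actual RESTMapper/MultiRESTMapper code:

```go
// Illustrative only: a toy group-aware resource lookup; the real RESTMapper
// interface and its DefaultRESTMapper/MultiRESTMapper implementations differ.
package main

import "fmt"

// mapping records the group/version/kind registered for a resource name.
type mapping struct {
	Group, Version, Kind string
}

// multiMapper indexes mappings by resource name; a resource may exist in several groups.
type multiMapper map[string][]mapping

// VersionAndKindForResource resolves (group, resource). An empty group is only
// accepted when the resource name is unambiguous across groups.
func (m multiMapper) VersionAndKindForResource(group, resource string) (mapping, error) {
	candidates := m[resource]
	if len(candidates) == 0 {
		return mapping{}, fmt.Errorf("no match for resource %q", resource)
	}
	if group == "" {
		if len(candidates) > 1 {
			return mapping{}, fmt.Errorf("resource %q exists in multiple groups; please specify a group", resource)
		}
		return candidates[0], nil
	}
	for _, c := range candidates {
		if c.Group == group {
			return c, nil
		}
	}
	return mapping{}, fmt.Errorf("resource %q not found in group %q", resource, group)
}

func main() {
	m := multiMapper{"pods": {
		{Group: "", Version: "v1", Kind: "Pod"},
		{Group: "experimental", Version: "v1alpha1", Kind: "Pod"},
	}}
	if _, err := m.VersionAndKindForResource("", "pods"); err != nil {
		fmt.Println(err) // ambiguity forces the user to specify a group
	}
}
```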
## Server-side implementation
1. resource handlers' URL:
We will force the URL to be in the form of prefix/group/version/...
Prefix is used to differentiate API paths from other paths like /healthz. All groups will use the same prefix="apis", except when backward compatibility requires otherwise. No "/" is allowed in prefix, group, or version. Specifically,
* for /api/v1, we set the prefix="api" (which is populated from cmd/kube-apiserver/app#APIServer.APIPrefix), group="", version="v1", so the URL remains /api/v1.
* for new kube API groups, we will set the prefix="apis" (we will add a field in type APIServer to hold this prefix), group=GROUP_NAME, version=VERSION. For example, the URL of the experimental resources will be /apis/experimental/v1alpha1.
* for OpenShift v1 API, because it's currently registered at /oapi/v1, to be backward compatible, OpenShift may set prefix="oapi", group="".
* for other new third-party API, they should also use the prefix="apis" and choose the group and version. This can be done through the thirdparty API plugin mechanism in [13000](http://pr.k8s.io/13000).
2. supporting API discovery:
* At /prefix (e.g., /apis), the API server will return the supported groups and their versions using the pkg/api/unversioned#APIVersions type, setting the Versions field to "group/version". This is backward compatible, because currently the API server does return "v1" encoded in pkg/api/unversioned#APIVersions at /api. (We will also rename the JSON field name from `versions` to `apiVersions`, to be consistent with the pkg/api#TypeMeta.APIVersion field.)
* At /prefix/group, the API server will return all supported versions of the group. We will create a new type VersionList (name is open to discussion) in pkg/api/unversioned as the API.
* At /prefix/group/version, the API server will return all supported resources in this group, and whether each resource is namespaced. We will create a new type APIResourceList (name is open to discussion) in pkg/api/unversioned as the API. A sketch of these discovery types is given at the end of this section.
We will design how to handle deeper paths in other proposals.
* At /swaggerapi/swagger-version/prefix/group/version, API server will return the Swagger spec of that group/version in `swagger-version` (e.g. we may support both Swagger v1.2 and v2.0).
3. handling common API objects:
* top-level common API objects:
To handle the top-level API objects that are used by all groups, we either have to register them to all schemes, or we can choose not to encode them to a version. We plan to take the latter approach and place such types in a new package called `unversioned`: many of the common top-level objects, such as APIVersions, VersionList, and APIResourceList (used in API discovery) and pkg/api#Status, are part of the protocol between client and server rather than part of the domain-specific API, which will evolve independently over time.
Types in the unversioned package will not have the APIVersion field, but may retain the Kind field.
For backward compatibility, when handling the Status, the server will encode it to v1 if the client expects the Status to be encoded in v1, otherwise the server will send the unversioned#Status. If an error occurs before the version can be determined, the server will send the unversioned#Status.
* non-top-level common API objects:
Assuming object o belonging to group X is used as a field in an object belonging to group Y, currently genconversion will generate the conversion functions for o in package Y. Hence, we don't need any special treatment for non-top-level common API objects.
TypeMeta is an exception, because it is a common object that is used by objects in all groups but does not logically belong to any group. We plan to move it to the package `unversioned`.
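To make the discovery API sketched above more concrete, here is a minimal sketch of what the unversioned discovery types might look like; only the type names (APIVersions, VersionList, APIResourceList) come from this proposal, and the exact field shapes are assumptions:

```go
// Sketch of possible types in pkg/api/unversioned; field shapes are illustrative.
// Per this proposal, unversioned types carry no APIVersion field but may keep Kind.
package unversioned

// APIVersions is returned at /prefix (e.g. /apis) and lists "group/version" strings.
type APIVersions struct {
	Kind        string   `json:"kind,omitempty"`
	APIVersions []string `json:"apiVersions"`
}

// VersionList is returned at /prefix/group and lists the supported versions of one group.
type VersionList struct {
	Kind     string   `json:"kind,omitempty"`
	Versions []string `json:"versions"`
}

// APIResource describes one resource exposed by a group/version.
type APIResource struct {
	Name       string `json:"name"`
	Namespaced bool   `json:"namespaced"`
}

// APIResourceList is returned at /prefix/group/version.
type APIResourceList struct {
	Kind      string        `json:"kind,omitempty"`
	Resources []APIResource `json:"resources"`
}
```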
## Client-side implementation
1. clients:
Currently we have structured (pkg/client/unversioned#ExperimentalClient, pkg/client/unversioned#Client) and unstructured (pkg/kubectl/resource#Helper) clients. The structured clients are not scalable because each of them implements a specific interface (e.g., [here](../../pkg/client/unversioned/client.go#L32)). Only the unstructured clients are scalable. We should either auto-generate the code for structured clients or migrate to using the unstructured clients as much as possible.
We should also move the unstructured client to pkg/client/.
2. Spelling the URL:
The URL is in the form of prefix/group/version/. The prefix is hard-coded in the client/unversioned.Config. The client should be able to figure out `group` and `version` using the RESTMapper. A third-party client which does not have access to the RESTMapper should discover the mapping of `group`, `version` and `kind` by querying the server as described in point 2 of [Server-side implementation](#server-side-implementation).
3. kubectl:
kubectl should accept arguments like `group/resource` and `group/resource/name` (a toy sketch of this argument handling is given after this list). The user can omit the `group`, in which case kubectl will rely on RESTMapper.VersionAndKindForResource() to figure out the default group/version of the resource. For example, for resources (like `node`) that exist in both the k8s v1 API and a k8s modularized API (like `infra/v2`), we should set a kubectl default to use one of them. If there is no default group, kubectl should return an error for the ambiguity.
When kubectl is used with a single resource type, the --api-version and --output-version flag of kubectl should accept values in the form of `group/version`, and they should work as they do today. For multi-resource operations, we will disable these two flags initially.
Currently, by setting pkg/client/unversioned/clientcmd/api/v1#Config.NamedCluster[x].Cluster.APIVersion ([here](../../pkg/client/unversioned/clientcmd/api/v1/types.go#L58)), the user can configure the default apiVersion used by kubectl to talk to the server. It does not make sense to set a global version used by kubectl when there are multiple groups, so we plan to deprecate this field. We may extend the version negotiation function to negotiate the preferred version of each group. Details will be in another proposal.
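As referenced above, a toy sketch of how kubectl might split such an argument before consulting the RESTMapper; the function and type names are hypothetical, and the real argument handling must also disambiguate the existing `resource/name` form:

```go
// Illustrative only: splitting "group/resource[/name]" style kubectl arguments.
package main

import (
	"fmt"
	"strings"
)

type target struct {
	Group, Resource, Name string
}

// parseArg accepts "resource", "group/resource", or "group/resource/name".
// When the group is omitted it stays empty and must be resolved via
// RESTMapper.VersionAndKindForResource, which errors out on ambiguity.
func parseArg(arg string) (target, error) {
	switch parts := strings.Split(arg, "/"); len(parts) {
	case 1:
		return target{Resource: parts[0]}, nil
	case 2:
		return target{Group: parts[0], Resource: parts[1]}, nil
	case 3:
		return target{Group: parts[0], Resource: parts[1], Name: parts[2]}, nil
	default:
		return target{}, fmt.Errorf("unrecognized argument %q", arg)
	}
}

func main() {
	t, _ := parseArg("experimental/pods/my-pod")
	fmt.Printf("%+v\n", t) // {Group:experimental Resource:pods Name:my-pod}
}
```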
## OpenShift integration
OpenShift can take a similar approach to break up its monolithic v1 API: keeping the v1 objects where they are, and gradually adding groups.
The v1 objects in OpenShift should keep doing what they do now: they should remain registered to the Scheme.versionMap["v1"] scheme and keep being added to the originMapper.
New OpenShift groups should do the same as native Kubernetes groups: each group should register to Scheme.versionMap["group/version"], and each should have a separate RESTMapper registered with the MultiRESTMapper.
To expose a list of the supported OpenShift groups to clients, OpenShift just has to call pkg/cmd/server/origin#initAPIVersionRoute() as it does now, passing in the supported "group/versions" instead of "versions".
## Future work
1. Dependencies between groups: we need an interface to register the dependencies between groups. It is not our priority now as the use cases are not clear yet.


@ -1,145 +1 @@
## Abstract

This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/apiserver-watch.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/apiserver-watch.md)
In the current system, most watch requests sent to apiserver are redirected to
etcd. This means that for every watch request the apiserver opens a watch on
etcd.
The purpose of the proposal is to improve the overall performance of the system
by solving the following problems:
- having too many open watches on etcd
- avoiding deserializing/converting the same objects multiple times in different
watch results
In the future, we would also like to add an indexing mechanism to the watch.
Although Indexer is not part of this proposal, it is supposed to be compatible
with it - in the future Indexer should be incorporated into the proposed new
watch solution in apiserver without requiring any redesign.
## High level design
We are going to solve those problems by allowing many clients to watch the same
storage in the apiserver, without being redirected to etcd.
At the high level, apiserver will have a single watch open to etcd, watching all
the objects (of a given type) without any filtering. The changes delivered from
etcd will then be stored in a cache in apiserver. This cache is in fact a
"rolling history window" that will support clients having some amount of latency
between their list and watch calls. Thus it will have a limited capacity and
whenever a new change comes from etcd while the cache is full, the oldest change
will be removed to make room for the new one.
When a client sends a watch request to apiserver, instead of redirecting it to
etcd, it will cause:
- registering a handler to receive all new changes coming from etcd
- iterating through the watch window, from the requested resourceVersion
to the head, and sending filtered changes directly to the client, blocking
delivery of new changes until this iteration has caught up
This will be done by creating a go-routine per watcher that will be responsible
for performing the above.
The following section describes the proposal in more detail, analyzes some
corner cases and divides the whole design into more fine-grained steps.
## Proposal details
We would like the cache to be __per-resource-type__ and __optional__. Thanks to
it we will be able to:
- have different cache sizes for different resources (e.g. bigger cache
[= longer history] for pods, which can significantly affect performance)
- avoid any overhead for objects that are watched very rarely (e.g. events
are almost never watched, but there are a lot of them)
- filter the cache for each watcher more effectively
If we decide to support watches spanning different resources in the future and
we have an efficient indexing mechanism, it should be relatively simple to unify
the cache to be common for all the resources.
The rest of this section describes the concrete steps that need to be done
to implement the proposal.
1. Since we want the watch in apiserver to be optional for different resource
types, this needs to be self-contained and hidden behind a well defined API.
This should be a layer very close to etcd - in particular all registries:
"pkg/registry/generic/registry" should be built on top of it.
We will solve this by extracting the interface of tools.EtcdHelper
and treating this interface as that API - the whole watch mechanism in
the apiserver will be hidden behind that interface.
Thanks to this we will get an initial implementation for free and will just
need to reimplement a few relevant functions (probably just Watch and List).
Moreover, this will not require any changes in other parts of the code.
This step is about extracting the interface of tools.EtcdHelper.
2. Create a FIFO cache with a given capacity. In its "rolling history window"
we will store two things:
- the resourceVersion of the object (being an etcdIndex)
- the object watched from etcd itself (in a deserialized form)
This should be as simple as having an array and treating it as a cyclic buffer
(a toy sketch of such a buffer is given after this list of steps).
Obviously the resourceVersions of objects watched from etcd will be increasing, and
they are necessary for registering a new watcher that is interested in all the
changes since a given etcdIndex.
Additionally, we should support the LIST operation, otherwise clients can never
start watching from "now". We may consider passing lists through etcd, however
this will not work once we have Indexer, so we will need that information
in memory anyway.
Thus, we should support LIST operation from the "end of the history" - i.e.
from the moment just after the newest cached watched event. It should be
pretty simple to do, because we can incrementally update this list whenever
a new watch event is received from etcd.
We may consider reusing existing structures cache.Store or cache.Indexer
("pkg/client/cache") but this is not a hard requirement.
3. Create the new implementation of the API that will internally have a
single watch open to etcd and will store the data received from etcd in
the FIFO cache - this includes implementing registration of a new watcher,
which will start a new go-routine responsible for iterating over the cache
and sending all the objects the watcher is interested in (by applying the
filtering function) to the watcher.
4. Add support for processing "error too old" from etcd, which will require:
- disconnect all the watchers
- clear the internal cache and relist all objects from etcd
- start accepting watchers again
5. Enable watch in apiserver for some of the existing resource types - this
should require only changes at the initialization level.
6. The next step will be to incorporate some indexing mechanism, but details
of it are TBD.
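As referenced in step 2, a toy sketch of a cyclic buffer holding (resourceVersion, object) pairs follows; all names are hypothetical and the real apiserver cache is considerably more involved:

```go
// Illustrative cyclic-buffer cache for watch events, per step 2 above.
// All names are hypothetical; the real apiserver code differs.
package main

import "fmt"

type event struct {
	resourceVersion uint64
	object          interface{} // deserialized object from etcd
}

type watchCache struct {
	buf      []event
	capacity int
	start    int // index of the oldest event
	size     int
}

func newWatchCache(capacity int) *watchCache {
	return &watchCache{buf: make([]event, capacity), capacity: capacity}
}

// add appends a new event; when the window is full the oldest event is dropped.
func (c *watchCache) add(e event) {
	if c.size == c.capacity {
		c.buf[c.start] = e
		c.start = (c.start + 1) % c.capacity
		return
	}
	c.buf[(c.start+c.size)%c.capacity] = e
	c.size++
}

// since returns all cached events newer than rv, or an error if rv has already
// fallen out of the window (the caller must then relist, i.e. "error too old").
func (c *watchCache) since(rv uint64) ([]event, error) {
	if c.size > 0 && rv < c.buf[c.start].resourceVersion-1 {
		return nil, fmt.Errorf("resource version %d is too old", rv)
	}
	var out []event
	for i := 0; i < c.size; i++ {
		e := c.buf[(c.start+i)%c.capacity]
		if e.resourceVersion > rv {
			out = append(out, e)
		}
	}
	return out, nil
}

func main() {
	c := newWatchCache(3)
	for rv := uint64(1); rv <= 5; rv++ {
		c.add(event{resourceVersion: rv})
	}
	evs, _ := c.since(3)
	fmt.Println(len(evs)) // 2: events 4 and 5 remain in the window
}
```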
### Future optimizations:
1. The implementation of watch in apiserver internally will open a single
watch to etcd, responsible for watching all the changes of objects of a given
resource type. However, this watch can potentially expire at any time and
reconnecting can return "too old resource version", in which case relisting is
necessary. To avoid LIST requests coming from all watchers at
the same time, we can introduce an additional etcd event type:
[EtcdResync](../../pkg/storage/etcd/etcd_watcher.go#L36)
Whenever relisting is done to refresh the internal watch to etcd, an
EtcdResync event will be sent to all the watchers. It will contain the
full list of all the objects the watcher is interested in (appropriately
filtered) as the parameter of this watch event.
Thus, we need to create the EtcdResync event, extend watch.Interface and
its implementations to support it and handle those events appropriately
in places like
[Reflector](../../pkg/client/cache/reflector.go)
However, this might turn out to be an unnecessary optimization if the apiserver
always keeps up (which is possible in the new design). We will work
out all necessary details at that point.


@ -1,310 +1 @@
<!-- BEGIN MUNGE: GENERATED_TOC -->

This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/apparmor.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/apparmor.md)
- [Overview](#overview)
- [Motivation](#motivation)
- [Related work](#related-work)
- [Alpha Design](#alpha-design)
- [Overview](#overview-1)
- [Prerequisites](#prerequisites)
- [API Changes](#api-changes)
- [Pod Security Policy](#pod-security-policy)
- [Deploying profiles](#deploying-profiles)
- [Testing](#testing)
- [Beta Design](#beta-design)
- [API Changes](#api-changes-1)
- [Future work](#future-work)
- [System component profiles](#system-component-profiles)
- [Deploying profiles](#deploying-profiles-1)
- [Custom app profiles](#custom-app-profiles)
- [Security plugins](#security-plugins)
- [Container Runtime Interface](#container-runtime-interface)
- [Alerting](#alerting)
- [Profile authoring](#profile-authoring)
- [Appendix](#appendix)
<!-- END MUNGE: GENERATED_TOC -->
# Overview
AppArmor is a [mandatory access control](https://en.wikipedia.org/wiki/Mandatory_access_control)
(MAC) system for Linux that supplements the standard Linux user and group based
permissions. AppArmor can be configured for any application to reduce the potential attack surface
and provide greater [defense in depth](https://en.wikipedia.org/wiki/Defense_in_depth_(computing)).
It is configured through profiles tuned to whitelist the access needed by a specific program or
container, such as Linux capabilities, network access, file permissions, etc. Each profile can be
run in either enforcing mode, which blocks access to disallowed resources, or complain mode, which
only reports violations.
AppArmor is similar to SELinux. Both are MAC systems implemented as a Linux security module (LSM),
and are mutually exclusive. SELinux offers a lot of power and very fine-grained controls, but is
generally considered very difficult to understand and maintain. AppArmor sacrifices some of that
flexibility in favor of ease of use. Seccomp-bpf is another Linux kernel security feature for
limiting attack surface, and can (and should!) be used alongside AppArmor.
## Motivation
AppArmor can enable users to run a more secure deployment, and / or provide better auditing and
monitoring of their systems. Although it is not the only solution, we should enable AppArmor for
users that want a simpler alternative to SELinux, or are already maintaining a set of AppArmor
profiles. We have heard from multiple Kubernetes users already that AppArmor support is important to
them. The [seccomp proposal](../../docs/design/seccomp.md#use-cases) details several use cases that
also apply to AppArmor.
## Related work
Much of this design is drawn from the work already done to support seccomp profiles in Kubernetes,
which is outlined in the [seccomp design doc](../../docs/design/seccomp.md). The designs should be
kept close to apply lessons learned, and reduce cognitive and maintenance overhead.
Docker has supported AppArmor profiles since version 1.3, and maintains a default profile which is
applied to all containers on supported systems.
AppArmor was upstreamed into the Linux kernel in version 2.6.36. It is currently maintained by
[Canonical](http://www.canonical.com/), is shipped by default on all Ubuntu and openSUSE systems,
and is supported on several
[other distributions](http://wiki.apparmor.net/index.php/Main_Page#Distributions_and_Ports).
# Alpha Design
This section describes the proposed design for
[alpha-level](../../docs/devel/api_changes.md#alpha-beta-and-stable-versions) support, although
additional features are described in [future work](#future-work). For AppArmor alpha support
(targeted for Kubernetes 1.4) we will enable:
- Specifying a pre-loaded profile to apply to a pod container
- Restricting pod containers to a set of profiles (admin use case)
We will also provide a reference implementation of a pod for loading profiles on nodes, but an
officially supported mechanism for deploying profiles is out of scope for alpha.
## Overview
An AppArmor profile can be specified for a container through the Kubernetes API with a pod
annotation. If a profile is specified, the Kubelet will verify that the node meets the required
[prerequisites](#prerequisites) (e.g. the profile is already configured on the node) before starting
the container, and will not run the container if the profile cannot be applied. If the requirements
are met, the container runtime will configure the appropriate options to apply the profile. Profile
requirements and defaults can be specified on the
[PodSecurityPolicy](security-context-constraints.md).
## Prerequisites
When an AppArmor profile is specified, the Kubelet will verify the prerequisites for applying the
profile to the container. In order to [fail
securely](https://www.owasp.org/index.php/Fail_securely), a container **will not be run** if any of
the prerequisites are not met. The prerequisites are:
1. **Kernel support** - The AppArmor kernel module is loaded. Can be checked by
[libcontainer](https://github.com/opencontainers/runc/blob/4dedd0939638fc27a609de1cb37e0666b3cf2079/libcontainer/apparmor/apparmor.go#L17).
2. **Runtime support** - For the initial implementation, Docker will be required (rkt does not
currently have AppArmor support). All supported Docker versions include AppArmor support. See
[Container Runtime Interface](#container-runtime-interface) for other runtimes.
3. **Installed profile** - The target profile must be loaded prior to starting the container. Loaded
profiles can be found in the AppArmor securityfs \[1\].
If any of the prerequisites are not met an event will be generated to report the error and the pod
will be
[rejected](https://github.com/kubernetes/kubernetes/blob/cdfe7b7b42373317ecd83eb195a683e35db0d569/pkg/kubelet/kubelet.go#L2201)
by the Kubelet.
*[1] The securityfs can be found in `/proc/mounts`, and defaults to `/sys/kernel/security` on my
Ubuntu system. The profiles can be found at `{securityfs}/apparmor/profiles`
([example](http://bazaar.launchpad.net/~apparmor-dev/apparmor/master/view/head:/utils/aa-status#L137)).*
## API Changes
The initial alpha support of AppArmor will follow the pattern
[used by seccomp](https://github.com/kubernetes/kubernetes/pull/25324) and specify profiles through
annotations. Profiles can be specified per-container through pod annotations. The annotation format
is a key matching the container, and a profile name value:
```
container.apparmor.security.alpha.kubernetes.io/<container_name>=<profile_name>
```
The profiles can be specified in the following formats (following the convention used by [seccomp](../../docs/design/seccomp.md#api-changes)):
1. `runtime/default` - Applies the default profile for the runtime. For docker, the profile is
generated from a template
[here](https://github.com/docker/docker/blob/master/profiles/apparmor/template.go). If no
AppArmor annotations are provided, this profile is enabled by default if AppArmor is enabled in
the kernel. Runtimes may define this to be unconfined, as Docker does for privileged pods.
2. `localhost/<profile_name>` - The profile name specifies the profile to load.
*Note: There is no way to explicitly specify an "unconfined" profile, since it is discouraged. If
this is truly needed, the user can load an "allow-all" profile.*
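For illustration, a sketch of how per-container profiles could be extracted from the annotations above; the key prefix comes from this proposal, but the helper itself is a hypothetical example rather than the actual Kubelet code:

```go
// Illustrative only: extracting per-container AppArmor profiles from pod
// annotations using the alpha key format above; helper names are hypothetical.
package main

import (
	"fmt"
	"strings"
)

const containerAnnotationPrefix = "container.apparmor.security.alpha.kubernetes.io/"

// profilesFromAnnotations returns a map of container name -> profile name.
func profilesFromAnnotations(annotations map[string]string) map[string]string {
	profiles := map[string]string{}
	for key, value := range annotations {
		if strings.HasPrefix(key, containerAnnotationPrefix) {
			container := strings.TrimPrefix(key, containerAnnotationPrefix)
			profiles[container] = value
		}
	}
	return profiles
}

func main() {
	ann := map[string]string{
		"container.apparmor.security.alpha.kubernetes.io/nginx": "localhost/docker-nginx",
	}
	fmt.Println(profilesFromAnnotations(ann)["nginx"]) // localhost/docker-nginx
}
```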
### Pod Security Policy
The [PodSecurityPolicy](security-context-constraints.md) allows cluster administrators to control
the security context for a pod and its containers. An annotation can be specified on the
PodSecurityPolicy to restrict which AppArmor profiles can be used, and specify a default if no
profile is specified.
The annotation key is `apparmor.security.alpha.kubernetes.io/allowedProfileNames`. The value is a
comma-delimited list, with each item following the format described [above](#api-changes). If a list
of profiles is provided and a pod does not have an AppArmor annotation, the first profile in the
list will be used by default.
Enforcement of the policy is standard. See the
[seccomp implementation](https://github.com/kubernetes/kubernetes/pull/28300) as an example.
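A hedged sketch of the enforcement behavior described above; the function name and signature are assumptions, and the linked seccomp implementation shows the real pattern:

```go
// Sketch of the policy resolution described above; names are illustrative.
package main

import "fmt"

// resolveProfile validates the requested profile against the allowed list and
// substitutes the first allowed profile when no profile was requested.
func resolveProfile(requested string, allowed []string) (string, error) {
	if requested == "" {
		if len(allowed) > 0 {
			return allowed[0], nil // first entry acts as the default
		}
		return "", nil // nothing requested and no policy restriction
	}
	if len(allowed) == 0 {
		return requested, nil // policy does not restrict profiles
	}
	for _, a := range allowed {
		if a == requested {
			return requested, nil
		}
	}
	return "", fmt.Errorf("AppArmor profile %q is not allowed by the PodSecurityPolicy", requested)
}

func main() {
	p, err := resolveProfile("", []string{"runtime/default", "localhost/nginx"})
	fmt.Println(p, err) // runtime/default <nil>
}
```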
## Deploying profiles
We will provide a reference implementation of a DaemonSet pod for loading profiles on nodes, but
there will not be an official mechanism or API in the initial version (see
[future work](#deploying-profiles-1)). The reference container will contain the `apparmor_parser`
tool and a script for using the tool to load all profiles in a set of (configurable)
directories. The initial implementation will poll (with a configurable interval) the directories for
additions, but will not update or unload existing profiles. The pod can be run in a DaemonSet to
load the profiles onto all nodes. The pod will need to be run in privileged mode.
This simple design should be sufficient to deploy AppArmor profiles from any volume source, such as
a ConfigMap or PersistentDisk. Users seeking more advanced features should be able to extend this
design easily.
## Testing
Our e2e testing framework does not currently run nodes with AppArmor enabled, but we can run a node
e2e test suite on an AppArmor enabled node. The cases we should test are:
- *PodSecurityPolicy* - These tests can be run on a cluster even if AppArmor is not enabled on the
nodes.
- No AppArmor policy allows pods with arbitrary profiles
- With a policy a default is selected
- With a policy arbitrary profiles are prevented
- With a policy allowed profiles are allowed
- *Node AppArmor enforcement* - These tests need to run on AppArmor enabled nodes, in the node e2e
suite.
- A valid container profile gets applied
- An unloaded profile will be rejected
# Beta Design
The only part of the design that changes for beta is the API, which is upgraded from
annotation-based to first class fields.
## API Changes
AppArmor profiles will be specified in the container's SecurityContext, as part of an
`AppArmorOptions` struct. The options struct makes the API more flexible to future additions.
```go
type SecurityContext struct {
...
// The AppArmor options to be applied to the container.
AppArmorOptions *AppArmorOptions `json:"appArmorOptions,omitempty"`
...
}
// Reference to an AppArmor profile loaded on the host.
type AppArmorProfileName string
// Options specifying how to run Containers with AppArmor.
type AppArmorOptions struct {
// The profile the Container must be run with.
Profile AppArmorProfileName `json:"profile"`
}
```
The `AppArmorProfileName` format matches the format for the profile annotation values described
[above](#api-changes).
The `PodSecurityPolicySpec` receives a similar treatment with the addition of an
`AppArmorStrategyOptions` struct. Here the `DefaultProfile` is separated from the `AllowedProfiles`
in the interest of making the behavior more explicit.
```go
type PodSecurityPolicySpec struct {
...
AppArmorStrategyOptions *AppArmorStrategyOptions `json:"appArmorStrategyOptions,omitempty"`
...
}
// AppArmorStrategyOptions specifies AppArmor restrictions and requirements for pods and containers.
type AppArmorStrategyOptions struct {
// If non-empty, all pod containers must be run with one of the profiles in this list.
AllowedProfiles []AppArmorProfileName `json:"allowedProfiles,omitempty"`
// The default profile to use if a profile is not specified for a container.
// Defaults to "runtime/default". Must be allowed by AllowedProfiles.
DefaultProfile AppArmorProfileName `json:"defaultProfile,omitempty"`
}
```
# Future work
Post-1.4 feature ideas. These are not fully-fleshed designs.
## System component profiles
We should publish (to GitHub) AppArmor profiles for all Kubernetes system components, including core
components like the API server and controller manager, as well as addons like influxDB and
Grafana. `kube-up.sh` and its successor should have an option to apply the profiles, if AppArmor
is supported by the nodes. Distros that support AppArmor and provide a Kubernetes package should
include the profiles out of the box.
## Deploying profiles
We could provide an officially supported solution for loading profiles on the nodes. One option is to
extend the reference implementation described [above](#deploying-profiles) into a DaemonSet that
watches the directory sources to sync changes, or to watch a ConfigMap object directly. Another
option is to add an official API for this purpose, and load the profiles on-demand in the Kubelet.
## Custom app profiles
[Profile stacking](http://wiki.apparmor.net/index.php/AppArmorStacking) is an AppArmor feature
currently in development that will enable multiple profiles to be applied to the same object. If
profiles are stacked, the allowed set of operations is the "intersection" of both profiles
(i.e. stacked profiles are never more permissive). Taking advantage of this feature, the cluster
administrator could restrict the allowed profiles on a PodSecurityPolicy to a few broad profiles,
and then individual apps could apply more app specific profiles on top.
## Security plugins
AppArmor, SELinux, TOMOYO, grsecurity, SMACK, etc. are all Linux MAC implementations with similar
requirements and features. At the very least, the AppArmor implementation should be factored in a
way that makes it easy to add alternative systems. A more advanced approach would be to extract a
set of interfaces for plugins implementing the alternatives. An even higher level approach would be
to define a common API or profile interface for all of them. Work towards this last option is
already underway for Docker, called
[Docker Security Profiles](https://github.com/docker/docker/issues/17142#issuecomment-148974642).
## Container Runtime Interface
Other container runtimes will likely add AppArmor support eventually, so the
[Container Runtime Interface](container-runtime-interface-v1.md) (CRI) needs to be made compatible
with this design. The two important pieces are a way to report whether AppArmor is supported by the
runtime, and a way to specify the profile to load (likely through the `LinuxContainerConfig`).
## Alerting
Whether AppArmor is running in enforcing or complain mode, it generates logs of policy
violations. These logs can be important cues for intrusion detection, or at the very least can point to a bug in
the profile. Violations should almost always generate alerts in production systems. We should
provide reference documentation for setting up alerts.
## Profile authoring
A common method for writing AppArmor profiles is to start with a restrictive profile in complain
mode, and then use the `aa-logprof` tool to build a profile from the logs. We should provide
documentation for following this process in a Kubernetes environment.
# Appendix
- [What is AppArmor](https://askubuntu.com/questions/236381/what-is-apparmor)
- [Debugging AppArmor on Docker](https://github.com/docker/docker/blob/master/docs/security/apparmor.md#debug-apparmor)
- Load an AppArmor profile with `apparmor_parser` (required by Docker so it should be available):
```
$ apparmor_parser --replace --write-cache /path/to/profile
```
- Unload with:
```
$ apparmor_parser --remove /path/to/profile
```


@ -1,316 +1 @@
<!-- BEGIN MUNGE: GENERATED_TOC -->

This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/client-package-structure.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/client-package-structure.md)
- [Client: layering and package structure](#client-layering-and-package-structure)
- [Desired layers](#desired-layers)
- [Transport](#transport)
- [RESTClient/request.go](#restclientrequestgo)
- [Mux layer](#mux-layer)
- [High-level: Individual typed](#high-level-individual-typed)
- [High-level, typed: Discovery](#high-level-typed-discovery)
- [High-level: Dynamic](#high-level-dynamic)
- [High-level: Client Sets](#high-level-client-sets)
- [Package Structure](#package-structure)
- [Client Guarantees (and testing)](#client-guarantees-and-testing)
<!-- END MUNGE: GENERATED_TOC -->
# Client: layering and package structure
## Desired layers
### Transport
The transport layer is concerned with round-tripping requests to an apiserver
somewhere. It consumes a Config object with options appropriate for this.
(That's most of the current client.Config structure.)
Transport delivers an object that implements http's RoundTripper interface
and/or can be used in place of http.DefaultTransport to route requests.
Transport objects are safe for concurrent use, and are cached and reused by
subsequent layers.
Tentative name: "Transport".
It's expected that the transport config will be general enough that third
parties (e.g., OpenShift) will not need their own implementation; rather, they
can change the certs, token, etc., to be appropriate for their own servers.
Action items:
* Split out of current client package into a new package. (@krousey)
### RESTClient/request.go
RESTClient consumes a Transport and a Codec (and optionally a group/version),
and produces something that implements the interface currently in request.go.
That is, with a RESTClient, you can write chains of calls like:
`c.Get().Path(p).Param("name", "value").Do()`
RESTClient is generically usable by any client for servers exposing REST-like
semantics. It provides helpers that benefit those following api-conventions.md,
but does not mandate them. It provides a higher level http interface that
abstracts transport, wire serialization, retry logic, and error handling.
Kubernetes-like constructs that deviate from standard HTTP should be bypassable.
Every non-trivial call made to a remote restful API from Kubernetes code should
go through a rest client.
The group and version may be empty when constructing a RESTClient. This is valid
for executing discovery commands. The group and version may be overridable with
a chained function call.
Ideally, no semantic behavior is built into RESTClient, and RESTClient will use
the Codec it was constructed with for all semantic operations, including turning
options objects into URL query parameters. Unfortunately, that is not true of
today's RESTClient, which may have some semantic information built in. We will
remove this.
RESTClient should not make assumptions about the format of data produced or
consumed by the Codec. Currently, it is JSON, but we want to support binary
protocols in the future.
The Codec would look something like this:
```go
type Codec interface {
Encode(runtime.Object) ([]byte, error)
Decode([]byte) (runtime.Object, error)
// Used to version-control query parameters
EncodeParameters(optionsObject runtime.Object) (url.Values, error)
// Not included here since the client doesn't need it, but a corresponding
// DecodeParametersInto method would be available on the server.
}
```
There should be one codec per version. RESTClient is *not* responsible for
converting between versions; if a client wishes, they can supply a Codec that
does that. But RESTClient will make the assumption that it's talking to a single
group/version, and will not contain any conversion logic. (This is a slight
change from the current state.)
As with Transport, it is expected that 3rd party providers following the api
conventions should be able to use RESTClient, and will not need to implement
their own.
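As a toy illustration of the chained call style above (not the real RESTClient, which also handles transport, codecs, retries, and error handling; all names here are illustrative):

```go
// Toy request builder illustrating the chained style above.
package main

import (
	"fmt"
	"net/url"
)

type Request struct {
	verb   string
	path   string
	params url.Values
}

type Client struct{}

func (c *Client) Get() *Request { return &Request{verb: "GET", params: url.Values{}} }

func (r *Request) Path(p string) *Request { r.path = p; return r }

func (r *Request) Param(name, value string) *Request { r.params.Set(name, value); return r }

// Do would hand the assembled request to the Transport layer for round-tripping
// and decode the response via the Codec; here it only renders the request line.
func (r *Request) Do() string {
	u := url.URL{Path: r.path, RawQuery: r.params.Encode()}
	return r.verb + " " + u.String()
}

func main() {
	c := &Client{}
	fmt.Println(c.Get().Path("/api/v1/pods").Param("labelSelector", "app=web").Do())
	// Output: GET /api/v1/pods?labelSelector=app%3Dweb
}
```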
Action items:
* Split out of the current client package. (@krousey)
* Possibly, convert to an interface (currently, it's a struct). This will allow
extending the error-checking monad that's currently in request.go up an
additional layer.
* Switch from ParamX("x") functions to using types representing the collection
of parameters and the Codec for query parameter serialization.
* Any other Kubernetes group specific behavior should also be removed from
RESTClient.
### Mux layer
(See TODO at end; this can probably be merged with the "client set" concept.)
The client muxer layer has a map of group/version to cached RESTClient, and
knows how to construct a new RESTClient in case of a cache miss (using the
discovery client mentioned below). The ClientMux may need to deal with multiple
transports pointing at differing destinations (e.g. OpenShift or other 3rd party
provider API may be at a different location).
When constructing a RESTClient generically, the muxer will just use the Codec
the high-level dynamic client would use. Alternatively, the user should be able
to pass in a Codec, for the case where the correct types are compiled in.
Tentative name: ClientMux
Action items:
* Move client cache out of kubectl libraries into a more general home.
* TODO: a mux layer may not be necessary, depending on what needs to be cached.
If transports are cached already, and RESTClients are extremely light-weight,
there may not need to be much code at all in this layer.
### High-level: Individual typed
Our current high-level client allows you to write things like
`c.Pods("namespace").Create(p)`; we will insert a level for the group.
That is, the system will be:
`clientset.GroupName().NamespaceSpecifier().Action()`
Where:
* `clientset` is a thing that holds multiple individually typed clients (see
below).
* `GroupName()` returns the generated client that this section is about.
* `NamespaceSpecifier()` may take a namespace parameter or nothing.
* `Action` is one of Create/Get/Update/Delete/Watch, or appropriate actions
from the type's subresources.
* It is TBD how we'll represent subresources and their actions. This is
inconsistent in the current clients, so we'll need to define a consistent
format. Possible choices:
* Insert a `.Subresource()` before the `.Action()`
* Flatten subresources, such that they become special Actions on the parent
resource.
The types returned/consumed by such functions will be e.g. api/v1, NOT the
current version-unspecific internal types. The current internal-versioned client is
inconvenient for users, as it does not protect them from having to recompile
their code with every minor update. (We may continue to generate an
internal-versioned client for our own use for a while, but even for our own
components it probably makes sense to switch to specifically versioned clients.)
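For illustration, the generated surface might read roughly like the toy interfaces below; all names are hypothetical, and the fake implementation exists only to make the example runnable:

```go
// Illustrative shape of a generated typed client; the real generated code is far more complete.
package main

import "fmt"

type Pod struct{ Name string }

// PodInterface carries the per-resource actions (Create/Get/Update/Delete/Watch, etc.).
type PodInterface interface {
	Create(p *Pod) (*Pod, error)
	Get(name string) (*Pod, error)
}

// CoreV1Interface is the per-group/version client returned by the client set.
type CoreV1Interface interface {
	Pods(namespace string) PodInterface
}

// fakePods is a toy in-memory implementation, just for this example.
type fakePods struct{ store map[string]*Pod }

func (f *fakePods) Create(p *Pod) (*Pod, error) { f.store[p.Name] = p; return p, nil }
func (f *fakePods) Get(name string) (*Pod, error) {
	if p, ok := f.store[name]; ok {
		return p, nil
	}
	return nil, fmt.Errorf("pod %q not found", name)
}

type fakeCoreV1 struct{ pods *fakePods }

func (f *fakeCoreV1) Pods(namespace string) PodInterface { return f.pods }

func main() {
	var core CoreV1Interface = &fakeCoreV1{pods: &fakePods{store: map[string]*Pod{}}}
	// clientset.GroupName().NamespaceSpecifier().Action() reads as:
	core.Pods("default").Create(&Pod{Name: "web"})
	p, _ := core.Pods("default").Get("web")
	fmt.Println(p.Name) // web
}
```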
We will provide this structure for each version of each group. It is infeasible
to do this manually, so we will generate this. The generator will accept both
swagger and the ordinary go types. The generator should operate on out-of-tree
sources AND out-of-tree destinations, so it will be useful for consuming
out-of-tree APIs and for others to build custom clients into their own
repositories.
Typed clients will be constructable given a ClientMux; the typed constructor will use
the ClientMux to find or construct an appropriate RESTClient. Alternatively, a
typed client should be constructable individually given a config, from which it
will be able to construct the appropriate RESTClient.
Typed clients do not require any version negotiation. The server either supports
the client's group/version, or it does not. However, there are ways around this:
* If you want to use a typed client against a server's API endpoint and the
server's API version doesn't match the client's API version, you can construct
the client with a RESTClient using a Codec that does the conversion (this is
basically what our client does now).
* Alternatively, you could use the dynamic client.
Action items:
* Move current typed clients into new directory structure (described below)
* Finish client generation logic. (@caesarxuchao, @lavalamp)
#### High-level, typed: Discovery
A `DiscoveryClient` is necessary to discover the api groups, versions, and
resources a server supports. It's constructable given a RESTClient. It is
consumed by both the ClientMux and users who want to iterate over groups,
versions, or resources. (Example: namespace controller.)
The DiscoveryClient is *not* required if you already know the group/version of
the resource you want to use: you can simply try the operation without checking
first, which is lower-latency anyway as it avoids an extra round-trip.
Action items:
* Refactor existing functions to present a sane interface, as close to that
offered by the other typed clients as possible. (@caesarxuchao)
* Use a RESTClient to make the necessary API calls.
* Make sure that no discovery happens unless it is explicitly requested. (Make
sure SetKubeDefaults doesn't call it, for example.)
### High-level: Dynamic
The dynamic client lets users consume apis which are not compiled into their
binary. It will provide the same interface as the typed client, but will take
and return `runtime.Object`s instead of typed objects. There is only one dynamic
client, so it's not necessary to generate it, although optionally we may do so
depending on whether the typed client generator makes it easy.
A dynamic client is constructable given a config, group, and version. It will
use this to construct a RESTClient with a Codec which encodes/decodes to
'Unstructured' `runtime.Object`s. The group and version may be from a previous
invocation of a DiscoveryClient, or they may be known by other means.
For now, the dynamic client will assume that a JSON encoding is allowed. In the
future, if we have binary-only APIs (unlikely?), we can add that to the
discovery information and construct an appropriate dynamic Codec.
Action items:
* A rudimentary version of this exists in kubectl's builder. It needs to be
moved to a more general place.
* Produce a useful 'Unstructured' runtime.Object, which allows for easy
Object/ListMeta introspection.
### High-level: Client Sets
Because there will be multiple groups with multiple versions, we will provide an
aggregation layer that combines multiple typed clients in a single object.
We do this to:
* Deliver a concrete thing for users to consume, construct, and pass around. We
don't want people making 10 typed clients and making a random system to keep
track of them.
* Constrain the testing matrix. Users can generate a client set at their whim
against their cluster, but we need to make guarantees that the clients we
shipped with v1.X.0 will work with v1.X+1.0, and vice versa. That's not
practical unless we "bless" a particular version of each API group and ship an
official client set with each release. (If the server supports 15 groups with
2 versions each, that's 2^15 different possible client sets. We don't want to
test all of them.)
A client set is generated into its own package. The generator will take the list
of group/versions to be included. Only one version from each group will be in
the client set.
A client set is constructable at runtime from either a ClientMux or a transport
config (for easy one-stop-shopping).
An example:
```go
import (
api_v1 "k8s.io/kubernetes/pkg/client/typed/generated/v1"
ext_v1beta1 "k8s.io/kubernetes/pkg/client/typed/generated/extensions/v1beta1"
net_v1beta1 "k8s.io/kubernetes/pkg/client/typed/generated/net/v1beta1"
"k8s.io/kubernetes/pkg/client/typed/dynamic"
)
type Client interface {
API() api_v1.Client
Extensions() ext_v1beta1.Client
Net() net_v1beta1.Client
// ... other typed clients here.
// Included in every set
Discovery() discovery.Client
GroupVersion(group, version string) dynamic.Client
}
```
Note that a particular version is chosen for each group. It is a general rule
for our API structure that no client need care about more than one version of
each group at a time.
This is the primary deliverable that people would consume. It is also generated.
Action items:
* This needs to be built. It will replace the ClientInterface that everyone
passes around right now.
## Package Structure
```
pkg/client/
----------/transport/ # transport & associated config
----------/restclient/
----------/clientmux/
----------/typed/
----------------/discovery/
----------------/generated/
--------------------------/<group>/
----------------------------------/<version>/
--------------------------------------------/<resource>.go
----------------/dynamic/
----------/clientsets/
---------------------/release-1.1/
---------------------/release-1.2/
---------------------/the-test-set-you-just-generated/
```
`/clientsets/` will retain their contents until they reach their expiration date.
e.g., when we release v1.N, we'll remove clientset v1.(N-3). Clients from old
releases live on and continue to work (i.e., are tested) without any interface
changes for multiple releases, to give users time to transition.
## Client Guarantees (and testing)
Once we release a clientset, we will not make interface changes to it. Users of
that client will not have to change their code until they are deliberately
upgrading their import. We probably will want to generate some sort of stub test
with a clientset, to ensure that we don't change the interface.


@ -1,171 +1 @@
# Objective

This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/cluster-deployment.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/cluster-deployment.md)
Simplify the cluster provisioning process for a cluster with one master and multiple worker nodes.
It should be secured with SSL and have all the default add-ons. There should not be significant
differences in the provisioning process across deployment targets (cloud provider + OS distribution)
once machines meet the node specification.
# Overview
Cluster provisioning can be broken into a number of phases, each with their own exit criteria.
In some cases, multiple phases will be combined together to more seamlessly automate the cluster setup,
but in all cases the phases can be run sequentially to provision a functional cluster.
It is possible that for some platforms we will provide an optimized flow that combines some of the steps
together, but that is out of scope of this document.
# Deployment flow
**Note**: _Exit criteria_ in the following sections are not intended to list all tests that should pass,
but rather those that must pass.
## Step 1: Provision cluster
**Objective**: Create a set of machines (master + nodes) where we will deploy Kubernetes.
For this phase to be completed successfully, the following requirements must be completed for all nodes:
- Basic connectivity between nodes (i.e. nodes can all ping each other)
- Docker installed (and in production setups should be monitored to be always running)
- One of the supported OSes
We will provide a node specification conformance test that will verify if provisioning has been successful.
This step is provider specific and will be implemented for each cloud provider + OS distribution separately
using provider specific technology (cloud formation, deployment manager, PXE boot, etc).
Some OS distributions may meet the provisioning criteria without needing to run any post-boot steps as they
ship with all of the requirements for the node specification by default.
**Substeps** (on the GCE example):
1. Create network
2. Create firewall rules to allow communication inside the cluster
3. Create firewall rule to allow ```ssh``` to all machines
4. Create firewall rule to allow ```https``` to master
5. Create persistent disk for master
6. Create static IP address for master
7. Create master machine
8. Create node machines
9. Install docker on all machines
**Exit criteria**:
1. Can ```ssh``` to all machines and run a test docker image
2. Can ```ssh``` to master and nodes and ping other machines
## Step 2: Generate certificates
**Objective**: Generate security certificates used to configure secure communication between client, master and nodes
TODO: Enumerate certificates which have to be generated.
## Step 3: Deploy master
**Objective**: Run kubelet and all the required components (e.g. etcd, apiserver, scheduler, controllers) on the master machine.
**Substeps**:
1. copy certificates
2. copy manifests for static pods:
1. etcd
2. apiserver, controller manager, scheduler
3. run kubelet in docker container (configuration is read from apiserver Config object)
4. run kubelet-checker in docker container
**v1.2 simplifications**:
1. kubelet-runner.sh - we will provide a custom docker image to run kubelet; it will contain
the kubelet binary and will run it using ```nsenter``` to work around a problem with mount propagation
1. kubelet config file - we will read kubelet configuration file from disk instead of apiserver; it will
be generated locally and copied to all nodes.
**Exit criteria**:
1. Can run basic API calls (e.g. create, list and delete pods) from the client side (e.g. replication
controller works - user can create RC object and RC manager can create pods based on that)
2. Critical master components work:
1. scheduler
2. controller manager
## Step 4: Deploy nodes
**Objective**: Start kubelet on all nodes and configure kubernetes network.
Each node can be deployed separately and the implementation should make it ~impossible to change this assumption.
### Step 4.1: Run kubelet
**Substeps**:
1. copy certificates
2. run kubelet in docker container (configuration is read from apiserver Config object)
3. run kubelet-checker in docker container
**v1.2 simplifications**:
1. kubelet config file - we will read kubelet configuration file from disk instead of apiserver; it will
be generated locally and copied to all nodes.
**Exit criteria**:
1. All nodes are registered, but not ready due to lack of kubernetes networking.
### Step 4.2: Setup kubernetes networking
**Objective**: Configure the Kubernetes networking to allow routing requests to pods and services.
To keep the default setup consistent across open source deployments we will use Flannel to configure
kubernetes networking. However, the implementation of this step will make it easy to plug in different
network solutions.
**Substeps**:
1. copy manifest for flannel server to master machine
2. create a daemonset with flannel daemon (it will read assigned CIDR and configure network appropriately).
**v1.2 simplifications**:
1. flannel daemon will run as a standalone binary (not in docker container)
2. flannel server will assign CIDRs to nodes outside of kubernetes; this will require restarting kubelet
after reconfiguring the network bridge on the local machine; this will also require running the master and nodes differently
(```--configure-cbr0=false``` on node and ```--allocate-node-cidrs=false``` on master), which breaks encapsulation
between nodes
**Exit criteria**:
1. Pods correctly created, scheduled, run and accessible from all nodes.
## Step 5: Add daemons
**Objective:** Start all system daemons (e.g. kube-proxy)
**Substeps**:
1. Create daemonset for kube-proxy
**Exit criteria**:
1. Services work correctly on all nodes.
## Step 6: Add add-ons
**Objective**: Add default add-ons (e.g. dns, dashboard)
**Substeps**:
1. Create Deployments (and daemonsets if needed) for all add-ons
## Deployment technology
We will use Ansible as the default technology for deployment orchestration. It has low requirements on the cluster machines
and seems to be popular in the kubernetes community, which will help us maintain it.
For simpler UX we will provide simple bash scripts that will wrap all basic commands for deployment (e.g. ```up``` or ```down```).
One disadvantage of using Ansible is that it adds a dependency on the machine which runs the deployment scripts. We will work around
this by distributing the deployment scripts via a docker image, so that the user will run the following command to create a cluster:
```docker run gcr.io/google_containers/deploy_kubernetes:v1.2 up --num-nodes=3 --provider=aws```


@ -1,444 +1 @@
# Pod initialization

This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/container-init.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/container-init.md)
@smarterclayton
March 2016
## Proposal and Motivation
Within a pod there is a need to initialize local data or adapt to the current
cluster environment that is not easily achieved in the current container model.
Containers start in parallel after volumes are mounted, leaving no opportunity
for coordination between containers without specialization of the image. If
two containers need to share common initialization data, both images must
be altered to cooperate using filesystem or network semantics, which introduces
coupling between images. Likewise, if an image requires configuration in order
to start and that configuration is environment dependent, the image must be
altered to add the necessary templating or retrieval.
This proposal introduces the concept of an **init container**, one or more
containers started in sequence before the pod's normal containers are started.
These init containers may share volumes, perform network operations, and perform
computation prior to the start of the remaining containers. They may also, by
virtue of their sequencing, block or delay the startup of application containers
until some precondition is met. In this document we refer to the existing pod
containers as **app containers**.
This proposal also provides a high level design of **volume containers**, which
initialize a particular volume, as a feature that specializes some of the tasks
defined for init containers. The init container design anticipates the existence
of volume containers and highlights where they will require future work.
## Design Points
* Init containers should be able to:
* Perform initialization of shared volumes
* Download binaries that will be used in app containers as execution targets
* Inject configuration or extension capability to generic images at startup
* Perform complex templating of information available in the local environment
* Initialize a database by starting a temporary execution process and applying
schema info.
* Delay the startup of application containers until preconditions are met
* Register the pod with other components of the system
* Reduce coupling:
* Between application images, eliminating the need to customize those images for
Kubernetes generally or specific roles
* Inside of images, by specializing which containers perform which tasks
(install git into init container, use filesystem contents
in web container)
* Between initialization steps, by supporting multiple sequential init containers
* Init containers allow simple start preconditions to be implemented that are
decoupled from application code
* The order init containers start should be predictable and allow users to easily
reason about the startup of a container
* Complex ordering and failure will not be supported - all complex workflows can
if necessary be implemented inside of a single init container, and this proposal
aims to enable that ordering without adding undue complexity to the system.
Pods in general are not intended to support DAG workflows.
* Both run-once and run-forever pods should be able to use init containers
* As much as possible, an init container should behave like an app container
to reduce complexity for end users, for clients, and for divergent use cases.
An init container is a container with the minimum alterations to accomplish
its goal.
* Volume containers should be able to:
* Perform initialization of a single volume
* Start in parallel
* Perform computation to initialize a volume, and delay start until that
volume is initialized successfully.
* Using a volume container that does not populate a volume to delay pod start
(in the absence of init containers) would be an abuse of the goal of volume
containers.
* Container pre-start hooks are not sufficient for all initialization cases:
* They cannot easily coordinate complex conditions across containers
* They can only function with code in the image or code in a shared volume,
which would have to be statically linked (not a common pattern in wide use)
* They cannot be implemented with the current Docker implementation - see
[#140](https://github.com/kubernetes/kubernetes/issues/140)
## Alternatives
* Any mechanism that runs user code on a node before regular pod containers
should itself be a container and modeled as such - we explicitly reject
creating new mechanisms for running user processes.
* The container pre-start hook (not yet implemented) requires execution within
the container's image and so cannot adapt existing images. It also cannot
block startup of containers
* Running a "pre-pod" would defeat the purpose of the pod being an atomic
unit of scheduling.
## Design
Each pod may have 0..N init containers defined along with the existing
1..M app containers.
On startup of the pod, after the network and volumes are initialized, the
init containers are started in order. Each container must exit successfully
before the next is invoked. If a container fails to start (due to the runtime)
or exits with failure, it is retried according to the pod RestartPolicy.
RestartPolicyNever pods will immediately fail and exit. RestartPolicyAlways
pods will retry the failing init container with increasing backoff until it
succeeds. To align with the design of application containers, init containers
will only support "infinite retries" (RestartPolicyAlways) or "no retries"
(RestartPolicyNever).
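To illustrate the retry semantics described above, here is a minimal sketch (not Kubelet code; the `run` callback, the one-second starting backoff, and the five-minute cap are assumptions made for the example):

```go
package main

import (
	"fmt"
	"time"
)

type RestartPolicy string

const (
	RestartPolicyAlways RestartPolicy = "Always"
	RestartPolicyNever  RestartPolicy = "Never"
)

// runInitContainers runs each init container to completion, in order. A failed
// init container either fails the pod (RestartPolicyNever) or is retried with
// increasing backoff (RestartPolicyAlways).
func runInitContainers(policy RestartPolicy, names []string, run func(name string) error) error {
	for _, name := range names {
		backoff := time.Second // assumed starting backoff
		for {
			err := run(name)
			if err == nil {
				break // success: move on to the next init container
			}
			if policy == RestartPolicyNever {
				return fmt.Errorf("init container %q failed: %v", name, err)
			}
			time.Sleep(backoff)
			if backoff < 5*time.Minute { // assumed cap on the backoff
				backoff *= 2
			}
		}
	}
	return nil
}

func main() {
	err := runInitContainers(RestartPolicyNever, []string{"init-container1", "init-container2"},
		func(name string) error { fmt.Println("running", name); return nil })
	fmt.Println("initialization complete, err =", err)
}
```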
A pod cannot be ready until all init containers have succeeded. The ports
on an init container are not aggregated under a service. A pod that is
being initialized is in the `Pending` phase but should have a distinct
condition. Each app container and all future init containers should have
the reason `PodInitializing`. The pod should have a condition `Initializing`
set to `false` until all init containers have succeeded, and `true` thereafter.
If the pod is restarted, the `Initializing` condition should be set to `false`.
If the pod is "restarted" all containers stopped and started due to
a node restart, change to the pod definition, or admin interaction, all
init containers must execute again. Restartable conditions are defined as:
* An init container image is changed
* The pod infrastructure container is restarted (shared namespaces are lost)
* The Kubelet detects that all containers in a pod are terminated AND
no record of init container completion is available on disk (due to GC)
Changes to the init container spec are limited to the container image field.
Altering the container image field is equivalent to restarting the pod.
Because init containers can be restarted, retried, or reexecuted, container
authors should make their init behavior idempotent by handling volumes that
are already populated or the possibility that this instance of the pod has
already contacted a remote system.
Each init container has all of the fields of an app container. The following
fields are prohibited from being used on init containers by validation:
* `readinessProbe` - init containers must exit for pod startup to continue,
are not included in rotation, and so cannot define readiness distinct from
completion.
Init container authors may use `activeDeadlineSeconds` on the pod and
`livenessProbe` on the container to prevent init containers from failing
forever. The active deadline includes init containers.
Because init containers are semantically different in lifecycle from app
containers (they are run serially, rather than in parallel), for backwards
compatibility and design clarity they will be identified as distinct fields
in the API:

    pod:
      spec:
        containers: ...
        initContainers:
        - name: init-container1
          image: ...
          ...
        - name: init-container2
          ...
      status:
        containerStatuses: ...
        initContainerStatuses:
        - name: init-container1
          ...
        - name: init-container2
          ...
This separation also serves to make the order of container initialization
clear - init containers are executed in the order that they appear, then all
app containers are started at once.
The name of each app and init container in a pod must be unique - it is a
validation error for any container to share a name.
While init containers are in alpha state, they will be serialized as an annotation
on the pod with the name `pod.alpha.kubernetes.io/init-containers`, and the status
of the init containers will be stored in `pod.alpha.kubernetes.io/init-container-statuses`.
Mutation of these annotations is prohibited on existing pods.
### Resources
Given the ordering and execution for init containers, the following rules
for resource usage apply:
* The highest of any particular resource request or limit defined on all init
containers is the **effective init request/limit**
* The pod's **effective request/limit** for a resource is the higher of:
* sum of all app containers request/limit for a resource
* effective init request/limit for a resource
* Scheduling is done based on effective requests/limits, which means
init containers can reserve resources for initialization that are not used
during the life of the pod.
* The lowest QoS tier of init containers per resource is the **effective init QoS tier**,
and the highest QoS tier of both init containers and regular containers is the
**effective pod QoS tier**.
So the following pod:

    pod:
      spec:
        initContainers:
        - limits:
            cpu: 100m
            memory: 1GiB
        - limits:
            cpu: 50m
            memory: 2GiB
        containers:
        - limits:
            cpu: 10m
            memory: 1100MiB
        - limits:
            cpu: 10m
            memory: 1100MiB
has an effective pod limit of `cpu: 100m`, `memory: 2200MiB` (highest init
container cpu is larger than sum of all app containers, sum of container
memory is larger than the max of all init containers). The scheduler, node,
and quota must respect the effective pod request/limit.
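A minimal sketch of that computation (simplified: quantities are plain int64 millicores/bytes rather than the real resource quantity type, and only limits are shown):

```go
package main

import "fmt"

// ResourceList maps a resource name (e.g. "cpu", "memory") to a quantity in
// integer units (millicores for cpu, bytes for memory).
type ResourceList map[string]int64

// effectiveLimits returns the pod's effective limits: for each resource, the
// higher of (a) the largest limit across init containers and (b) the sum of
// limits across app containers.
func effectiveLimits(initContainers, appContainers []ResourceList) ResourceList {
	out := ResourceList{}
	for _, c := range initContainers {
		for name, v := range c {
			if v > out[name] {
				out[name] = v
			}
		}
	}
	appSum := ResourceList{}
	for _, c := range appContainers {
		for name, v := range c {
			appSum[name] += v
		}
	}
	for name, v := range appSum {
		if v > out[name] {
			out[name] = v
		}
	}
	return out
}

func main() {
	init := []ResourceList{{"cpu": 100, "memory": 1 << 30}, {"cpu": 50, "memory": 2 << 30}}
	apps := []ResourceList{{"cpu": 10, "memory": 1100 << 20}, {"cpu": 10, "memory": 1100 << 20}}
	// Prints cpu: 100 (millicores) and memory: 2306867200 bytes (2200MiB),
	// matching the example above.
	fmt.Println(effectiveLimits(init, apps))
}
```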
In the absence of a defined request or limit on a container, the effective
request/limit will be applied. For example, the following pod:

    pod:
      spec:
        initContainers:
        - limits:
            cpu: 100m
            memory: 1GiB
        containers:
        - request:
            cpu: 10m
            memory: 1100MiB
will have an effective request of `10m / 1100MiB`, and an effective limit
of `100m / 1GiB`, i.e.:

    pod:
      spec:
        initContainers:
        - request:
            cpu: 10m
            memory: 1GiB
          limits:
            cpu: 100m
            memory: 1100MiB
        containers:
        - request:
            cpu: 10m
            memory: 1GiB
          limits:
            cpu: 100m
            memory: 1100MiB
and thus has the QoS tier **Burstable** (because the request is not equal to
the limit).
Quota and limits will be applied based on the effective pod request and
limit.
Pod-level cgroups will be based on the effective pod request and limit, the
same values the scheduler uses.
### Kubelet and container runtime details
Container runtimes should treat the set of init and app containers as one
large pool. An individual init container execution should be identical to
an app container, including all standard container environment setup
(network, namespaces, hostnames, DNS, etc).
All app container operations are permitted on init containers. The
logs for an init container should be available for the duration of the pod
lifetime or until the pod is restarted.
During initialization, app container status should be shown with the reason
PodInitializing if any init containers are present. Each init container
should show appropriate container status, and all init containers that are
waiting for earlier init containers to finish should have the `reason`
PendingInitialization.
The container runtime should aggressively prune failed init containers.
The container runtime should record whether all init containers have
succeeded internally, and only invoke new init containers if a pod
restart is needed (for Docker, if all containers terminate or if the pod
infra container terminates). Init containers should follow backoff rules
as necessary. The Kubelet *must* preserve at least the most recent instance
of an init container to serve logs and data for end users and to track
failure states. The Kubelet *should* prefer to garbage collect completed
init containers over app containers, as long as the Kubelet is able to
track that initialization has been completed. In the future, container
state checkpointing in the Kubelet may remove or reduce the need to
preserve old init containers.
For the initial implementation, the Kubelet will use the last termination
container state of the highest indexed init container to determine whether
the pod has completed initialization. During a pod restart, initialization
will be restarted from the beginning (all initializers will be rerun).
### API Behavior
All APIs that access containers by name should operate on both init and
app containers. Because names are unique the addition of the init container
should be transparent to use cases.
A client with no knowledge of init containers should see appropriate
container status `reason` and `message` fields while the pod is in the
`Pending` phase, and so be able to communicate that to end users.
### Example init containers
* Wait for a service to be created

    pod:
      spec:
        initContainers:
        - name: wait
          image: centos:centos7
          command: ["/bin/sh", "-c", "for i in {1..100}; do sleep 1; if dig myservice; then exit 0; fi; done; exit 1"]
        containers:
        - name: run
          image: application-image
          command: ["/my_application_that_depends_on_myservice"]
* Register this pod with a remote server

    pod:
      spec:
        initContainers:
        - name: register
          image: centos:centos7
          command: ["/bin/sh", "-c", "curl -X POST http://$MANAGEMENT_SERVICE_HOST:$MANAGEMENT_SERVICE_PORT/register -d 'instance=$(POD_NAME)&ip=$(POD_IP)'"]
          env:
          - name: POD_NAME
            valueFrom:
              field: metadata.name
          - name: POD_IP
            valueFrom:
              field: status.podIP
        containers:
        - name: run
          image: application-image
          command: ["/my_application_that_depends_on_myservice"]
* Wait for an arbitrary period of time

    pod:
      spec:
        initContainers:
        - name: wait
          image: centos:centos7
          command: ["/bin/sh", "-c", "sleep 60"]
        containers:
        - name: run
          image: application-image
          command: ["/static_binary_without_sleep"]
* Clone a git repository into a volume (can be implemented by volume containers in the future):

    pod:
      spec:
        initContainers:
        - name: download
          image: image-with-git
          command: ["git", "clone", "https://github.com/myrepo/myrepo.git", "/var/lib/data"]
          volumeMounts:
          - mountPath: /var/lib/data
            volumeName: git
        containers:
        - name: run
          image: centos:centos7
          command: ["/var/lib/data/binary"]
          volumeMounts:
          - mountPath: /var/lib/data
            volumeName: git
        volumes:
        - emptyDir: {}
          name: git
* Execute a template transformation based on environment (can be implemented by volume containers in the future):

    pod:
      spec:
        initContainers:
        - name: copy
          image: application-image
          command: ["/bin/cp", "mytemplate.j2", "/var/lib/data/"]
          volumeMounts:
          - mountPath: /var/lib/data
            volumeName: data
        - name: transform
          image: image-with-jinja
          command: ["/bin/sh", "-c", "jinja /var/lib/data/mytemplate.j2 > /var/lib/data/mytemplate.conf"]
          volumeMounts:
          - mountPath: /var/lib/data
            volumeName: data
        containers:
        - name: run
          image: application-image
          command: ["/myapplication", "-conf", "/var/lib/data/mytemplate.conf"]
          volumeMounts:
          - mountPath: /var/lib/data
            volumeName: data
        volumes:
        - emptyDir: {}
          name: data
* Perform a container build

    pod:
      spec:
        initContainers:
        - name: copy
          image: base-image
          workingDir: /home/user/source-tree
          command: ["make"]
        containers:
        - name: commit
          image: image-with-docker
          command:
          - /bin/sh
          - -c
          - docker commit $(complex_bash_to_get_container_id_of_copy) &&
            docker push $(commit_id) myrepo:latest
          volumeMounts:
          - mountPath: /var/run/docker.sock
            volumeName: dockersocket
## Backwards compatibility implications
Since this is a net new feature in the API and Kubelet, new API servers during upgrade may not
be able to rely on Kubelets implementing init containers. The management of feature skew between
master and Kubelet is tracked in issue [#4855](https://github.com/kubernetes/kubernetes/issues/4855).
## Future work
* Unify pod QoS class with init containers
* Implement container / image volumes to make composition of runtime from images efficient
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/container-init.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->


@@ -1,267 +1 @@
# Redefine Container Runtime Interface This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/container-runtime-interface-v1.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/container-runtime-interface-v1.md)
The umbrella issue: [#22964](https://issues.k8s.io/22964)
## Motivation
Kubelet employs a declarative pod-level interface, which acts as the sole
integration point for container runtimes (e.g., `docker` and `rkt`). The
high-level, declarative interface has caused higher integration and maintenance
cost, and also slowed down feature velocity for the following reasons.
1. **Not every container runtime supports the concept of pods natively**.
When integrating with Kubernetes, a significant amount of work needs to
go into implementing a shim of significant size to support all pod
features. This also adds maintenance overhead (e.g., `docker`).
2. **High-level interface discourages code sharing and reuse among runtimes**.
E.g., each runtime today implements an all-encompassing `SyncPod()`
function, with the Pod Spec as the input argument. The runtime implements
logic to determine how to achieve the desired state based on the current
status, (re-)starts pods/containers and manages lifecycle hooks
accordingly.
3. **Pod Spec is evolving rapidly**. New features are being added constantly.
Any pod-level change or addition requires changing of all container
runtime shims. E.g., init containers and volume containers.
## Goals and Non-Goals
The goals of defining the interface are to
- **improve extensibility**: Easier container runtime integration.
- **improve feature velocity**
- **improve code maintainability**
The non-goals include
- proposing *how* to integrate with new runtimes, i.e., where the shim
resides. The discussion of adopting a client-server architecture is tracked
by [#13768](https://issues.k8s.io/13768), where benefits and shortcomings of
such an architecture is discussed.
- versioning the new interface/API. We intend to provide API versioning to
offer stability for runtime integrations, but the details are beyond the
scope of this proposal.
- adding support to Windows containers. Windows container support is a
parallel effort and is tracked by [#22623](https://issues.k8s.io/22623).
The new interface will not be augmented to support Windows containers, but
it will be made extensible such that the support can be added in the future.
- re-defining Kubelet's internal interfaces. These interfaces, though they may
affect Kubelet's maintainability, are not relevant to runtime integration.
- improving Kubelet's efficiency or performance, e.g., adopting event stream
from the container runtime [#8756](https://issues.k8s.io/8756),
[#16831](https://issues.k8s.io/16831).
## Requirements
* Support the already integrated container runtimes: `docker` and `rkt`
* Support hypervisor-based container runtimes: `hyper`.
The existing pod-level interface will remain as it is in the near future to
ensure that support for all existing runtimes continues. Meanwhile, we will
work with all parties involved on switching to the proposed interface.
## Container Runtime Interface
The main idea of this proposal is to adopt an imperative container-level
interface, which allows Kubelet to directly control the lifecycles of the
containers.
A pod is composed of a group of containers in an isolated environment with
resource constraints. In Kubernetes, the pod is also the smallest schedulable unit.
After a pod has been scheduled to the node, Kubelet will create the environment
for the pod, and add/update/remove containers in that environment to meet the
Pod Spec. To distinguish between the environment and the pod as a whole, we
will call the pod environment **PodSandbox.**
Container runtimes may interpret the PodSandbox concept differently based
on how they operate internally. For runtimes relying on a hypervisor, the sandbox
naturally maps to a virtual machine. For others, it can be Linux namespaces.
In short, a PodSandbox should have the following features.
* **Isolation**: E.g., Linux namespaces or a full virtual machine, or even
support additional security features.
* **Compute resource specifications**: A PodSandbox should implement pod-level
resource demands and restrictions.
*NOTE: The resource specification does not include externalized costs to
container setup that are not currently trackable as Pod constraints, e.g.,
filesystem setup, container image pulling, etc.*
A container in a PodSandbox maps to an application in the Pod Spec. For Linux
containers, they are expected to share at least network and IPC namespaces,
with sharing more namespaces discussed in [#1615](https://issues.k8s.io/1615).
Below is an example of the proposed interfaces.
```go
// PodSandboxManager contains basic operations for sandbox.
type PodSandboxManager interface {
	Create(config *PodSandboxConfig) (string, error)
	Delete(id string) (string, error)
	List(filter PodSandboxFilter) []PodSandboxListItem
	Status(id string) PodSandboxStatus
}

// ContainerRuntime contains basic operations for containers.
type ContainerRuntime interface {
	Create(config *ContainerConfig, sandboxConfig *PodSandboxConfig, PodSandboxID string) (string, error)
	Start(id string) error
	Stop(id string, timeout int) error
	Remove(id string) error
	List(filter ContainerFilter) ([]ContainerListItem, error)
	Status(id string) (ContainerStatus, error)
	Exec(id string, cmd []string, streamOpts StreamOptions) error
}

// ImageService contains image-related operations.
type ImageService interface {
	List() ([]Image, error)
	Pull(image ImageSpec, auth AuthConfig) error
	Remove(image ImageSpec) error
	Status(image ImageSpec) (Image, error)
	Metrics(image ImageSpec) (ImageMetrics, error)
}

type ContainerMetricsGetter interface {
	ContainerMetrics(id string) (ContainerMetrics, error)
}
```

All functions listed above are expected to be thread-safe.
### Pod/Container Lifecycle
The PodSandbox's lifecycle is decoupled from that of the containers, i.e., a sandbox
is created before any containers, and can exist after all containers in it have
terminated.
Assume there is a pod with a single container C. To start a pod:
```
create sandbox Foo --> create container C --> start container C
```
To delete a pod:
```
stop container C --> remove container C --> delete sandbox Foo
```
The container runtime must not apply any transition (such as starting a new
container) unless explicitly instructed by Kubelet. It is Kubelet's
responsibility to enforce garbage collection, restart policy, and otherwise
react to changes in lifecycle.
The only transitions that are possible for a container are described below:
```
() -> Created // A container can only transition to created from the
// empty, nonexistent state. The ContainerRuntime.Create
// method causes this transition.
Created -> Running // The ContainerRuntime.Start method may be applied to a
// Created container to move it to Running
Running -> Exited // The ContainerRuntime.Stop method may be applied to a running
// container to move it to Exited.
// A container may also make this transition under its own volition
Exited -> () // An exited container can be moved to the terminal empty
// state via a ContainerRuntime.Remove call.
```
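As a sketch (reusing the interface definitions above; the error handling and the 30-second stop timeout are illustrative only, not prescribed by this proposal), Kubelet could drive these transitions as follows:

```go
// startPod sketches: create sandbox Foo --> create container C --> start container C.
func startPod(sm PodSandboxManager, rt ContainerRuntime,
	sandboxCfg *PodSandboxConfig, containerCfg *ContainerConfig) (sandboxID, containerID string, err error) {
	sandboxID, err = sm.Create(sandboxCfg)
	if err != nil {
		return "", "", err
	}
	containerID, err = rt.Create(containerCfg, sandboxCfg, sandboxID)
	if err != nil {
		return "", "", err
	}
	return sandboxID, containerID, rt.Start(containerID)
}

// deletePod sketches: stop container C --> remove container C --> delete sandbox Foo.
func deletePod(sm PodSandboxManager, rt ContainerRuntime, sandboxID, containerID string) error {
	if err := rt.Stop(containerID, 30); err != nil { // 30s grace period, illustrative
		return err
	}
	if err := rt.Remove(containerID); err != nil {
		return err
	}
	_, err := sm.Delete(sandboxID)
	return err
}
```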
Kubelet is also responsible for gracefully terminating all the containers
in the sandbox before deleting the sandbox. If Kubelet chooses to delete
the sandbox with running containers in it, those containers should be forcibly
deleted.
Note that every PodSandbox/container lifecycle operation (create, start,
stop, delete) should either return an error or block until the operation
succeeds. A successful operation should include a state transition of the
PodSandbox/container. E.g., if a `Create` call for a container does not
return an error, the container state should be "created" when the runtime is
queried.
### Updates to PodSandbox or Containers
Kubernetes supports updates only to a very limited set of fields in the Pod
Spec. These updates may require containers to be re-created by Kubelet. This
can be achieved through the proposed, imperative container-level interface.
On the other hand, PodSandbox update currently is not required.
### Container Lifecycle Hooks
Kubernetes supports post-start and pre-stop lifecycle hooks, with ongoing
discussion for supporting pre-start and post-stop hooks in
[#140](https://issues.k8s.io/140).
These lifecycle hooks will be implemented by Kubelet via `Exec` calls to the
container runtime. This frees the runtimes from having to support hooks
natively.
Illustration of the container lifecycle and hooks:
```
      pre-start        post-start   pre-stop   post-stop
          |                |           |           |
        exec             exec        exec        exec
          |                |           |           |
create --------> start ----------------> stop --------> remove
```
In order for the lifecycle hooks to function as expected, the `Exec` call
will need access to the container's filesystem (e.g., mount namespaces).
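For illustration, a post-start hook could then be a thin wrapper over `Exec` (a sketch using the proposed interface above; the empty `StreamOptions` value is a placeholder):

```go
// runPostStartHook executes a post-start lifecycle hook by exec-ing the hook
// command inside the already-started container, so the runtime needs no
// native hook support.
func runPostStartHook(rt ContainerRuntime, containerID string, hookCmd []string) error {
	return rt.Exec(containerID, hookCmd, StreamOptions{})
}
```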
### Extensibility
There are several dimensions for container runtime extensibility.
- Host OS (e.g., Linux)
- PodSandbox isolation mechanism (e.g., namespaces or VM)
- PodSandbox OS (e.g., Linux)
As mentioned previously, this proposal will only address the Linux based
PodSandbox and containers. All Linux-specific configuration will be grouped
into one field. A container runtime is required to enforce all configuration
applicable to its platform, and should return an error otherwise.
### Keep it minimal
The proposed interface is experimental, i.e., it will go through (many) changes
until it stabilizes. The principle is to keep the interface minimal and
extend it later if needed. This leaves out several features that are still
under discussion and may be achieved by other means:
* `AttachContainer`: [#23335](https://issues.k8s.io/23335)
* `PortForward`: [#25113](https://issues.k8s.io/25113)
## Alternatives
**[Status quo] Declarative pod-level interface**
- Pros: No changes needed.
- Cons: All the issues stated in #motivation
**Allow integration at both pod- and container-level interfaces**
- Pros: Flexibility.
- Cons: All the issues stated in #motivation
**Imperative pod-level interface**
The interface contains only CreatePod(), StartPod(), StopPod() and RemovePod().
This implies that the runtime needs to take over container lifecycle
management (i.e., enforce restart policy), lifecycle hooks, liveness checks,
etc. Kubelet will mainly be responsible for interfacing with the apiserver, and
can potentially become a very thin daemon.
- Pros: Lower maintenance overhead for the Kubernetes maintainers if `Docker`
shim maintenance cost is discounted.
- Cons: This will incur higher integration cost because every new container
runtime needs to implement all the features and need to understand the
concept of pods. This would also lead to lower feature velocity because the
interface will need to be changed, and the new pod-level feature will need
to be supported in each runtime.
## Related Issues
* Metrics: [#27097](https://issues.k8s.io/27097)
* Log management: [#24677](https://issues.k8s.io/24677)
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/container-runtime-interface-v1.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->


@@ -1,102 +1 @@
# ControllerRef proposal This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md)
Author: gmarek@
Last edit: 2016-05-11
Status: raw
Approvers:
- [ ] briangrant
- [ ] dbsmith
**Table of Contents**
- [Goal of ControllerReference](#goal-of-controllerreference)
- [Non goals](#non-goals)
- [API and semantic changes](#api-and-semantic-changes)
- [Upgrade/downgrade procedure](#upgradedowngrade-procedure)
- [Orphaning/adoption](#orphaningadoption)
- [Implementation plan (sketch)](#implementation-plan-sketch)
- [Considered alternatives](#considered-alternatives)
# Goal of ControllerReference
The main goal of the `ControllerReference` effort is to solve the problem of overlapping controllers that fight over some resources (e.g. `ReplicaSets` fighting with `ReplicationControllers` over `Pods`), which causes serious [problems](https://github.com/kubernetes/kubernetes/issues/24433) such as exploding memory usage in the Controller Manager.
We don't want to have (just) an in-memory solution, as we don't want a Controller Manager crash to cause massive changes in object ownership in the system. I.e. we need to persist the information about the "owning controller".
Secondary goal of this effort is to improve performance of various controllers and schedulers, by removing the need for expensive lookup for all matching "controllers".
# Non goals
Cascading deletion is not a goal of this effort. Cascading deletion will use `ownerReferences`, which is a [separate effort](garbage-collection.md).
`ControllerRef` will extend `OwnerReference` and reuse machinery written for it (GarbageCollector, adoption/orphaning logic).
# API and semantic changes
There will be a new API field in the `OwnerReference` in which we will store information about whether the given owner is the managing controller:
```
OwnerReference {
Controller bool
}
```
From now on by `ControllerRef` we mean an `OwnerReference` with `Controller=true`.
Most controllers (all that manage collections of things defined by label selector) will have slightly changed semantics: currently a controller owns an object if its selector matches the object's labels and it doesn't notice an older controller of the same kind that also matches the object's labels; after the introduction of `ControllerReference`, a controller will own an object iff its selector matches the labels and an `OwnerReference` with `Controller=true` points to it.
If the owner's selector or the owned object's labels change, the owning controller will be responsible for orphaning (clearing the `Controller` field in the `OwnerReference` and/or deleting the `OwnerReference` altogether) objects, after which an adoption procedure (setting the `Controller` field in one of the `OwnerReferences` and/or adding new `OwnerReferences`) might occur if another controller has a matching selector.
For debugging purposes we want to add an `adoptionTime` annotation prefixed with `kubernetes.io/` which will keep the time of last controller ownership transfer.
# Upgrade/downgrade procedure
Because `ControllerRef` will be a part of `OwnerReference` effort it will have the same upgrade/downgrade procedures.
# Orphaning/adoption
Because `ControllerRef` will be a part of `OwnerReference` effort it will have the same orphaning/adoption procedures.
Controllers will orphan objects they own in two cases:
* Change of label/selector causing selector to stop matching labels (executed by the controller)
* Deletion of a controller with `Orphaning=true` (executed by the GarbageCollector)
We will need a secondary orphaning mechanism in case of unclean controller deletion:
* GarbageCollector will remove a `ControllerRef` from an object when the reference no longer points to an existing controller
A controller will adopt (set the `Controller` field in the `OwnerReference` that points to it) an object whose labels match its selector iff:
* there are no `OwnerReferences` with `Controller` set to true in `OwnerReferences` array
* `DeletionTimestamp` is not set
and
* Controller is the first controller, among all controllers that have a matching label selector and don't have `DeletionTimestamp` set, that manages to adopt the object.
By design there are possible races during adoption if multiple controllers can own a given object.
To prevent re-adoption of an object during deletion, the `DeletionTimestamp` will be set when deletion is starting. When a controller has a non-nil `DeletionTimestamp` it won't take any actions except updating its `Status` (in particular it won't adopt any objects).
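A minimal sketch of the adoption check described above (the types below are simplified stand-ins for the real API objects, and exact-match label comparison stands in for full selector semantics):

```go
package adoption

import "time"

type OwnerReference struct {
	Name       string
	Controller bool
}

type ObjectMeta struct {
	Labels            map[string]string
	OwnerReferences   []OwnerReference
	DeletionTimestamp *time.Time
}

// canAdopt reports whether a controller with the given selector may become the
// managing controller of obj, per the rules above: the object must match the
// selector, must not already have a managing controller, and neither the
// object nor the adopting controller may be marked for deletion.
func canAdopt(selector map[string]string, controllerDeleting bool, obj ObjectMeta) bool {
	if controllerDeleting || obj.DeletionTimestamp != nil {
		return false
	}
	for _, ref := range obj.OwnerReferences {
		if ref.Controller {
			return false // already adopted by another controller
		}
	}
	for k, v := range selector {
		if obj.Labels[k] != v {
			return false
		}
	}
	return true
}
```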
# Implementation plan (sketch):
* Add API field for `Controller`,
* Extend `OwnerReference` adoption procedure to set a `Controller` field in one of the owners,
* Update all affected controllers to respect `ControllerRef`.
Necessary related work:
* `OwnerReferences` are correctly added/deleted,
* GarbageCollector removes dangling references,
* Controllers don't take any meaningful actions when `DeletionTimestamp` is set.
# Considered alternatives
* Generic "ReferenceController": centralized component that managed adoption/orphaning
* Dropped because: hard to write something that will work for all imaginable 3rd party objects, adding hooks to framework makes it possible for users to write their own logic
* Separate API field for `ControllerRef` in the ObjectMeta.
* Dropped because: nontrivial relationship between `ControllerRef` and `OwnerReferences` when it comes to deletion/adoption.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/controller-ref.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->


@@ -1,147 +1 @@
<!-- BEGIN MUNGE: GENERATED_TOC --> This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/deploy.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/deploy.md)
- [Deploy through CLI](#deploy-through-cli)
- [Motivation](#motivation)
- [Requirements](#requirements)
- [Related `kubectl` Commands](#related-kubectl-commands)
- [`kubectl run`](#kubectl-run)
- [`kubectl scale` and `kubectl autoscale`](#kubectl-scale-and-kubectl-autoscale)
- [`kubectl rollout`](#kubectl-rollout)
- [`kubectl set`](#kubectl-set)
- [Mutating Operations](#mutating-operations)
- [Example](#example)
- [Support in Deployment](#support-in-deployment)
- [Deployment Status](#deployment-status)
- [Deployment Version](#deployment-version)
- [Pause Deployments](#pause-deployments)
- [Perm-failed Deployments](#perm-failed-deployments)
<!-- END MUNGE: GENERATED_TOC -->
# Deploy through CLI
## Motivation
Users can use [Deployments](../user-guide/deployments.md) or [`kubectl rolling-update`](../user-guide/kubectl/kubectl_rolling-update.md) to deploy in their Kubernetes clusters. A Deployment provides declarative update for Pods and ReplicationControllers, whereas `rolling-update` allows the users to update their earlier deployment without worrying about schemas and configurations. Users need a way that's similar to `rolling-update` to manage their Deployments more easily.
`rolling-update` expects ReplicationController as the only resource type it deals with. It's not trivial to support exactly the same behavior with Deployment, which requires:
- Print out scaling up/down events.
- Stop the deployment if users press Ctrl-c.
- The controller should not make any more changes once the process ends. (Delete the deployment when status.replicas=status.updatedReplicas=spec.replicas)
So, instead, this document proposes another way to support easier deployment management via Kubernetes CLI (`kubectl`).
## Requirements
The following are the operations we need to support so that users can easily manage deployments:
- **Create**: To create deployments.
- **Rollback**: To restore to an earlier version of deployment.
- **Watch the status**: To watch for the status update of deployments.
- **Pause/resume**: To pause a deployment mid-way, and to resume it. (A use case is to support canary deployment.)
- **Version information**: To record and show version information that's meaningful to users. This can be useful for rollback.
## Related `kubectl` Commands
### `kubectl run`
`kubectl run` should support the creation of Deployment (already implemented) and DaemonSet resources.
### `kubectl scale` and `kubectl autoscale`
Users may use `kubectl scale` or `kubectl autoscale` to scale up and down Deployments (both already implemented).
### `kubectl rollout`
`kubectl rollout` supports both Deployment and DaemonSet. It has the following subcommands:
- `kubectl rollout undo` works like rollback; it allows the users to rollback to a previous version of deployment.
- `kubectl rollout pause` allows the users to pause a deployment. See [pause deployments](#pause-deployments).
- `kubectl rollout resume` allows the users to resume a paused deployment.
- `kubectl rollout status` shows the status of a deployment.
- `kubectl rollout history` shows meaningful version information of all previous deployments. See [deployment version](#deployment-version).
- `kubectl rollout retry` retries a failed deployment. See [perm-failed deployments](#perm-failed-deployments).
### `kubectl set`
`kubectl set` has the following subcommands:
- `kubectl set env` allows the users to set environment variables of Kubernetes resources. It should support any object that contains a single, primary PodTemplate (such as Pod, ReplicationController, ReplicaSet, Deployment, and DaemonSet).
- `kubectl set image` allows the users to update multiple images of Kubernetes resources. Users will use `--container` and `--image` flags to update the image of a container. It should support anything that has a PodTemplate.
`kubectl set` should be used for things that are common and commonly modified. Other possible future commands include:
- `kubectl set volume`
- `kubectl set limits`
- `kubectl set security`
- `kubectl set port`
### Mutating Operations
Other means of mutating Deployments and DaemonSets, including `kubectl apply`, `kubectl edit`, `kubectl replace`, `kubectl patch`, `kubectl label`, and `kubectl annotate`, may trigger rollouts if they modify the pod template.
`kubectl create` and `kubectl delete`, for creating and deleting Deployments and DaemonSets, are also relevant.
### Example
With the commands introduced above, here's an example of deployment management:
```console
# Create a Deployment
$ kubectl run nginx --image=nginx --replicas=2 --generator=deployment/v1beta1
# Watch the Deployment status
$ kubectl rollout status deployment/nginx
# Update the Deployment
$ kubectl set image deployment/nginx --container=nginx --image=nginx:<some-version>
# Pause the Deployment
$ kubectl rollout pause deployment/nginx
# Resume the Deployment
$ kubectl rollout resume deployment/nginx
# Check the change history (deployment versions)
$ kubectl rollout history deployment/nginx
# Rollback to a previous version.
$ kubectl rollout undo deployment/nginx --to-version=<version>
```
## Support in Deployment
### Deployment Status
Deployment status should summarize information about Pods, which includes:
- The number of pods of each version.
- The number of ready/not ready pods.
See issue [#17164](https://github.com/kubernetes/kubernetes/issues/17164).
### Deployment Version
We store previous deployment version information in annotations `rollout.kubectl.kubernetes.io/change-source` and `rollout.kubectl.kubernetes.io/version` of replication controllers of the deployment, to support rolling back changes as well as for the users to view previous changes with `kubectl rollout history`.
- `rollout.kubectl.kubernetes.io/change-source`, which is optional, records the kubectl command of the last mutation made to this rollout. Users may use `--record` in `kubectl` to record current command in this annotation.
- `rollout.kubectl.kubernetes.io/version` records a version number to distinguish the change sequence of a deployment's
replication controllers. A deployment obtains the largest version number from its replication controllers, increments the number by 1 upon update or creation of the deployment, and updates the version annotation of its new replication controller.
When the users perform a rollback, i.e. `kubectl rollout undo`, the deployment first looks at its existing replication controllers, regardless of their number of replicas. Then it finds the one whose `rollout.kubectl.kubernetes.io/version` annotation either contains the specified rollback version number or, if the user didn't specify any version number (i.e. wants to roll back to the last change), contains the second largest version number among all the replication controllers (the current new replication controller should hold the largest version number). Lastly, it
starts scaling up the replication controller it's rolling back to, scaling down the current ones, and then updates the version counter and the rollout annotations accordingly.
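A sketch of that selection (simplified: `RC` is a stand-in type with the version annotation already parsed into an int, and `toVersion == 0` means the user did not specify a version):

```go
package rollback

import "sort"

// RC is a simplified stand-in for a replication controller with its
// rollout.kubectl.kubernetes.io/version annotation parsed into Version.
type RC struct {
	Name    string
	Version int
}

// findRollbackTarget returns the replication controller to scale back up:
// the one with the requested version, or, when no version was requested,
// the one with the second-largest version (the largest belongs to the
// current new replication controller).
func findRollbackTarget(rcs []RC, toVersion int) (RC, bool) {
	if toVersion != 0 {
		for _, rc := range rcs {
			if rc.Version == toVersion {
				return rc, true
			}
		}
		return RC{}, false
	}
	sorted := append([]RC(nil), rcs...) // avoid mutating the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Version > sorted[j].Version })
	if len(sorted) < 2 {
		return RC{}, false
	}
	return sorted[1], true
}
```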
Note that a deployment's replication controllers use PodTemplate hashes (i.e. the hash of `.spec.template`) to distinguish themselves from one another. When doing a rollout or rollback, a deployment reuses an existing replication controller if it has the same PodTemplate, and its `rollout.kubectl.kubernetes.io/change-source` and `rollout.kubectl.kubernetes.io/version` annotations will be updated by the new rollout. At this point, the earlier state of this replication controller is lost in history. For example, if we had 3 replication controllers in
deployment history, and then we do a rollout with the same PodTemplate as version 1, then version 1 is lost and becomes version 4 after the rollout.
To make deployment versions more meaningful and readable for the users, we can add more annotations in the future. For example, we can add the following flags to `kubectl` for the users to describe and record their current rollout:
- `--description`: adds `description` annotation to an object when it's created to describe the object.
- `--note`: adds `note` annotation to an object when it's updated to record the change.
- `--commit`: adds `commit` annotation to an object with the commit id.
### Pause Deployments
Users sometimes need to temporarily disable a deployment. See issue [#14516](https://github.com/kubernetes/kubernetes/issues/14516).
### Perm-failed Deployments
The deployment could be marked as "permanently failed" for a given spec hash so that the system won't continue thrashing on a doomed deployment. The users can retry a failed deployment with `kubectl rollout retry`. See issue [#14519](https://github.com/kubernetes/kubernetes/issues/14519).
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/deploy.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->


@@ -1,229 +1 @@
# Deployment This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/deployment.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/deployment.md)
## Abstract
A proposal for implementing a new resource - Deployment - which will enable
declarative config updates for Pods and ReplicationControllers.
Users will be able to create a Deployment, which will spin up
a ReplicationController to bring up the desired pods.
Users can also target the Deployment at existing ReplicationControllers, in
which case the new RC will replace the existing ones. The exact mechanics of
replacement depends on the DeploymentStrategy chosen by the user.
DeploymentStrategies are explained in detail in a later section.
## Implementation
### API Object
The `Deployment` API object will have the following structure:
```go
type Deployment struct {
	TypeMeta
	ObjectMeta

	// Specification of the desired behavior of the Deployment.
	Spec DeploymentSpec

	// Most recently observed status of the Deployment.
	Status DeploymentStatus
}

type DeploymentSpec struct {
	// Number of desired pods. This is a pointer to distinguish between explicit
	// zero and not specified. Defaults to 1.
	Replicas *int

	// Label selector for pods. Existing ReplicationControllers whose pods are
	// selected by this will be scaled down. New ReplicationControllers will be
	// created with this selector, with a unique label `pod-template-hash`.
	// If Selector is empty, it is defaulted to the labels present on the Pod template.
	Selector map[string]string

	// Describes the pods that will be created.
	Template *PodTemplateSpec

	// The deployment strategy to use to replace existing pods with new ones.
	Strategy DeploymentStrategy
}

type DeploymentStrategy struct {
	// Type of deployment. Can be "Recreate" or "RollingUpdate".
	Type DeploymentStrategyType

	// TODO: Update this to follow our convention for oneOf, whatever we decide it
	// to be.
	// Rolling update config params. Present only if DeploymentStrategyType =
	// RollingUpdate.
	RollingUpdate *RollingUpdateDeploymentStrategy
}

type DeploymentStrategyType string

const (
	// Kill all existing pods before creating new ones.
	RecreateDeploymentStrategyType DeploymentStrategyType = "Recreate"

	// Replace the old RCs by new one using rolling update, i.e. gradually scale
	// down the old RCs and scale up the new one.
	RollingUpdateDeploymentStrategyType DeploymentStrategyType = "RollingUpdate"
)

// Spec to control the desired behavior of rolling update.
type RollingUpdateDeploymentStrategy struct {
	// The maximum number of pods that can be unavailable during the update.
	// Value can be an absolute number (ex: 5) or a percentage of total pods at the start of update (ex: 10%).
	// Absolute number is calculated from percentage by rounding up.
	// This can not be 0 if MaxSurge is 0.
	// By default, a fixed value of 1 is used.
	// Example: when this is set to 30%, the old RC can be scaled down by 30%
	// immediately when the rolling update starts. Once new pods are ready, old RC
	// can be scaled down further, followed by scaling up the new RC, ensuring
	// that at least 70% of original number of pods are available at all times
	// during the update.
	MaxUnavailable IntOrString

	// The maximum number of pods that can be scheduled above the original number of
	// pods.
	// Value can be an absolute number (ex: 5) or a percentage of total pods at
	// the start of the update (ex: 10%). This can not be 0 if MaxUnavailable is 0.
	// Absolute number is calculated from percentage by rounding up.
	// By default, a value of 1 is used.
	// Example: when this is set to 30%, the new RC can be scaled up by 30%
	// immediately when the rolling update starts. Once old pods have been killed,
	// new RC can be scaled up further, ensuring that total number of pods running
	// at any time during the update is at most 130% of original pods.
	MaxSurge IntOrString

	// Minimum number of seconds for which a newly created pod should be ready
	// without any of its container crashing, for it to be considered available.
	// Defaults to 0 (pod will be considered available as soon as it is ready)
	MinReadySeconds int
}

type DeploymentStatus struct {
	// Total number of ready pods targeted by this deployment (this
	// includes both the old and new pods).
	Replicas int

	// Total number of new ready pods with the desired template spec.
	UpdatedReplicas int
}
```
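A sketch of how a percentage value for `MaxUnavailable`/`MaxSurge` could be resolved to an absolute pod count, following the rounding-up rule stated in the comments above (representing `IntOrString` as a plain string here is a simplification):

```go
package deployment

import (
	"math"
	"strconv"
	"strings"
)

// resolveIntOrPercent turns a value such as "5" or "10%" into an absolute pod
// count; percentages are taken of the pod count at the start of the update
// and rounded up, as described above.
func resolveIntOrPercent(value string, startingPods int) (int, error) {
	if strings.HasSuffix(value, "%") {
		pct, err := strconv.Atoi(strings.TrimSuffix(value, "%"))
		if err != nil {
			return 0, err
		}
		return int(math.Ceil(float64(startingPods) * float64(pct) / 100)), nil
	}
	return strconv.Atoi(value)
}
```

For example, with 10 pods at the start of the update, `resolveIntOrPercent("30%", 10)` yields 3.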
### Controller
#### Deployment Controller
The DeploymentController will make Deployments happen.
It will watch Deployment objects in etcd.
For each pending deployment, it will:
1. Find all RCs whose label selector is a superset of DeploymentSpec.Selector.
- For now, we will do this in the client - list all RCs and then filter the
ones we want. Eventually, we want to expose this in the API.
2. The new RC can have the same selector as the old RC and hence we add a unique
selector to all these RCs (and the corresponding label to their pods) to ensure
that they do not select the newly created pods (or old pods get selected by
new RC).
- The label key will be "pod-template-hash".
- The label value will be the hash of the podTemplateSpec for that RC without
this label (see the sketch below). This value will be unique for all RCs, since PodTemplateSpec should be unique.
- If the RCs and pods don't already have this label and selector:
- We will first add this to RC.PodTemplateSpec.Metadata.Labels for all RCs to
ensure that all new pods that they create will have this label.
- Then we will add this label to their existing pods and then add this as a selector
to that RC.
3. Find if there exists an RC for which value of "pod-template-hash" label
is same as hash of DeploymentSpec.PodTemplateSpec. If it exists already, then
this is the RC that will be ramped up. If there is no such RC, then we create
a new one using DeploymentSpec and then add a "pod-template-hash" label
to it. RCSpec.replicas = 0 for a newly created RC.
4. Scale up the new RC and scale down the old ones as per the DeploymentStrategy.
- Raise an event if we detect an error, like new pods failing to come up.
5. Go back to step 1 unless the new RC has been ramped up to desired replicas
and the old RCs have been ramped down to 0.
6. Cleanup.
DeploymentController is stateless so that it can recover in case it crashes during a deployment.
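A sketch of computing the `pod-template-hash` value referenced in step 2 (hashing the JSON-serialized template with FNV-32a is an assumption for illustration; the controller may use a different serialization or hash function):

```go
package deployment

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// podTemplateHash returns a label-safe hash of the pod template. The template
// passed in must already exclude the pod-template-hash label itself, as
// described in step 2 above.
func podTemplateHash(template interface{}) (string, error) {
	data, err := json.Marshal(template)
	if err != nil {
		return "", err
	}
	h := fnv.New32a()
	h.Write(data)
	return fmt.Sprintf("%d", h.Sum32()), nil
}
```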
### MinReadySeconds
We will implement MinReadySeconds using the Ready condition in Pod. We will add
a LastTransitionTime to PodCondition and update kubelet to set Ready to false,
each time any container crashes. Kubelet will set Ready condition back to true once
all containers are ready. For containers without a readiness probe, we will
assume that they are ready as soon as they are up.
https://github.com/kubernetes/kubernetes/issues/11234 tracks updating kubelet
and https://github.com/kubernetes/kubernetes/issues/12615 tracks adding
LastTransitionTime to PodCondition.
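A sketch of the availability check this implies (a minimal illustration, not the controller's actual code): a pod counts as available once its Ready condition has held for at least MinReadySeconds.

```go
package deployment

import "time"

// podAvailable reports whether a pod should be counted as available: it must
// be Ready, and must have been Ready for at least minReadySeconds
// (0 means available as soon as it is ready).
func podAvailable(ready bool, readyLastTransition time.Time, minReadySeconds int, now time.Time) bool {
	if !ready {
		return false
	}
	return now.Sub(readyLastTransition) >= time.Duration(minReadySeconds)*time.Second
}
```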
## Changing Deployment mid-way
### Updating
Users can update an ongoing deployment before it is completed.
In this case, the existing deployment will be stalled and the new one will
begin.
For example, consider the following case:
- User creates a deployment to rolling-update 10 pods with image:v1 to
pods with image:v2.
- User then updates this deployment to create pods with image:v3,
when the image:v2 RC had been ramped up to 5 pods and the image:v1 RC
had been ramped down to 5 pods.
- When Deployment Controller observes the new deployment, it will create
a new RC for creating pods with image:v3. It will then start ramping up this
new RC to 10 pods and will ramp down both the existing RCs to 0.
### Deleting
Users can pause/cancel a deployment by deleting it before it is completed.
Recreating the same deployment will resume it.
For example, consider the following case:
- User creates a deployment to rolling-update 10 pods with image:v1 to
pods with image:v2.
- User then deletes this deployment while the old and new RCs are at 5 replicas each.
User will end up with 2 RCs with 5 replicas each.
User can then create the same deployment again, in which case DeploymentController will
notice that the second RC already exists and can ramp it up while ramping down
the first one.
### Rollback
We want to allow the user to rollback a deployment. To rollback a
completed (or ongoing) deployment, user can create (or update) a deployment with
DeploymentSpec.PodTemplateSpec = oldRC.PodTemplateSpec.
## Deployment Strategies
DeploymentStrategy specifies how the new RC should replace existing RCs.
To begin with, we will support 2 types of deployment:
* Recreate: We kill all existing RCs and then bring up the new one. This results
in quick deployment but there is a downtime when old pods are down but
the new ones have not come up yet.
* Rolling update: We gradually scale down old RCs while scaling up the new one.
This results in a slower deployment, but there is no downtime. At all times
during the deployment, there are a few pods available (old or new). The number
of available pods and when is a pod considered "available" can be configured
using RollingUpdateDeploymentStrategy.
In future, we want to support more deployment types.
## Future
Apart from the above, we want to add support for the following:
* Running the deployment process in a pod: In future, we can run the deployment process in a pod. Then users can define their own custom deployments and we can run it using the image name.
* More DeploymentStrategyTypes: https://github.com/openshift/origin/blob/master/examples/deployment/README.md#deployment-types lists most commonly used ones.
* Triggers: Deployment will have a trigger field to identify what triggered the deployment. Options are: Manual/UserTriggered, Autoscaler, NewImage.
* Automatic rollback on error: We want to support automatic rollback on error or timeout.
## References
- https://github.com/kubernetes/kubernetes/issues/1743 has most of the
discussion that resulted in this proposal.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/deployment.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->


@@ -1,615 +1 @@
**Author**: Vishnu Kannan This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/disk-accounting.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/disk-accounting.md)
**Last Updated**: 11/16/2015
**Status**: Pending Review
This proposal is an attempt to come up with a means for accounting disk usage in Kubernetes clusters that are running docker as the container runtime. Some of the principles here might apply for other runtimes too.
### Why is disk accounting necessary?
As of Kubernetes v1.1, clusters become unusable over time due to the local disk becoming full. The kubelets on the nodes attempt to perform garbage collection of old containers and images, but that doesn't prevent running pods from using up all the available disk space.
Kubernetes users have no insight into how the disk is being consumed.
Large images and rapid logging can lead to temporary downtime on the nodes. The node has to free up disk space by deleting images and containers. During this cleanup, existing pods can fail and new pods cannot be started. The node will also transition into an `OutOfDisk` condition, preventing more pods from being scheduled to the node.
Automated eviction of pods that are hogging the local disk is not possible since proper accounting isn't available.
Since local disk is a non-compressible resource, users need means to restrict usage of local disk by pods and containers. Proper disk accounting is a prerequisite. As of today, a misconfigured low QoS class pod can end up bringing down the entire cluster by taking up all the available disk space (misconfigured logging, for example).
### Goals
1. Account for disk usage on the nodes.
2. Compatibility with the most common docker storage backends - devicemapper, aufs and overlayfs
3. Provide a roadmap for enabling disk as a schedulable resource in the future.
4. Provide a plugin interface for extending support to non-default filesystems and storage drivers.
### Non Goals
1. Compatibility with all storage backends. The matrix is pretty large already and the priority is to get disk accounting working on the most widely deployed platforms.
2. Support for filesystems other than ext4 and xfs.
### Introduction
Disk accounting in a Kubernetes cluster running docker is complex because of the plethora of ways in which disk gets utilized by a container.
Disk can be consumed for:
1. Container images
2. Containers writable layer
3. Containers logs - when written to stdout/stderr and default logging backend in docker is used.
4. Local volumes - hostPath, emptyDir, gitRepo, etc.
As of Kubernetes v1.1, kubelet exposes disk usage for the entire node and the container's writable layer for the aufs docker storage driver.
This information is made available to end users via the heapster monitoring pipeline.
#### Image layers
Image layers are shared between containers (COW) and so accounting for images is complicated.
Image layers will have to be accounted as system overhead.
As of today, it is not possible to check if there is enough disk space available on the node before an image is pulled.
#### Writable Layer
Docker creates a writable layer for every container on the host. Depending on the storage driver, the location and the underlying filesystem of this layer will change.
Any files that the container creates or updates (assuming there are no volumes) will be considered as writable layer usage.
The underlying filesystem is whatever the docker storage directory resides on. It is ext4 by default on most distributions, and xfs on RHEL.
#### Container logs
Docker engine provides a pluggable logging interface. Kubernetes is currently using the default logging mode which is `local file`. In this mode, the docker daemon stores bytes written by containers to their stdout or stderr, to local disk. These log files are contained in a special directory that is managed by the docker daemon. These logs are exposed via `docker logs` interface which is then exposed via kubelet and apiserver APIs. Currently, there is a hard-requirement for persisting these log files on the disk.
#### Local Volumes
Volumes are slightly different from other local disk use cases. They are pod scoped. Their lifetime is tied to that of a pod. Due to this property accounting of volumes will also be at the pod level.
As of now, the volume types that can use local disk directly are HostPath, EmptyDir, and GitRepo. Secrets and Downward API volumes wrap these primitive volumes.
Everything else is a network based volume.
HostPath volumes map in existing directories in the host filesystem into a pod. Kubernetes manages only the mapping. It does not manage the source on the host filesystem.
In addition to this, the changes introduced by a pod to the source of a hostPath volume are not cleaned up by Kubernetes once the pod exits. Due to these limitations, we will have to account hostPath volumes as system overhead. We should explicitly discourage use of HostPath in read-write mode.
`EmptyDir`, `GitRepo` and other local storage volumes map to a directory on the host root filesystem, that is managed by Kubernetes (kubelet). Their contents are erased as soon as the pod exits. Tracking and potentially restricting usage for volumes is possible.
### Docker storage model
Before we start exploring solutions, let's get familiar with how docker handles storage for images, writable layers, and logs.
On all storage drivers, logs are stored under `<docker root dir>/containers/<container-id>/`
The default location of the docker root directory is `/var/lib/docker`.
Volumes are handled by kubernetes.
*Caveat: Volumes specified as part of Docker images are not handled by Kubernetes currently.*
Container images and writable layers are managed by docker and their location will change depending on the storage driver. Each image layer and writable layer is referred to by an ID. The image layers are read-only. Once saved, existing writable layers can be frozen. The save feature is not important to Kubernetes since it works only with immutable images.
*Note: Image layer IDs can be obtained by running `docker history -q --no-trunc <imagename>`*
##### Aufs
Image layers and writable layers are stored under `/var/lib/docker/aufs/diff/<id>`.
The writable layer's ID is the same as the container's ID.
##### Devicemapper
Each container and each image gets its own block device. Since this driver works at the block level, it is not possible to access the layers directly without mounting them. Each container gets its own block device while running.
##### Overlayfs
Image layers and writable layers are stored under `/var/lib/docker/overlay/<id>`.
Identical files are hardlinked between images.
The image layers contain all their data under a `root` subdirectory.
Everything under `/var/lib/docker/overlay/<id>` are files required for running the container, including its writable layer.
### Improve disk accounting
Disk accounting is dependent on the storage driver in docker. A common solution that works across all storage drivers isn't available.
I'm listing a few possible solutions for disk accounting below along with their limitations.
We need a plugin model for disk accounting. Some storage drivers in docker will require special plugins.
#### Container Images
As of today, the partition that is holding docker images is flagged by cadvisor, and it uses filesystem stats to identify the overall disk usage of that partition.
Isolated usage of just image layers is available today using `docker history <image name>`.
But isolated usage isn't of much use because image layers are shared between containers and so it is not possible to charge a single pod for image disk usage.
Continuing to use the entire partition's availability for garbage collection purposes in kubelet should not affect reliability.
We might garbage collect more often.
As long as we do not expose features that require persisting old containers, computing image layer usage wouldn't be necessary.
The main goals for images are:
1. Capturing total image disk usage.
2. Checking whether a new image will fit on the disk.
In case we choose to compute the size of image layers alone, the following are some of the ways to achieve that.
*Note that some of the strategies mentioned below are applicable in general to other kinds of storage like volumes, etc.*
##### Docker History
It is possible to run `docker history` and then create a graph of all images and corresponding image layers.
This graph will let us figure out the disk usage of all the images.
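A rough sketch of this approach (assuming a docker version of this era, where the layer IDs returned by `docker history -q --no-trunc` can be inspected like images and report a `.Size` field):

```shell
#!/bin/bash
# Sketch only: build the image -> layer mapping with `docker history` and sum
# each distinct layer once, so that layers shared between images are not
# double counted.
declare -A layer_size
for img in $(docker images -q --no-trunc | sort -u); do
  for layer in $(docker history -q --no-trunc "$img" | grep -v '<missing>'); do
    layer_size[$layer]=$(docker inspect -f '{{.Size}}' "$layer")
  done
done
total=0
for sz in "${layer_size[@]}"; do
  total=$((total + sz))
done
echo "Total image layer usage: ${total} bytes"
```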
**Pros**
* Compatible across storage drivers.
**Cons**
* Requires maintaining an internal representation of images.
##### Enhance docker
Docker handles the upload and download of image layers, so it can embed enough information about each layer. If docker is enhanced to expose this information, we can statically identify the space about to be occupied by read-only image layers, even before the image layers are downloaded.
A new [docker feature](https://github.com/docker/docker/pull/16450) (docker pull --dry-run) is pending review, which outputs the disk space that will be consumed by new images. Once this feature lands, we can perform feasibility checks and reject pods that will consume more disk space than what is currently available on the node.
Another option is to expose disk usage of all images together as a first-class feature.
**Pros**
* Works across all storage drivers since docker abstracts the storage drivers.
* Less code to maintain in kubelet.
**Cons**
* Not available today.
* Requires serialized image pulls.
* Metadata files are not tracked.
##### Overlayfs and Aufs
###### `du`
We can list all the image layer specific directories, excluding container directories, and run `du` on each of those directories.
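A minimal sketch, assuming the aufs driver and GNU `du` (for overlayfs, substitute `/var/lib/docker/overlay`):

```shell
# Measure only image layers: skip directories whose IDs belong to containers,
# including their "-init" layers.
DRIVER_DIR=/var/lib/docker/aufs/diff
CONTAINER_IDS=$(docker ps -aq --no-trunc)
for layer in "$DRIVER_DIR"/*; do
  id=$(basename "$layer")
  [[ "$id" == *-init ]] && continue
  echo "$CONTAINER_IDS" | grep -q "^${id}$" && continue
  du -s --block-size=1 "$layer"
done | awk '{total += $1} END {print total " bytes used by image layers"}'
```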
**Pros**:
* This is the least-intrusive approach.
* It will work out of the box without requiring any additional configuration.
**Cons**:
* `du` can consume a lot of CPU and memory. There have been several issues reported against the kubelet in the past that were related to `du`.
* It is time consuming and cannot be run frequently. It requires special handling to constrain resource usage, such as setting a lower nice value or running it in a sub-container.
* Can block container deletion by keeping file descriptors open.
###### Linux gid based Disk Quota
[Disk quota](https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/ch-disk-quotas.html) feature provided by the linux kernel can be used to track the usage of image layers. Ideally, we need `project` support for disk quota, which lets us track usage of directory hierarchies using `project ids`. Unfortunately, that feature is only available for xfs filesystems today. Since most of our distributions use `ext4` by default, we will have to use either `uid` or `gid` based quota tracking.
Both `uids` and `gids` are meant for security. Overloading that concept for disk tracking is painful and ugly. But, that is what we have today.
Kubelet needs to define a gid for tracking image layers and make that gid or group the owner of `/var/lib/docker/[aufs | overlayfs]` recursively. Once this is done, the quota sub-system in the kernel will report the blocks being consumed by the storage driver on the underlying partition.
Since this number also includes the container's writable layer, we will have to somehow subtract that usage from the overall usage of the storage driver directory. Luckily, we can use the same mechanism for tracking the container's writable layer. Once we apply a different `gid` to the container's writable layer, which is located under `/var/lib/docker/<storage_driver>/diff/<container_id>`, the quota subsystem will no longer include the container's writable layer usage.
Xfs, on the other hand, supports project quota, which lets us track disk usage of arbitrary directories using a project. Support for this feature in ext4 is being reviewed. So on xfs, we can use quota without having to clobber the writable layer's uid and gid.
**Pros**:
* Low overhead tracking provided by the kernel.
**Cons**
* Requires updates to default ownership on docker's internal storage driver directories. We will have to deal with storage driver implementation details in any approach that is not docker native.
* Requires additional node configuration - quota subsystem needs to be setup on the node. This can either be automated or made a requirement for the node.
* Kubelet needs to perform gid management. A range of gids has to be allocated to the kubelet for the purposes of quota management. This range must not be used for any other purpose out of band. Not required if project quota is available.
* Breaks `docker save` semantics. Since kubernetes assumes immutable images, this is not a blocker. To support quota in docker, we will need user-namespaces along with custom gid mapping for each container. This feature does not exist today. This is not an issue with project quota.
*Note: Refer to the [Appendix](#appendix) section for more real-world examples of using quota with docker.*
**Project Quota**
Project quota support for ext4 is currently being reviewed upstream. If that feature lands upstream soon, project IDs will be used for disk tracking instead of uids and gids.
##### Devicemapper
The devicemapper storage driver sets up two volumes, metadata and data, which are used to store image layers and container writable layers. The volumes can be real devices or loopback devices. A pool device is created which uses the underlying volumes for real storage.
A new thinly-provisioned volume, based on the pool, will be created for running containers.
The kernel tracks the usage of the pool device at the block device layer. The usage here includes image layers and containers' writable layers.
Since the kubelet has to track the writable layer usage anyway, we can subtract the aggregated root filesystem usage from the overall pool device usage to get the image layers' disk usage.
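As a rough sketch, the pool usage reported by the kernel can be read and converted to bytes as follows (the pool name and block size are taken from the Appendix example and will differ per node):

```shell
POOL="docker-8:1-268480-pool"
BLOCK_SECTORS=128   # data block size, in 512-byte sectors, from `dmsetup table`
# field 6 of the thin-pool status line is "<used>/<total>" data blocks
read -r used total < <(dmsetup status "$POOL" | awk '{split($6, a, "/"); print a[1], a[2]}')
echo "pool used:  $((used  * BLOCK_SECTORS * 512)) bytes"
echo "pool total: $((total * BLOCK_SECTORS * 512)) bytes"
```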
Linux quota and `du` will not work with device mapper.
A docker dry run option (mentioned above) is another possibility.
#### Container Writable Layer
##### Overlayfs / Aufs
Docker creates a separate directory for the container's writable layer, which is then overlayed on top of the read-only image layers.
Both the previously mentioned options of `du` and `Linux Quota` will work for this case as well.
Kubelet can use `du` to track usage and enforce `limits` once disk becomes a schedulable resource. As mentioned earlier `du` is resource intensive.
To use Disk quota, kubelet will have to allocate a separate gid per container. Kubelet can reuse the same gid for multiple instances of the same container (restart scenario). As and when kubelet garbage collects dead containers, the usage of the container will drop.
If local disk becomes a schedulable resource, `linux quota` can be used to impose `request` and `limits` on the container writable layer.
`limits` can be enforced using hard limits. Enforcing `request` will be tricky. One option is to enforce `requests` only when disk availability drops below a threshold (e.g. 10%). Kubelet can at that point evict pods that exceed their requested space. Other options include using `soft limits` with grace periods, but this option is complex.
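A minimal sketch of the quota-based enforcement path, assuming quota is already enabled on the partition, the aufs driver, and that gid 9000 has been allocated to this container (the container ID is a placeholder):

```shell
CONTAINER_GID=9000
LIMIT_BLOCKS=25600   # hard limit in 4096-byte blocks (~100 MB)
chown -R :"$CONTAINER_GID" /var/lib/docker/aufs/diff/<container-id>
chmod -R a+s /var/lib/docker/aufs/diff/<container-id>
# soft limits of 0 disable the soft threshold; only the hard block limit is set
setquota -g "$CONTAINER_GID" 0 "$LIMIT_BLOCKS" 0 0 -a
repquota -g -a | grep "$CONTAINER_GID"   # report current usage against the limit
```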
##### Devicemapper
FIXME: How to calculate writable layer usage with devicemapper?
To enforce `limits`, the volume created for the container's writable layer filesystem can be dynamically [resized](https://jpetazzo.github.io/2014/01/29/docker-device-mapper-resize/) so that it cannot use more than `limit`. `request` will have to be enforced by the kubelet.
#### Container logs
Container logs are not storage driver specific. We can use either `du` or `quota` to track log usage per container. Log files are stored under `/var/lib/docker/containers/<container-id>`.
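For example, a `du` based sketch of per-container log usage (assuming the default json-file logging driver, which writes `<container-id>-json.log`):

```shell
for dir in /var/lib/docker/containers/*; do
  id=$(basename "$dir")
  du -s --block-size=1 "$dir"/*-json.log 2>/dev/null | awk -v id="$id" '{print id, $1 " bytes of logs"}'
done
```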
In the case of quota, we can create a separate gid for tracking log usage. This will let users track log usage and writable layer usage individually.
For the purposes of enforcing limits though, kubelet will use the sum of log and writable layer usage.
In the future, we can consider adding log rotation support for these log files either in kubelet or via docker.
#### Volumes
The local disk based volumes map to a directory on the disk. We can use `du` or `quota` to track the usage of volumes.
There exists a concept called `FsGroup` in kubernetes today, which lets users specify a gid for all the volumes in a pod. If that is set, we can use the `FsGroup` gid for quota purposes. This requires `limits` for volumes to be a pod-level resource though.
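A sketch of how that could look, assuming the kubelet's conventional on-disk layout for emptyDir volumes (the pod UID, volume name and gid are placeholders):

```shell
FS_GROUP=9002
VOL=/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~empty-dir/<volume-name>
chown -R :"$FS_GROUP" "$VOL"
chmod -R g+s "$VOL"          # new files inherit the FsGroup gid
quota -g "$FS_GROUP" -v      # combined usage of all volumes owned by this FsGroup
```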
### Yet to be explored
* Support for filesystems other than ext4 and xfs like `zfs`
* Support for Btrfs
It should be clear at this point that we need a plugin-based model for disk accounting. Support for other filesystems, both CoW and regular, can be added as and when required. As we progress towards making accounting work on the above-mentioned storage drivers, we can come up with an abstraction for storage plugins in general.
### Implementation Plan and Milestones
#### Milestone 1 - Get accounting to just work!
This milestone targets exposing the following categories of disk usage from the kubelet - infrastructure (images, sys daemons, etc), containers (log + writable layer) and volumes.
* `du` works today. Use `du` for all the categories and ensure that it works on both aufs and overlayfs.
* Add device mapper support.
* Define a storage driver based pluggable disk accounting interface in cadvisor.
* Reuse that interface for accounting volumes in kubelet.
* Define a disk manager module in kubelet that will serve as a source of disk usage information for the rest of the kubelet.
* Ensure that the kubelet metrics APIs (/apis/metrics/v1beta1) exposes the disk usage information. Add an integration test.
#### Milestone 2 - node reliability
Improve user experience by doing whatever is necessary to keep the node running.
NOTE: [`Out of Resource Killing`](https://github.com/kubernetes/kubernetes/issues/17186) design is a prerequisite.
* Disk manager will evict pods and containers based on QoS class whenever the disk availability is below a critical level.
* Explore combining existing container and image garbage collection logic into disk manager.
Ideally, this phase should be completed before v1.2.
#### Milestone 3 - Performance improvements
In this milestone, we will add support for quota and make it opt-in. There should be no user visible changes in this phase.
* Add gid allocation manager to kubelet
* Reconcile gids allocated after restart.
* Configure linux quota automatically on startup. Do not set any limits in this phase.
* Allocate gids for pod volumes, container writable layers and logs, and also for image layers.
* Update the docker runtime plugin in kubelet to perform the necessary `chown`s and `chmod`s between container creation and startup.
* Pass the allocated gids as supplementary gids to containers.
* Update disk manager in kubelet to use quota when configured.
#### Milestone 4 - Users manage local disks
In this milestone, we will make local disk a schedulable resource.
* Finalize volume accounting - is it at the pod level or per-volume.
* Finalize multi-disk management policy. Will additional disks be handled as whole units?
* Set aside some space for image layers and the rest of the infra overhead - node allocatable resources include local disk.
* `du` plugin triggers container or pod eviction whenever usage exceeds limit.
* Quota plugin sets hard limits equal to user specified `limits`.
* Devicemapper plugin resizes the writable layer so that it does not exceed the container's disk `limit`.
* Disk manager evicts pods based on `usage` - `request` delta instead of just QoS class.
* Sufficient integration testing for this feature.
### Appendix
#### Implementation Notes
The following is a rough outline of the testing I performed to corroborate my prior design ideas.
Test setup information
* Testing was performed on GCE virtual machines
* All the test VMs were using ext4.
* Distribution tested against is mentioned as part of each graph driver.
##### AUFS testing notes:
Tested on Debian jessie
1. Setup Linux Quota following this [tutorial](https://www.google.com/url?q=https://www.howtoforge.com/tutorial/linux-quota-ubuntu-debian/&sa=D&ust=1446146816105000&usg=AFQjCNHThn4nwfj1YLoVmv5fJ6kqAQ9FlQ).
2. Create a new group x on the host and enable quota for that group
1. `groupadd -g 9000 x`
2. `setquota -g 9000 -a 0 100 0 100` // 100 blocks (4096 bytes each*)
3. `quota -g 9000 -v` // Check that quota is enabled
3. Create a docker container
4. `docker create -it busybox /bin/sh -c "dd if=/dev/zero of=/file count=10 bs=1M"`
8d8c56dcfbf5cda9f9bfec7c6615577753292d9772ab455f581951d9a92d169d
4. Change group on the writable layer directory for this container
5. `chmod a+s /var/lib/docker/aufs/diff/8d8c56dcfbf5cda9f9bfec7c6615577753292d9772ab455f581951d9a92d169d`
6. `chown :x /var/lib/docker/aufs/diff/8d8c56dcfbf5cda9f9bfec7c6615577753292d9772ab455f581951d9a92d169d`
5. Start the docker container
7. `docker start 8d`
8. Check usage using quota and group x
```shell
$ quota -g x -v
Disk quotas for group x (gid 9000):
Filesystem **blocks** quota limit grace files quota limit grace
/dev/sda1 **10248** 0 0 3 0 0
```
Using the same workflow, we can add new sticky group IDs to emptyDir volumes and account for their usage against pods.
Since each container requires a gid for the purposes of quota, we will have to reserve ranges of gids for use by the kubelet. Since kubelet does not checkpoint its state, recovery of group id allocations will be an interesting problem. More on this later.
Track the space occupied by images after they have been pulled locally as follows.
*Note: This approach requires serialized image pulls to be of any use to the kubelet.*
1. Create a group specifically for the graph driver
1. `groupadd -g 9001 docker-images`
2. Update group ownership on the graph (tracks image metadata) and storage driver directories.
2. `chown -R :9001 /var/lib/docker/[overlay | aufs]`
3. `chmod a+s /var/lib/docker/[overlay | aufs]`
4. `chown -R :9001 /var/lib/docker/graph`
5. `chmod a+s /var/lib/docker/graph`
3. Any new images pulled or containers created will be accounted to the `docker-images` group by default.
4. Once we update the group ownership on newly created containers to a different gid, the container writable layer's disk usage gets dropped from this group.
##### Overlayfs
Tested on Ubuntu 15.10.
Overlayfs works similarly to Aufs; only the path to the container's writable layer directory changes.
* Setup Linux Quota following this [tutorial](https://www.google.com/url?q=https://www.howtoforge.com/tutorial/linux-quota-ubuntu-debian/&sa=D&ust=1446146816105000&usg=AFQjCNHThn4nwfj1YLoVmv5fJ6kqAQ9FlQ).
* Create a new group x on the host and enable quota for that group
* `groupadd -g 9000 x`
* `setquota -g 9000 -a 0 100 0 100` // 100 blocks (4096 bytes each*)
* `quota -g 9000 -v` // Check that quota is enabled
* Create a docker container
* `docker create -it busybox /bin/sh -c "dd if=/dev/zero of=/file count=10 bs=1M"`
* `b8cc9fae3851f9bcefe922952b7bca0eb33aa31e68e9203ce0639fc9d3f3c61`
* Change group on the writable layers directory for this container
* `chmod -R a+s /var/lib/docker/overlay/b8cc9fae3851f9bcefe922952b7bca0eb33aa31e68e9203ce0639fc9d3f3c61b/*`
* `chown -R :9000 /var/lib/docker/overlay/b8cc9fae3851f9bcefe922952b7bca0eb33aa31e68e9203ce0639fc9d3f3c61b/*`
* Check quota before and after running the container.
```shell
$ quota -g x -v
Disk quotas for group x (gid 9000):
Filesystem blocks quota limit grace files quota limit grace
/dev/sda1 48 0 0 19 0 0
```
* Start the docker container
* `docker start b8`
* ```shell
quota -g x -v
Disk quotas for group x (gid 9000):
Filesystem **blocks** quota limit grace files quota limit grace
/dev/sda1 **10288** 0 0 20 0 0
```
##### Device mapper
Usage of Linux Quota should be possible for the purposes of volumes and log files.
Devicemapper storage driver in docker uses ["thin targets"](https://www.kernel.org/doc/Documentation/device-mapper/thin-provisioning.txt). Underneath there are two block devices, “data” and “metadata”, from which more block devices are created for containers. More information [here](http://www.projectatomic.io/docs/filesystems/).
These devices can be loopback or real storage devices.
The base device has a maximum storage capacity. This means that the sum total of storage space occupied by images and containers cannot exceed this capacity.
By default, all images and containers are created from an initial filesystem with a 10GB limit.
A separate filesystem is created for each container as part of start (not create).
It is possible to [resize](https://jpetazzo.github.io/2014/01/29/docker-device-mapper-resize/) the container filesystem.
For the purposes of image space tracking, we can read the pool device usage reported by the kernel (via `dmsetup status`, as shown in the testing notes below) and subtract the aggregated container writable layer usage.
###### Testing notes
* ```shell
$ docker info
...
Storage Driver: devicemapper
Pool Name: **docker-8:1-268480-pool**
Pool Blocksize: 65.54 kB
Backing Filesystem: extfs
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 2.059 GB
Data Space Total: 107.4 GB
Data Space Available: 48.45 GB
Metadata Space Used: 1.806 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.146 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.99 (2015-06-20)
```
```shell
$ dmsetup table docker-8\:1-268480-pool
0 209715200 thin-pool 7:1 7:0 **128** 32768 1 skip_block_zeroing
```
128 is the data block size (in 512-byte sectors)
Usage from the kernel for the pool device:
```shell
$ dmsetup status docker-8\:1-268480-pool
0 209715200 thin-pool 37 441/524288 **31424/1638400** - rw discard_passdown queue_if_no_space -
```
Used/total data blocks - 31424/1638400
Usage in MB = 31424 * 512 * 128 (block size from above) bytes = 1964 MB
Capacity in MB = 1638400 * 512 * 128 bytes = 100 GB
##### Log file accounting
* Setup Linux quota for a container as mentioned above.
* Update group ownership on the following directories to the container group ID created earlier. Adapting the examples above:
* `chmod -R a+s /var/lib/docker/**containers**/b8cc9fae3851f9bcefe922952b7bca0eb33aa31e68e9203ce0639fc9d3f3c61b/*`
* `chown -R :9000 /var/lib/docker/**containers**/b8cc9fae3851f9bcefe922952b7bca0eb33aa31e68e9203ce0639fc9d3f3c61b/*`
##### Testing titbits
* Ubuntu 15.10 doesn't ship with the quota module on virtual machines. [Install the linux-image-extra-virtual](http://askubuntu.com/questions/109585/quota-format-not-supported-in-kernel) package to get quota to work.
* Overlay storage driver needs kernels >= 3.18. I used Ubuntu 15.10 to test Overlayfs.
* If you use a non-default location for docker storage, change `/var/lib/docker` in the examples to your storage location.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/disk-accounting.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->
View File
@ -1,266 +1 @@
# Proposal: Dramatically Simplify Kubernetes Cluster Creation This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/dramatically-simplify-cluster-creation.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/dramatically-simplify-cluster-creation.md)
> ***Please note: this proposal doesn't reflect final implementation, it's here for the purpose of capturing the original ideas.***
> ***You should probably [read `kubeadm` docs](http://kubernetes.io/docs/getting-started-guides/kubeadm/) to understand the end result of this effort.***
Luke Marsden & many others in [SIG-cluster-lifecycle](https://github.com/kubernetes/community/tree/master/sig-cluster-lifecycle).
17th August 2016
*This proposal aims to capture the latest consensus and plan of action of SIG-cluster-lifecycle. It should satisfy the first bullet point [required by the feature description](https://github.com/kubernetes/features/issues/11).*
See also: [this presentation to community hangout on 4th August 2016](https://docs.google.com/presentation/d/17xrFxrTwqrK-MJk0f2XCjfUPagljG7togXHcC39p0sM/edit?ts=57a33e24#slide=id.g158d2ee41a_0_76)
## Motivation
Kubernetes is hard to install, and there are many different ways to do it today. None of them are excellent. We believe this is hindering adoption.
## Goals
Have one recommended, official, tested, "happy path" which will enable a majority of new and existing Kubernetes users to:
* Kick the tires and easily turn up a new cluster on infrastructure of their choice
* Get a reasonably secure, production-ready cluster, with reasonable defaults and a range of easily-installable add-ons
We plan to do so by improving and simplifying Kubernetes itself, rather than building lots of tooling which "wraps" Kubernetes by poking all the bits into the right place.
## Scope of project
There are logically 3 steps to deploying a Kubernetes cluster:
1. *Provisioning*: Getting some servers - these may be VMs on a developer's workstation, VMs in public clouds, or bare-metal servers in a user's data center.
2. *Install & Discovery*: Installing the Kubernetes core components on those servers (kubelet, etc) - and bootstrapping the cluster to a state of basic liveness, including allowing each server in the cluster to discover other servers: for example teaching etcd servers about their peers, having TLS certificates provisioned, etc.
3. *Add-ons*: Now that basic cluster functionality is working, installing add-ons such as DNS or a pod network (should be possible using kubectl apply).
Notably, this project is *only* working on dramatically improving 2 and 3 from the perspective of users typing commands directly into root shells of servers. The reason for this is that there are a great many different ways of provisioning servers, and users will already have their own preferences.
What's more, once we've radically improved the user experience of 2 and 3, it will make the job of tools that want to do all three much easier.
## User stories
### Phase I
**_In time to be an alpha feature in Kubernetes 1.4._**
Note: the current plan is to deliver `kubeadm` which implements these stories as "alpha" packages built from master (after the 1.4 feature freeze), but which are capable of installing a Kubernetes 1.4 cluster.
* *Install*: As a potential Kubernetes user, I can deploy a Kubernetes 1.4 cluster on a handful of computers running Linux and Docker by typing two commands on each of those computers. The process is so simple that it becomes obvious to me how to easily automate it if I so wish.
* *Pre-flight check*: If any of the computers don't have working dependencies installed (e.g. bad version of Docker, too-old Linux kernel), I am informed early on and given clear instructions on how to fix it so that I can keep trying until it works.
* *Control*: Having provisioned a cluster, I can gain user credentials which allow me to remotely control it using kubectl.
* *Install-addons*: I can select from a set of recommended add-ons to install directly after installing Kubernetes on my set of initial computers with kubectl apply.
* *Add-node*: I can add another computer to the cluster.
* *Secure*: As an attacker with (presumed) control of the network, I cannot add malicious nodes I control to the cluster created by the user. I also cannot remotely control the cluster.
### Phase II
**_In time for Kubernetes 1.5:_**
*Everything from Phase I as beta/stable feature, everything else below as beta feature in Kubernetes 1.5.*
* *Upgrade*: Later, when Kubernetes 1.4.1 or any newer release is published, I can upgrade to it by typing one other command on each computer.
* *HA*: If one of the computers in the cluster fails, the cluster carries on working. I can find out how to replace the failed computer, including if the computer was one of the masters.
## Top-down view: UX for Phase I items
We will introduce a new binary, kubeadm, which ships with the Kubernetes OS packages (and binary tarballs, for OSes without package managers).
```
laptop$ kubeadm --help
kubeadm: bootstrap a secure kubernetes cluster easily.
/==========================================================\
| KUBEADM IS ALPHA, DO NOT USE IT FOR PRODUCTION CLUSTERS! |
| |
| But, please try it out! Give us feedback at: |
| https://github.com/kubernetes/kubernetes/issues |
| and at-mention @kubernetes/sig-cluster-lifecycle |
\==========================================================/
Example usage:
Create a two-machine cluster with one master (which controls the cluster),
and one node (where workloads, like pods and containers run).
On the first machine
====================
master# kubeadm init master
Your token is: <token>
On the second machine
=====================
node# kubeadm join node --token=<token> <ip-of-master>
Usage:
kubeadm [command]
Available Commands:
init Run this on the first server you deploy onto.
join Run this on other servers to join an existing cluster.
user Get initial admin credentials for a cluster.
manual Advanced, less-automated functionality, for power users.
Use "kubeadm [command] --help" for more information about a command.
```
### Install
*On first machine:*
```
master# kubeadm init master
Initializing kubernetes master... [done]
Cluster token: 73R2SIPM739TNZOA
Run the following command on machines you want to become nodes:
kubeadm join node --token=73R2SIPM739TNZOA <master-ip>
You can now run kubectl here.
```
*On N "node" machines:*
```
node# kubeadm join node --token=73R2SIPM739TNZOA <master-ip>
Initializing kubernetes node... [done]
Bootstrapping certificates... [done]
Joined node to cluster, see 'kubectl get nodes' on master.
```
Note `[done]` would be colored green in all of the above.
### Install: alternative for automated deploy
*The user (or their config management system) creates a token and passes the same one to both init and join.*
```
master# kubeadm init master --token=73R2SIPM739TNZOA
Initializing kubernetes master... [done]
You can now run kubectl here.
```
### Pre-flight check
```
master# kubeadm init master
Error: socat not installed. Unable to proceed.
```
### Control
*On master, after Install, kubectl is automatically able to talk to localhost:8080:*
```
master# kubectl get pods
[normal kubectl output]
```
*To mint new user credentials on the master:*
```
master# kubeadm user create -o kubeconfig-bob bob
Waiting for cluster to become ready... [done]
Creating user certificate for user... [done]
Waiting for user certificate to be signed... [done]
Your cluster configuration file has been saved in kubeconfig.
laptop# scp <master-ip>:/root/kubeconfig-bob ~/.kubeconfig
laptop# kubectl get pods
[normal kubectl output]
```
### Install-addons
*Using CNI network as example:*
```
master# kubectl apply --purge -f \
https://git.io/kubernetes-addons/<X>.yaml
[normal kubectl apply output]
```
### Add-node
*Same as Install, “on node machines”.*
### Secure
```
node# kubeadm join --token=GARBAGE node <master-ip>
Unable to join mesh network. Check your token.
```
## Work streams: critical path, must have in 1.4 before feature freeze
1. [TLS bootstrapping](https://github.com/kubernetes/features/issues/43) - so that kubeadm can mint credentials for kubelets and users
* Requires [#25764](https://github.com/kubernetes/kubernetes/pull/25764) and auto-signing [#30153](https://github.com/kubernetes/kubernetes/pull/30153) but does not require [#30094](https://github.com/kubernetes/kubernetes/pull/30094).
* @philips, @gtank & @yifan-gu
1. Fix for [#30515](https://github.com/kubernetes/kubernetes/issues/30515) - so that kubeadm can install a kubeconfig which kubelet then picks up
* @smarterclayton
## Work streams: can land after 1.4 feature freeze
1. [Debs](https://github.com/kubernetes/release/pull/35) and [RPMs](https://github.com/kubernetes/release/pull/50) (and binaries?) - so that kubernetes can be installed in the first place
* @mikedanese & @dgoodwin
1. [kubeadm implementation](https://github.com/lukemarsden/kubernetes/tree/kubeadm-scaffolding) - the kubeadm CLI itself, will get bundled into "alpha" kubeadm packages
* @lukemarsden & @errordeveloper
1. [Implementation of JWS server](https://github.com/jbeda/kubernetes/blob/discovery-api/docs/proposals/super-simple-discovery-api.md#method-jws-token) from [#30707](https://github.com/kubernetes/kubernetes/pull/30707) - so that we can implement the simple UX with no dependencies
* @jbeda & @philips?
1. Documentation - so that new users can see this in 1.4 (even if it's caveated with alpha/experimental labels and flags all over it)
* @lukemarsden
1. `kubeadm` alpha packages
* @lukemarsden, @mikedanese, @dgoodwin
### Nice to have
1. [Kubectl apply --purge](https://github.com/kubernetes/kubernetes/pull/29551) - so that addons can be maintained using k8s infrastructure
* @lukemarsden & @errordeveloper
## kubeadm implementation plan
Based on [@philips' comment here](https://github.com/kubernetes/kubernetes/pull/30361#issuecomment-239588596).
The key point with this implementation plan is that it requires basically no changes to kubelet except [#30515](https://github.com/kubernetes/kubernetes/issues/30515).
It also doesn't require kubelet to do TLS bootstrapping - kubeadm handles that.
### kubeadm init master
1. User installs and configures kubelet to look for manifests in `/etc/kubernetes/manifests`
1. API server CA certs are generated by kubeadm
1. kubeadm generates pod manifests to launch API server and etcd
1. kubeadm pushes a replica set for the prototype jws-server and the JWS into the API server with host networking so it is listening on the master node IP
1. kubeadm prints out the IP of the JWS server and the JWS token
### kubeadm join node --token IP
1. User installs and configures kubelet to have a kubeconfig at `/var/lib/kubelet/kubeconfig` but the kubelet is in a crash loop and is restarted by host init system
1. kubeadm talks to jws-server on IP with token and gets the cacert, then talks to the apiserver TLS bootstrap API to get client cert, etc and generates a kubelet kubeconfig
1. kubeadm places kubeconfig into `/var/lib/kubelet/kubeconfig` and waits for kubelet to restart
1. Mission accomplished, we think.
## See also
* [Joe Beda's "K8s the hard way easier"](https://docs.google.com/document/d/1lJ26LmCP-I_zMuqs6uloTgAnHPcuT7kOYtQ7XSgYLMA/edit#heading=h.ilgrv18sg5t) which combines Kelsey's "Kubernetes the hard way" with history of proposed UX at the end (scroll all the way down to the bottom).
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/dramatically-simplify-cluster-creation.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->
View File
@ -1,238 +1 @@
<!-- BEGIN MUNGE: GENERATED_TOC --> This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/external-lb-source-ip-preservation.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/external-lb-source-ip-preservation.md)
- [Overview](#overview)
- [Motivation](#motivation)
- [Alpha Design](#alpha-design)
- [Overview](#overview-1)
- [Traffic Steering using LB programming](#traffic-steering-using-lb-programming)
- [Traffic Steering using Health Checks](#traffic-steering-using-health-checks)
- [Choice of traffic steering approaches by individual Cloud Provider implementations](#choice-of-traffic-steering-approaches-by-individual-cloud-provider-implementations)
- [API Changes](#api-changes)
- [Local Endpoint Recognition Support](#local-endpoint-recognition-support)
- [Service Annotation to opt-in for new behaviour](#service-annotation-to-opt-in-for-new-behaviour)
- [NodePort allocation for HealthChecks](#nodeport-allocation-for-healthchecks)
- [Behavior Changes expected](#behavior-changes-expected)
- [External Traffic Blackholed on nodes with no local endpoints](#external-traffic-blackholed-on-nodes-with-no-local-endpoints)
- [Traffic Balancing Changes](#traffic-balancing-changes)
- [Cloud Provider support](#cloud-provider-support)
- [GCE 1.4](#gce-14)
- [GCE Expected Packet Source/Destination IP (Datapath)](#gce-expected-packet-sourcedestination-ip-datapath)
- [GCE Expected Packet Destination IP (HealthCheck path)](#gce-expected-packet-destination-ip-healthcheck-path)
- [AWS TBD](#aws-tbd)
- [Openstack TBD](#openstack-tbd)
- [Azure TBD](#azure-tbd)
- [Testing](#testing)
- [Beta Design](#beta-design)
- [API Changes from Alpha to Beta](#api-changes-from-alpha-to-beta)
- [Future work](#future-work)
- [Appendix](#appendix)
<!-- END MUNGE: GENERATED_TOC -->
# Overview
Kubernetes provides an external loadbalancer service type which creates a virtual external ip
(in supported cloud provider environments) that can be used to load-balance traffic to
the pods matching the service pod-selector.
## Motivation
The current implementation requires that the cloud loadbalancer balances traffic across all
Kubernetes worker nodes, and this traffic is then equally distributed to all the backend
pods for that service.
Due to the DNAT required to redirect the traffic to its ultimate destination, the return
path for each session MUST traverse the same node again. To ensure this, the node also
performs a SNAT, replacing the source ip with its own.
This causes the service endpoint to see the session as originating from a cluster local ip address.
*The original external source IP is lost*
This is not a satisfactory solution - the original external source IP MUST be preserved for a
lot of applications and customer use-cases.
# Alpha Design
This section describes the proposed design for
[alpha-level](../../docs/devel/api_changes.md#alpha-beta-and-stable-versions) support, although
additional features are described in [future work](#future-work).
## Overview
The double hop must be prevented by programming the external load balancer to direct traffic
only to nodes that have local pods for the service. This can be accomplished in two ways, either
by API calls to add/delete nodes from the LB node pool or by adding health checking to the LB and
failing/passing health checks depending on the presence of local pods.
## Traffic Steering using LB programming
This approach requires that the Cloud LB be reprogrammed to be in sync with endpoint presence.
Whenever the first service endpoint is scheduled onto a node, the node is added to the LB pool.
Whenever the last service endpoint is unhealthy on a node, the node needs to be removed from the LB pool.
This is a slow operation, on the order of 30-60 seconds, and involves the Cloud Provider API path.
If the API endpoint is temporarily unavailable, the datapath will be misprogrammed till the
reprogramming is successful and the API->datapath tables are updated by the cloud provider backend.
## Traffic Steering using Health Checks
This approach requires that all worker nodes in the cluster be programmed into the LB target pool.
To steer traffic only onto nodes that have endpoints for the service, we program the LB to perform
node healthchecks. The kube-proxy daemons running on each node will be responsible for responding
to these healthcheck requests (URL `/healthz`) from the cloud provider LB healthchecker. An additional nodePort
will be allocated for these health checks.
kube-proxy already watches for Service and Endpoint changes; it will maintain an in-memory lookup
table indicating the number of local endpoints for each service.
For a value of zero local endpoints, it responds with a health check failure (503 Service Unavailable),
and success (200 OK) for non-zero values.
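For illustration, the probe performed by the cloud provider health checker is roughly equivalent to the following (the node IP and the allocated healthcheck nodePort are placeholders):

```shell
NODE_IP=10.240.0.5
HC_PORT=31313
curl -s -o /dev/null -w "%{http_code}\n" "http://${NODE_IP}:${HC_PORT}/healthz"
# 200 -> the node has local endpoints for the service; 503 -> no local endpoints
```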
Healthchecks are programmable with a minimum period of 1 second on most cloud provider LBs, and the minimum
number of failures needed to trigger a node health state change can be configured from 2 through 5.
This will allow much faster transition times on the order of 1-5 seconds, and involve no
API calls to the cloud provider (and hence reduce the impact of API unreliability), keeping the
time window where traffic might get directed to nodes with no local endpoints to a minimum.
## Choice of traffic steering approaches by individual Cloud Provider implementations
The cloud provider package may choose either of these approaches. kube-proxy will provide these
healthcheck responder capabilities, regardless of the cloud provider configured on a cluster.
## API Changes
### Local Endpoint Recognition Support
To allow kube-proxy to recognize if an endpoint is local requires that the EndpointAddress struct
should also contain the NodeName it resides on. This new string field will be read-only and
populated *only* by the Endpoints Controller.
### Service Annotation to opt-in for new behaviour
A new annotation `service.alpha.kubernetes.io/external-traffic` will be recognized
by the service controller only for services of Type LoadBalancer. Services that wish to opt-in to
the new LoadBalancer behaviour must annotate the Service to request the new ESIPP behavior.
Supported values for this annotation are OnlyLocal and Global.
- OnlyLocal activates the new logic (described in this proposal) and balances locally within a node.
- Global activates the old logic of balancing traffic across the entire cluster.
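As a sketch (the service name is a placeholder), opting an existing LoadBalancer service in to the new behaviour and then finding the allocated healthcheck nodePort might look like:

```shell
kubectl annotate service my-service service.alpha.kubernetes.io/external-traffic=OnlyLocal
# the service controller records the allocated port in the healthcheck-nodeport annotation
kubectl get service my-service -o yaml | grep healthcheck-nodeport
```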
### NodePort allocation for HealthChecks
An additional nodePort allocation will be necessary for services that are of type LoadBalancer and
have the new annotation specified. This additional nodePort is necessary for kube-proxy to listen for
healthcheck requests on all nodes.
This NodePort will be added as an annotation (`service.alpha.kubernetes.io/healthcheck-nodeport`) to
the Service after allocation (in the alpha release). The value of this annotation may also be
specified during the Create call and the allocator will reserve that specific nodePort.
## Behavior Changes expected
### External Traffic Blackholed on nodes with no local endpoints
When the last endpoint on a node has gone away but the LB has not yet marked the node as unhealthy
(worst-case window size = (N+1) * HCP, where N = minimum failed healthchecks and HCP = health check period),
external traffic will still be steered to the node. This traffic will be blackholed and not forwarded
to other endpoints elsewhere in the cluster.
Internal pod to pod traffic should behave as before, with equal probability across all pods.
### Traffic Balancing Changes
GCE/AWS load balancers do not provide weights for their target pools. This was not an issue with the old LB
kube-proxy rules which would correctly balance across all endpoints.
With the new functionality, the external traffic will not be equally load balanced across pods, but rather
equally balanced at the node level (because GCE/AWS and other external LB implementations do not have the ability
for specifying the weight per node, they balance equally across all target nodes, disregarding the number of
pods on each node).
We can, however, state that for NumServicePods << NumNodes or NumServicePods >> NumNodes, a fairly close-to-equal
distribution will be seen, even without weights.
Once the external load balancers provide weights, this functionality can be added to the LB programming path.
*Future Work: No support for weights is provided for the 1.4 release, but may be added at a future date*
## Cloud Provider support
This feature is added as an opt-in annotation.
Default behaviour of LoadBalancer type services will be unchanged for all Cloud providers.
The annotation will be ignored by existing cloud provider libraries until they add support.
### GCE 1.4
For the 1.4 release, this feature will be implemented for the GCE cloud provider.
#### GCE Expected Packet Source/Destination IP (Datapath)
- Node: On the node, we expect to see the real source IP of the client. Destination IP will be the Service Virtual External IP.
- Pod: For processes running inside the Pod network namespace, the source IP will be the real client source IP. The destination address will be the Pod IP.
#### GCE Expected Packet Destination IP (HealthCheck path)
kube-proxy listens for TCP health checks on the health check node port on all addresses (`:::`).
This allows responding to health checks when the destination IP is either the VM IP or the Service Virtual External IP.
In practice, tcpdump traces on GCE show source IP is 169.254.169.254 and destination address is the Service Virtual External IP.
### AWS TBD
TBD *discuss timelines and feasibility with Kubernetes sig-aws team members*
### Openstack TBD
This functionality may not be introduced in Openstack in the near term.
*Note from Openstack team member @anguslees*
Underlying vendor devices might be able to do this, but we only expose full-NAT/proxy loadbalancing through the OpenStack API (LBaaS v1/v2 and Octavia). So I'm afraid this will be unsupported on OpenStack, afaics.
### Azure TBD
*To be confirmed* For the 1.4 release, this feature will be implemented for the Azure cloud provider.
## Testing
The cases we should test are:
1. Core Functionality Tests
1.1 Source IP Preservation
Test the main intent of this change, source ip preservation - use the all-in-one network tests container
with new functionality that responds with the client IP. Verify the container is seeing the external IP
of the test client.
1.2 Health Check responses
Testcases use pods explicitly pinned to nodes and delete/add to nodes randomly. Validate that healthchecks succeed
and fail on the expected nodes as endpoints move around. Gather LB response times (time from pod declares ready to
time for Cloud LB to declare node healthy and vice versa) to endpoint changes.
2. Inter-Operability Tests
Validate that internal cluster communications are still possible from nodes without local endpoints. This change
is only for externally sourced traffic.
3. Backward Compatibility Tests
Validate that old and new functionality can simultaneously exist in a single cluster. Create services with and without
the annotation, and validate datapath correctness.
# Beta Design
The only part of the design that changes for beta is the API, which is upgraded from
annotation-based to first class fields.
## API Changes from Alpha to Beta
Annotation `service.alpha.kubernetes.io/node-local-loadbalancer` will switch to a Service object field.
# Future work
Post-1.4 feature ideas. These are not fully-fleshed designs.
# Appendix
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/external-lb-source-ip-preservation.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->
View File
@ -1,209 +1 @@
# Federated API Servers This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/federated-api-servers.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/federated-api-servers.md)
## Abstract
We want to divide the single monolithic API server into multiple federated
servers. Anyone should be able to write their own federated API server to expose APIs they want.
Cluster admins should be able to expose new APIs at runtime by bringing up new
federated servers.
## Motivation
* Extensibility: We want to allow community members to write their own API
servers to expose APIs they want. Cluster admins should be able to use these
servers without having to require any change in the core kubernetes
repository.
* Unblock new APIs from core kubernetes team review: A lot of new API proposals
are currently blocked on review from the core kubernetes team. By allowing
developers to expose their APIs as a separate server and enabling the cluster
admin to use it without any change to the core kubernetes repository, we
unblock these APIs.
* Place for staging experimental APIs: New APIs can remain in separate
federated servers until they become stable, at which point, they can be moved
to the core kubernetes master, if appropriate.
* Ensure that new APIs follow kubernetes conventions: Without the mechanism
proposed here, community members might be forced to roll their own thing which
may or may not follow kubernetes conventions.
## Goal
* Developers should be able to write their own API server and cluster admins
should be able to add them to their cluster, exposing new APIs at runtime. All
of this should not require any change to the core kubernetes API server.
* These new APIs should be seamless extension of the core kubernetes APIs (ex:
they should be operated upon via kubectl).
## Non Goals
The following are related but are not the goals of this specific proposal:
* Make it easy to write a kubernetes API server.
## High Level Architecture
There will be 2 new components in the cluster:
* A simple program to summarize discovery information from all the servers.
* A reverse proxy to proxy client requests to individual servers.
The reverse proxy is optional. Clients can discover server URLs using the
summarized discovery information and contact them directly. Simple clients, can
always use the proxy.
The same program can provide both discovery summarization and reverse proxy.
### Constraints
* Unique API groups across servers: Each API server (and groups of servers, in HA)
should expose unique API groups.
* Follow API conventions: APIs exposed by every API server should adhere to [kubernetes API
conventions](../devel/api-conventions.md).
* Support discovery API: Each API server should support the kubernetes discovery API
(list the supported groupVersions at `/apis` and list the supported resources
at `/apis/<groupVersion>/`)
* No bootstrap problem: The core kubernetes server should not depend on any
other federated server to come up. Other servers can only depend on the core
kubernetes server.
## Implementation Details
### Summarizing discovery information
We can have a very simple Go program to summarize discovery information from all
servers. Cluster admins will register each federated API server (its baseURL and swagger
spec path) with the proxy. The proxy will summarize the list of all group versions
exposed by all registered API servers with their individual URLs at `/apis`.
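For illustration, a dynamic client could consume the summarized discovery information roughly as follows (the proxy address and group version are placeholders; `jq` is used only for readability):

```shell
PROXY=https://federation-proxy.example.com
curl -s "$PROXY/apis" | jq '.groups[].name'   # all group names, across all registered servers
curl -s "$PROXY/apis/mygroup/v1"              # resources served for one groupVersion
```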
### Reverse proxy
We can use any standard reverse proxy server like nginx or extend the same Go program that
summarizes discovery information to act as reverse proxy for all federated servers.
Cluster admins are also free to use any of the multiple open source API management tools
(for example, there is [Kong](https://getkong.org/), which is written in lua and there is
[Tyk](https://tyk.io/), which is written in Go). These API management tools
provide a lot more functionality like: rate-limiting, caching, logging,
transformations and authentication.
In future, we can also use ingress. That will give cluster admins the flexibility to
easily swap out the ingress controller by a Go reverse proxy, nginx, haproxy
or any other solution they might want.
### Storage
Each API server is responsible for storing their resources. They can have their
own etcd or can use kubernetes server's etcd using [third party
resources](../design/extending-api.md#adding-custom-resources-to-the-kubernetes-api-server).
### Health check
Kubernetes server's `/api/v1/componentstatuses` will continue to report status
of master components that it depends on (scheduler and various controllers).
Since clients have access to server URLs, they can use that to do
health check of individual servers.
In future, if a global health check is required, we can expose a health check
endpoint in the proxy that will report the status of all federated api servers
in the cluster.
### Auth
Since the actual server which serves client's request can be opaque to the client,
all API servers need to have homogeneous authentication and authorisation mechanisms.
All API servers will handle authn and authz for their resources themselves.
In future, we can also have the proxy do the auth and then have apiservers trust
it (via client certs) to report the actual user in an X-something header.
For now, we will trust system admins to configure homogeneous auth on all servers.
Future proposals will refine how auth is managed across the cluster.
### kubectl
kubectl will talk to the discovery endpoint (or proxy) and use the discovery API to
figure out the operations and resources supported in the cluster.
Today, it uses RESTMapper to determine that. We will update kubectl code to populate
RESTMapper using the discovery API so that we can add and remove resources
at runtime.
We will also need to make kubectl truly generic. Right now, a lot of operations
(like get, describe) are hardcoded in the binary for all resources. A future
proposal will provide details on moving those operations to server.
Note that it is possible for kubectl to talk to individual servers directly in
which case proxy will not be required at all, but this requires a bit more logic
in kubectl. We can do this in future, if desired.
### Handling global policies
Now that we have resources spread across multiple API servers, we need to
be careful to ensure that global policies (limit ranges, resource quotas, etc) are enforced.
Future proposals will improve how this is done across the cluster.
#### Namespaces
When a namespaced resource is created in any of the federated server, that
server first needs to check with the kubernetes server that:
* The namespace exists.
* User has authorization to create resources in that namespace.
* Resource quota for the namespace is not exceeded.
To prevent race conditions, the kubernetes server might need to expose an atomic
API for all these operations.
While deleting a namespace, kubernetes server needs to ensure that resources in
that namespace maintained by other servers are deleted as well. We can do this
using resource [finalizers](../design/namespaces.md#finalizers). Each server
will add themselves in the set of finalizers before they create a resource in
the corresponding namespace and delete all their resources in that namespace,
whenever it is to be deleted (kubernetes API server already has this code, we
will refactor it into a library to enable reuse).
Future proposal will talk about this in more detail and provide a better
mechanism.
#### Limit ranges and resource quotas
kubernetes server maintains [resource quotas](../admin/resourcequota/README.md) and
[limit ranges](../admin/limitrange/README.md) for all resources.
Federated servers will need to check with the kubernetes server before creating any
resource.
## Running on hosted kubernetes cluster
This proposal is not enough for hosted cluster users, but allows us to improve
that in the future.
On a hosted kubernetes cluster, for e.g. on GKE - where Google manages the kubernetes
API server, users will have to bring up and maintain the proxy and federated servers
themselves.
Other system components like the various controllers, will not be aware of the
proxy and will only talk to the kubernetes API server.
One possible solution to fix this is to update kubernetes API server to detect when
there are federated servers in the cluster and then change its advertise address to
the IP address of the proxy.
Future proposal will talk about this in more detail.
## Alternatives
There were other alternatives that we had discussed.
* Instead of adding a proxy in front, let the core kubernetes server provide an
API for other servers to register themselves. It can also provide a discovery
API which the clients can use to discover other servers and then talk to them
directly. But this would have required another server API and a lot of client logic as well.
* Validating federated servers: We can validate new servers when they are registered
with the proxy, or keep validating them at regular intervals, or validate
them only when explicitly requested, or not validate at all.
We decided that the proxy will just assume that all the servers are valid
(conform to our api conventions). In future, we can provide conformance tests.
## Future Work
* Validate servers: We should have some conformance tests that validate that the
servers follow kubernetes api-conventions.
* Provide centralised auth service: It is very hard to ensure homogeneous auth
across multiple federated servers, especially in case of hosted clusters
(where different people control the different servers). We can fix it by
providing a centralised authentication and authorization service which all of
the servers can use.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/federated-api-servers.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->
View File
@ -1,223 +1 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING --> This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/federated-ingress.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/federated-ingress.md)
<!-- BEGIN STRIP_FOR_RELEASE -->
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
width="25" height="25">
<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>
If you are using a released version of Kubernetes, you should
refer to the docs that go with that version.
Documentation for other releases can be found at
[releases.k8s.io](http://releases.k8s.io).
</strong>
--
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->
# Kubernetes Federated Ingress
Requirements and High Level Design
Quinton Hoole
July 17, 2016
## Overview/Summary
[Kubernetes Ingress](https://github.com/kubernetes/kubernetes.github.io/blob/master/docs/user-guide/ingress.md)
provides an abstraction for sophisticated L7 load balancing through a
single IP address (and DNS name) across multiple pods in a single
Kubernetes cluster. Multiple alternative underlying implementations
are provided, including one based on GCE L7 load balancing and another
using an in-cluster nginx/HAProxy deployment (for non-GCE
environments). An AWS implementation, based on Elastic Load Balancers
and Route53 is under way by the community.
To extend the above to cover multiple clusters, Kubernetes Federated
Ingress aims to provide a similar/identical API abstraction and,
again, multiple implementations to cover various
cloud-provider-specific as well as multi-cloud scenarios. The general
model is to allow the user to instantiate a single Ingress object via
the Federation API, and have it automatically provision all of the
necessary underlying resources (L7 cloud load balancers, in-cluster
proxies etc) to provide L7 load balancing across a service spanning
multiple clusters.
Four options are outlined:
1. GCP only
1. AWS only
1. Cross-cloud via GCP in-cluster proxies (i.e. clients get to AWS and on-prem via GCP).
1. Cross-cloud via AWS in-cluster proxies (i.e. clients get to GCP and on-prem via AWS).
Option 1 is the:
1. easiest/quickest,
1. most featureful
Recommendations:
+ Suggest tackling option 1 (GCP only) first (target beta in v1.4)
+ Thereafter option 3 (cross-cloud via GCP)
+ We should encourage/facilitate the community to tackle option 2 (AWS-only)
## Options
## Google Cloud Platform only - backed by GCE L7 Load Balancers
This is an option for federations across clusters which all run on Google Cloud Platform (i.e. GCE and/or GKE)
### Features
In summary, all of [GCE L7 Load Balancer](https://cloud.google.com/compute/docs/load-balancing/http/) features:
1. Single global virtual (a.k.a. "anycast") IP address ("VIP" - no dependence on dynamic DNS)
1. Geo-locality for both external and GCP-internal clients
1. Load-based overflow to next-closest geo-locality (i.e. cluster). Based on either queries per second, or CPU load (unfortunately on the first-hop target VM, not the final destination K8s Service).
1. URL-based request direction (different backend services can fulfill each different URL).
1. HTTPS request termination (at the GCE load balancer, with server SSL certs)
### Implementation
1. Federation user creates (federated) Ingress object (the services backing the ingress object must share the same nodePort, as they share a single GCP health check); a user-facing sketch of this step follows this list.
1. Federated Ingress Controller creates Ingress object in each cluster
in the federation (after [configuring each cluster ingress
controller to share the same ingress UID](https://gist.github.com/bprashanth/52648b2a0b6a5b637f843e7efb2abc97)).
1. Each cluster-level Ingress Controller ("GLBC") creates Google L7
Load Balancer machinery (forwarding rules, target proxy, URL map,
backend service, health check) which ensures that traffic to the
Ingress (backed by a Service), is directed to the nodes in the cluster.
1. KubeProxy redirects to one of the backend Pods (currently round-robin, per KubeProxy instance)
An alternative implementation approach involves lifting the current
Federated Ingress Controller functionality up into the Federation
control plane. This alternative is not considered in any further
detail in this document.
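As a rough illustration of step 2 above (not the actual controller code), the sketch below shows the propagation loop: for every federated cluster, create the Ingress if it is missing. The types and cluster names are simplified stand-ins; a real controller would use the generated Kubernetes clients and also reconcile updates and deletions.

```go
package main

import "fmt"

// Simplified stand-ins for the real API types and per-cluster clients.
type Ingress struct {
	Name string
	Host string // placeholder for the full IngressSpec
}

type cluster struct {
	name      string
	ingresses map[string]Ingress
}

// reconcile ensures every federated cluster carries a copy of the Ingress
// created through the Federation API.
func reconcile(fed Ingress, clusters []*cluster) {
	for _, c := range clusters {
		if _, ok := c.ingresses[fed.Name]; ok {
			continue // already present; a real controller would also diff and update
		}
		c.ingresses[fed.Name] = fed
		fmt.Printf("created ingress %q in cluster %s\n", fed.Name, c.name)
	}
}

func main() {
	clusters := []*cluster{
		{name: "gce-us-central1", ingresses: map[string]Ingress{}},
		{name: "gce-europe-west1", ingresses: map[string]Ingress{}},
	}
	reconcile(Ingress{Name: "web", Host: "web.example.com"}, clusters)
}
```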
### Outstanding Work Items
1. This should in theory all work out of the box. Need to confirm
with a manual setup. ([#29341](https://github.com/kubernetes/kubernetes/issues/29341))
1. Implement Federated Ingress:
1. API machinery (~1 day)
1. Controller (~3 weeks)
1. Add DNS field to Ingress object (currently missing, but needs to be added, independent of federation)
1. API machinery (~1 day)
1. KubeDNS support (~ 1 week?)
### Pros
1. Global VIP is awesome - geo-locality, load-based overflow (but see caveats below)
1. Leverages existing K8s Ingress machinery - not too much to add.
1. Leverages existing Federated Service machinery - controller looks
almost identical, DNS provider also re-used.
### Cons
1. Only works across GCP clusters (but see below for a light at the end of the tunnel, for future versions).
## Amazon Web Services only - backed by Route53
This is an option for AWS-only federations. Parts of this are
apparently work in progress; see e.g.
[AWS Ingress controller](https://github.com/kubernetes/contrib/issues/346) and
[[WIP/RFC] Simple ingress -> DNS controller, using AWS
Route53](https://github.com/kubernetes/contrib/pull/841).
### Features
In summary, most of the features of [AWS Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) and [Route53 DNS](https://aws.amazon.com/route53/).
1. Geo-aware DNS direction to closest regional elastic load balancer
1. DNS health checks to route traffic to only healthy elastic load
balancers
1. A variety of possible DNS routing types, including Latency Based Routing, Geo DNS, and Weighted Round Robin
1. Elastic Load Balancing automatically routes traffic across multiple
instances and multiple Availability Zones within the same region.
1. Health checks ensure that only healthy Amazon EC2 instances receive traffic.
### Implementation
1. Federation user creates (federated) Ingress object
1. Federated Ingress Controller creates Ingress object in each cluster in the federation
1. Each cluster-level AWS Ingress Controller creates/updates
1. (regional) AWS Elastic Load Balancer machinery which ensures that traffic to the Ingress (backed by a Service), is directed to one of the nodes in one of the clusters in the region.
1. (global) AWS Route53 DNS machinery which ensures that clients are directed to the closest non-overloaded (regional) elastic load balancer.
1. KubeProxy redirects to one of the backend Pods (currently round-robin, per KubeProxy instance) in the destination K8s cluster.
### Outstanding Work Items
Most of this is currently unimplemented (see [AWS Ingress controller](https://github.com/kubernetes/contrib/issues/346) and
[[WIP/RFC] Simple ingress -> DNS controller, using AWS
Route53](https://github.com/kubernetes/contrib/pull/841)).
1. K8s AWS Ingress Controller
1. Re-uses all of the non-GCE specific Federation machinery discussed above under "GCP-only...".
### Pros
1. Geo-locality (via geo-DNS, not VIP)
1. Load-based overflow
1. Real load balancing (same caveats as for GCP above).
1. L7 SSL connection termination.
1. It seems it can be made to work for hybrid deployments with on-premise clusters (using VPC). More research is required.
### Cons
1. K8s Ingress Controller still needs to be developed. Lots of work.
1. geo-DNS based locality/failover is not as nice as VIP-based (but very useful, nonetheless)
1. Only works on AWS (initial version, at least).
## Cross-cloud via GCP
### Summary
Use GCP Federated Ingress machinery described above, augmented with additional HA-proxy backends in all GCP clusters to proxy to non-GCP clusters (via either Service External IP's, or VPN directly to KubeProxy or Pods).
### Features
As per GCP-only above, except that geo-locality would be to the closest GCP cluster (and possibly onwards to the closest AWS/on-prem cluster).
### Implementation
TBD - see the Summary above in the meantime.
### Outstanding Work
Assuming that GCP-only (see above) is complete:
1. Wire up the HA-proxy load balancers to redirect to non-GCP clusters.
1. Probably more - additional detailed research and design is necessary.
### Pros
1. Works for cross-cloud.
### Cons
1. Traffic to non-GCP clusters is proxied through GCP clusters, incurring additional bandwidth costs (3x?) in those cases.
## Cross-cloud via AWS
In theory the same approach as "Cross-cloud via GCP" above could be used, except that AWS infrastructure would be used to get traffic to an AWS cluster first, and then proxy it onwards to non-AWS and/or on-prem clusters.
Detailed docs TBD.

View File

@ -1,201 +1 @@
# Kubernetes Multi-AZ Clusters This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/federation-lite.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/federation-lite.md)
## (previously nicknamed "Ubernetes-Lite")
## Introduction
Full Cluster Federation will offer sophisticated federation between multiple Kubernetes
clusters, offering true high availability, multiple provider support &
cloud-bursting, multiple region support etc. However, many users have
expressed a desire for a "reasonably" highly available cluster that runs in
multiple zones on GCE or availability zones in AWS, and can tolerate the failure
of a single zone without the complexity of running multiple clusters.
Multi-AZ Clusters aim to deliver exactly that functionality: to run a single
Kubernetes cluster in multiple zones. It will attempt to make reasonable
scheduling decisions, in particular so that a replication controller's pods are
spread across zones, and it will try to be aware of constraints - for example
that a volume cannot be mounted on a node in a different zone.
Multi-AZ Clusters are deliberately limited in scope; for many advanced functions
the answer will be "use full Cluster Federation". For example, multiple-region
support is not in scope. Routing affinity (e.g. so that a webserver will
prefer to talk to a backend service in the same zone) is similarly not in
scope.
## Design
These are the main requirements:
1. kube-up must allow bringing up a cluster that spans multiple zones.
1. pods in a replication controller should attempt to spread across zones.
1. pods which require volumes should not be scheduled onto nodes in a different zone.
1. load-balanced services should work reasonably
### kube-up support
kube-up support for multiple zones will initially be considered
advanced/experimental functionality, so the interface is not initially going to
be particularly user-friendly. As we design the evolution of kube-up, we will
make multiple zones better supported.
For the initial implementation, kube-up must be run multiple times, once for
each zone. The first kube-up will take place as normal, but then for each
additional zone the user must run kube-up again, specifying
`KUBE_USE_EXISTING_MASTER=true` and `KUBE_SUBNET_CIDR=172.20.x.0/24`. This will then
create additional nodes in a different zone, but will register them with the
existing master.
### Zone spreading
This will be implemented by modifying the existing scheduler priority function
`SelectorSpread`. Currently this priority function aims to put pods in an RC
on different hosts, but it will be extended first to spread across zones, and
then to spread across hosts.
So that the scheduler does not need to call out to the cloud provider on every
scheduling decision, we must somehow record the zone information for each node.
The implementation of this will be described in the implementation section.
Note that zone spreading is 'best effort'; zones are just one of the factors
in making scheduling decisions, and thus it is not guaranteed that pods will
spread evenly across zones. However, this is likely desirable: if a zone is
overloaded or failing, we still want to schedule the requested number of pods.
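As a rough illustration of the intended behaviour (not the actual `SelectorSpread` code), the toy priority below scores nodes higher when their zone currently runs fewer of the controller's pods, using the reserved zone label introduced later in this document; spreading across hosts within a zone is omitted.

```go
package main

import "fmt"

// Label key reserved by this proposal for zone information on nodes.
const zoneLabel = "failure-domain.alpha.kubernetes.io/zone"

// zoneSpreadScore is a toy version of the extended spreading priority: nodes
// in zones that already run many of the controller's pods score lower, so new
// pods drift toward under-represented zones.
func zoneSpreadScore(nodeLabels map[string]string, podsPerZone map[string]int, maxScore int) int {
	zone, ok := nodeLabels[zoneLabel]
	if !ok {
		return 0 // unlabelled node: no zone preference either way
	}
	maxPods := 0
	for _, n := range podsPerZone {
		if n > maxPods {
			maxPods = n
		}
	}
	if maxPods == 0 {
		return maxScore
	}
	// Fewer pods already in this zone => higher score.
	return maxScore * (maxPods - podsPerZone[zone]) / maxPods
}

func main() {
	podsPerZone := map[string]int{"us-central1-a": 4, "us-central1-b": 1}
	for _, zone := range []string{"us-central1-a", "us-central1-b"} {
		node := map[string]string{zoneLabel: zone}
		fmt.Printf("%s -> score %d\n", zone, zoneSpreadScore(node, podsPerZone, 10))
	}
}
```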
### Volume affinity
Most cloud providers (at least GCE and AWS) cannot attach their persistent
volumes across zones. Thus when a pod is being scheduled, if there is a volume
attached, that will dictate the zone. This will be implemented using a new
scheduler predicate (a hard constraint): `VolumeZonePredicate`.
When `VolumeZonePredicate` observes a pod scheduling request that includes a
volume, if that volume is zone-specific, `VolumeZonePredicate` will exclude any
nodes not in that zone.
Again, to avoid the scheduler calling out to the cloud provider, this will rely
on information attached to the volumes. This means that this will only support
PersistentVolumeClaims, because direct mounts do not have a place to attach
zone information. PersistentVolumes will then include zone information where
volumes are zone-specific.
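A minimal sketch of the predicate's core check follows, using plain label maps as stand-ins for the real Node and PersistentVolume objects, and the reserved failure-domain labels described later in this document.

```go
package main

import "fmt"

const (
	zoneLabel   = "failure-domain.alpha.kubernetes.io/zone"
	regionLabel = "failure-domain.alpha.kubernetes.io/region"
)

// volumeZonePredicate is a toy version of the proposed hard constraint: a pod
// that references a zone-labelled PersistentVolume may only land on nodes in
// the same failure domain.
func volumeZonePredicate(nodeLabels, pvLabels map[string]string) bool {
	for _, key := range []string{zoneLabel, regionLabel} {
		want, ok := pvLabels[key]
		if !ok {
			continue // volume is not zone-specific for this key
		}
		if nodeLabels[key] != want {
			return false // node is in a different failure domain; exclude it
		}
	}
	return true
}

func main() {
	pv := map[string]string{zoneLabel: "us-central1-a"}
	fmt.Println(volumeZonePredicate(map[string]string{zoneLabel: "us-central1-a"}, pv)) // true
	fmt.Println(volumeZonePredicate(map[string]string{zoneLabel: "us-central1-b"}, pv)) // false
}
```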
### Load-balanced services should operate reasonably
For both AWS & GCE, Kubernetes creates a native cloud load-balancer for each
service of type LoadBalancer. The native cloud load-balancers on both AWS &
GCE are region-level, and support load-balancing across instances in multiple
zones (in the same region). For both clouds, the behaviour of the native cloud
load-balancer is reasonable in the face of failures (indeed, this is why clouds
provide load-balancing as a primitive).
For multi-AZ clusters we will therefore simply rely on the native cloud provider
load balancer behaviour, and we do not anticipate substantial code changes.
One notable shortcoming here is that load-balanced traffic still goes through
kube-proxy controlled routing, and kube-proxy does not (currently) favor
targeting a pod running on the same instance or even the same zone. This will
likely produce a lot of unnecessary cross-zone traffic (which is likely slower
and more expensive). This might be sufficiently low-hanging fruit that we
choose to address it in kube-proxy / multi-AZ clusters, but this can be addressed
after the initial implementation.
## Implementation
The main implementation points are:
1. how to attach zone information to Nodes and PersistentVolumes
1. how nodes get zone information
1. how volumes get zone information
### Attaching zone information
We must attach zone information to Nodes and PersistentVolumes, and possibly to
other resources in future. There are two obvious alternatives: we can use
labels/annotations, or we can extend the schema to include the information.
For the initial implementation, we propose to use labels. The reasoning is:
1. It is considerably easier to implement.
1. We will reserve the two labels `failure-domain.alpha.kubernetes.io/zone` and
`failure-domain.alpha.kubernetes.io/region` for the two pieces of information
we need. By putting this under the `kubernetes.io` namespace there is no risk
of collision, and by putting it under `alpha.kubernetes.io` we clearly mark
this as an experimental feature.
1. We do not yet know whether these labels will be sufficient for all
environments, nor which entities will require zone information. Labels give us
more flexibility here.
1. Because the labels are reserved, we can move to schema-defined fields in
future using our cross-version mapping techniques.
### Node labeling
We do not want to require an administrator to manually label nodes. We instead
modify the kubelet to include the appropriate labels when it registers itself.
The information is easily obtained by the kubelet from the cloud provider.
### Volume labeling
As with nodes, we do not want to require an administrator to manually label
volumes. We will create an admission controller `PersistentVolumeLabel`.
`PersistentVolumeLabel` will intercept requests to create PersistentVolumes,
and will label them appropriately by calling in to the cloud provider.
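The sketch below illustrates the shape of that admission step; the `zoneLookup` callback is a made-up stand-in for the cloud provider query, and the real admission controller would of course operate on the actual PersistentVolume API object.

```go
package main

import "fmt"

const (
	zoneLabel   = "failure-domain.alpha.kubernetes.io/zone"
	regionLabel = "failure-domain.alpha.kubernetes.io/region"
)

// PersistentVolume and zoneLookup are simplified stand-ins for the API object
// and the cloud provider query used by the real admission controller.
type PersistentVolume struct {
	Name     string
	VolumeID string
	Labels   map[string]string
}

type zoneLookup func(volumeID string) (region, zone string, err error)

// admitPersistentVolume mirrors the proposed PersistentVolumeLabel admission
// controller: on create, ask the cloud provider where the volume lives and
// attach the reserved failure-domain labels.
func admitPersistentVolume(pv *PersistentVolume, lookup zoneLookup) error {
	region, zone, err := lookup(pv.VolumeID)
	if err != nil {
		return fmt.Errorf("labelling %s: %v", pv.Name, err)
	}
	if pv.Labels == nil {
		pv.Labels = map[string]string{}
	}
	pv.Labels[regionLabel] = region
	pv.Labels[zoneLabel] = zone
	return nil
}

func main() {
	lookup := func(string) (string, string, error) { return "us-central1", "us-central1-a", nil }
	pv := &PersistentVolume{Name: "pv-1", VolumeID: "gce-pd-123"}
	if err := admitPersistentVolume(pv, lookup); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(pv.Labels)
}
```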
## AWS Specific Considerations
The AWS implementation here is fairly straightforward. The AWS API is
region-wide, meaning that a single call will find instances and volumes in all
zones. In addition, instance ids and volume ids are unique per-region (and
hence also per-zone). I believe they are actually globally unique, but I do
not know if this is guaranteed; in any case we only need global uniqueness if
we are to span regions, which will not be supported by multi-AZ clusters (to do
that correctly requires a full Cluster Federation type approach).
## GCE Specific Considerations
The GCE implementation is more complicated than the AWS implementation because
GCE APIs are zone-scoped. To perform an operation, we must perform one REST
call per zone and combine the results, unless we can determine in advance that
an operation references a particular zone. For many operations, we can make
that determination, but in some cases - such as listing all instances - we must
combine results from calls in all relevant zones.
A further complexity is that GCE volume names are scoped per-zone, not
per-region. Thus it is permitted to have two volumes both named `myvolume` in
two different GCE zones. (Instance names are currently unique per-region, and
thus are not a problem for multi-AZ clusters).
The volume scoping leads to a (small) behavioural change for multi-AZ clusters on
GCE. If you had two volumes both named `myvolume` in two different GCE zones,
this would not be ambiguous when Kubernetes is operating only in a single zone.
But, when operating a cluster across multiple zones, `myvolume` is no longer
sufficient to specify a volume uniquely. Worse, the fact that a volume happens
to be unambiguous at a particular time is no guarantee that it will continue to
be unambiguous in future, because a volume with the same name could
subsequently be created in a second zone. While perhaps unlikely in practice,
we cannot automatically enable multi-AZ clusters for GCE users if this then causes
volume mounts to stop working.
This suggests that (at least on GCE), multi-AZ clusters must be optional (i.e.
there must be a feature-flag). It may be that we can make this feature
semi-automatic in future, by detecting whether nodes are running in multiple
zones, but it seems likely that kube-up could instead simply set this flag.
For the initial implementation, creating volumes with identical names will
yield undefined results. Later, we may add some way to specify the zone for a
volume (and possibly require that volumes have their zone specified when
running in multi-AZ cluster mode). We could add a new `zone` field to the
PersistentVolume type for GCE PD volumes, or we could use a DNS-style dotted
name for the volume name (`<name>.<zone>`).
Initially therefore, the GCE changes will be to:
1. change kube-up to support creation of a cluster in multiple zones
1. pass a flag enabling multi-AZ clusters with kube-up
1. change the kubernetes cloud provider to iterate through relevant zones when resolving items
1. tag GCE PD volumes with the appropriate zone information

View File

@ -1,648 +1 @@
# Kubernetes Cluster Federation This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/federation.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/federation.md)
## (previously nicknamed "Ubernetes")
## Requirements Analysis and Product Proposal
## _by Quinton Hoole ([quinton@google.com](mailto:quinton@google.com))_
_Initial revision: 2015-03-05_
_Last updated: 2015-08-20_
This doc: [tinyurl.com/ubernetesv2](http://tinyurl.com/ubernetesv2)
Original slides: [tinyurl.com/ubernetes-slides](http://tinyurl.com/ubernetes-slides)
Updated slides: [tinyurl.com/ubernetes-whereto](http://tinyurl.com/ubernetes-whereto)
## Introduction
Today, each Kubernetes cluster is a relatively self-contained unit,
which typically runs in a single "on-premise" data centre or single
availability zone of a cloud provider (Google's GCE, Amazon's AWS,
etc).
Several current and potential Kubernetes users and customers have
expressed a keen interest in tying together ("federating") multiple
clusters in some sensible way in order to enable the following kinds
of use cases (intentionally vague):
1. _"Preferentially run my workloads in my on-premise cluster(s), but
automatically overflow to my cloud-hosted cluster(s) if I run out
of on-premise capacity"_.
1. _"Most of my workloads should run in my preferred cloud-hosted
cluster(s), but some are privacy-sensitive, and should be
automatically diverted to run in my secure, on-premise
cluster(s)"_.
1. _"I want to avoid vendor lock-in, so I want my workloads to run
across multiple cloud providers all the time. I change my set of
such cloud providers, and my pricing contracts with them,
periodically"_.
1. _"I want to be immune to any single data centre or cloud
availability zone outage, so I want to spread my service across
multiple such zones (and ideally even across multiple cloud
providers)."_
The above use cases are by necessity left imprecisely defined. The
rest of this document explores these use cases and their implications
in further detail, and compares a few alternative high level
approaches to addressing them. The idea of cluster federation has
informally become known as _"Ubernetes"_.
## Summary/TL;DR
Four primary customer-driven use cases are explored in more detail.
The two highest priority ones relate to High Availability and
Application Portability (between cloud providers, and between
on-premise and cloud providers).
Four primary federation primitives are identified (location affinity,
cross-cluster scheduling, service discovery and application
migration). Fortunately not all four of these primitives are required
for each primary use case, so incremental development is feasible.
## What exactly is a Kubernetes Cluster?
A central design concept in Kubernetes is that of a _cluster_. While
loosely speaking, a cluster can be thought of as running in a single
data center, or cloud provider availability zone, a more precise
definition is that each cluster provides:
1. a single Kubernetes API entry point,
1. a consistent, cluster-wide resource naming scheme
1. a scheduling/container placement domain
1. a service network routing domain
1. an authentication and authorization model.
The above in turn imply the need for a relatively performant, reliable
and cheap network within each cluster.
There is also assumed to be some degree of failure correlation across
a cluster, i.e. whole clusters are expected to fail, at least
occasionally (due to cluster-wide power and network failures, natural
disasters etc). Clusters are often relatively homogeneous in that all
compute nodes are typically provided by a single cloud provider or
hardware vendor, and connected by a common, unified network fabric.
But these are not hard requirements of Kubernetes.
Other classes of Kubernetes deployments than the one sketched above
are technically feasible, but come with some challenges of their own,
and are not yet common or explicitly supported.
More specifically, having a Kubernetes cluster span multiple
well-connected availability zones within a single geographical region
(e.g. US North East, UK, Japan etc) is worthy of further
consideration, in particular because it potentially addresses
some of these requirements.
## What use cases require Cluster Federation?
Let's name a few concrete use cases to aid the discussion:
## 1. Capacity Overflow
_"I want to preferentially run my workloads in my on-premise cluster(s), but automatically "overflow" to my cloud-hosted cluster(s) when I run out of on-premise capacity."_
This idea is known in some circles as "[cloudbursting](http://searchcloudcomputing.techtarget.com/definition/cloud-bursting)".
**Clarifying questions:** What is the unit of overflow? Individual
pods? Probably not always. Replication controllers and their
associated sets of pods? Groups of replication controllers
(a.k.a. distributed applications)? How are persistent disks
overflowed? Can the "overflowed" pods communicate with their
brethren and sistren pods and services in the other cluster(s)?
Presumably yes, at higher cost and latency, provided that they use
external service discovery. Is "overflow" enabled only when creating
new workloads/replication controllers, or are existing workloads
dynamically migrated between clusters based on fluctuating available
capacity? If so, what is the desired behaviour, and how is it
achieved? How, if at all, does this relate to quota enforcement
(e.g. if we run out of on-premise capacity, can all or only some
quotas transfer to other, potentially more expensive off-premise
capacity?)
It seems that most of this boils down to:
1. **location affinity** (pods relative to each other, and to other
stateful services like persistent storage - how is this expressed
and enforced?)
1. **cross-cluster scheduling** (given location affinity constraints
and other scheduling policy, which resources are assigned to which
clusters, and by what?)
1. **cross-cluster service discovery** (how do pods in one cluster
discover and communicate with pods in another cluster?)
1. **cross-cluster migration** (how do compute and storage resources,
and the distributed applications to which they belong, move from
one cluster to another)
1. **cross-cluster load-balancing** (how is user traffic directed
to an appropriate cluster?)
1. **cross-cluster monitoring and auditing** (a.k.a. Unified Visibility)
## 2. Sensitive Workloads
_"I want most of my workloads to run in my preferred cloud-hosted
cluster(s), but some are privacy-sensitive, and should be
automatically diverted to run in my secure, on-premise cluster(s). The
list of privacy-sensitive workloads changes over time, and they're
subject to external auditing."_
**Clarifying questions:**
1. What kinds of rules determine which
workloads go where?
1. Is there in fact a requirement to have these rules be
declaratively expressed and automatically enforced, or is it
acceptable/better to have users manually select where to run
their workloads when starting them?
1. Is a static mapping from container (or more typically,
replication controller) to cluster maintained and enforced?
1. If so, is it only enforced on startup, or are things migrated
between clusters when the mappings change?
This starts to look quite similar to "1. Capacity Overflow", and again
seems to boil down to:
1. location affinity
1. cross-cluster scheduling
1. cross-cluster service discovery
1. cross-cluster migration
1. cross-cluster monitoring and auditing
1. cross-cluster load balancing
## 3. Vendor lock-in avoidance
_"My CTO wants us to avoid vendor lock-in, so she wants our workloads
to run across multiple cloud providers at all times. She changes our
set of preferred cloud providers and pricing contracts with them
periodically, and doesn't want to have to communicate and manually
enforce these policy changes across the organization every time this
happens. She wants it centrally and automatically enforced, monitored
and audited."_
**Clarifying questions:**
1. How does this relate to other use cases (high availability,
capacity overflow etc.), given that they may all span multiple vendors?
It's probably not strictly speaking a separate
use case, but it's brought up so often as a requirement, that it's
worth calling out explicitly.
1. Is a useful intermediate step to make it as simple as possible to
migrate an application from one vendor to another in a one-off fashion?
Again, I think that this can probably be
reformulated as a Capacity Overflow problem - the fundamental
principles seem to be the same or substantially similar to those
above.
## 4. "High Availability"
_"I want to be immune to any single data centre or cloud availability
zone outage, so I want to spread my service across multiple such zones
(and ideally even across multiple cloud providers), and have my
service remain available even if one of the availability zones or
cloud providers "goes down"_.
It seems useful to split this into multiple sets of sub use cases:
1. Multiple availability zones within a single cloud provider (across
which feature sets like private networks, load balancing,
persistent disks, data snapshots etc are typically consistent and
explicitly designed to inter-operate).
1. within the same geographical region (e.g. metro) within which network
is fast and cheap enough to be almost analogous to a single data
center.
1. across multiple geographical regions, where high network cost and
poor network performance may be prohibitive.
1. Multiple cloud providers (typically with inconsistent feature sets,
more limited interoperability, and typically no cheap inter-cluster
networking described above).
The single cloud provider case might be easier to implement (although
the multi-cloud provider implementation should just work for a single
cloud provider). We propose a high-level design catering for both, with the
initial implementation targeting a single cloud provider only.
**Clarifying questions:**
**How does global external service discovery work?** In the steady
state, which external clients connect to which clusters? GeoDNS or
similar? What is the tolerable failover latency if a cluster goes
down? Maybe something like (make up some numbers, notwithstanding
some buggy DNS resolvers, TTLs, caches etc) ~3 minutes for ~90% of
clients to re-issue DNS lookups and reconnect to a new cluster when
their home cluster fails is good enough for most Kubernetes users
(or at least way better than the status quo), given that these sorts
of failure only happen a small number of times a year?
**How does dynamic load balancing across clusters work, if at all?**
One simple starting point might be "it doesn't". i.e. if a service
in a cluster is deemed to be "up", it receives as much traffic as is
generated "nearby" (even if it overloads). If the service is deemed
to "be down" in a given cluster, "all" nearby traffic is redirected
to some other cluster within some number of seconds (failover could
be automatic or manual). Failover is essentially binary. An
improvement would be to detect when a service in a cluster reaches
maximum serving capacity, and dynamically divert additional traffic
to other clusters. But how exactly does all of this work, and how
much of it is provided by Kubernetes, as opposed to something else
bolted on top (e.g. external monitoring and manipulation of GeoDNS)?
**How does this tie in with auto-scaling of services?** More
specifically, if I run my service across _n_ clusters globally, and
one (or more) of them fail, how do I ensure that the remaining _n-1_
clusters have enough capacity to serve the additional, failed-over
traffic? Either:
1. I constantly over-provision all clusters by 1/n (potentially expensive), or
1. I "manually" (or automatically) update my replica count configurations in the
remaining clusters by 1/n when the failure occurs, and Kubernetes
takes care of the rest for me, or
1. Auto-scaling in the remaining clusters takes
care of it for me automagically as the additional failed-over
traffic arrives (with some latency). Note that this implies that
the cloud provider keeps the necessary resources on hand to
accommodate such auto-scaling (e.g. via something similar to AWS reserved
and spot instances)
Up to this point, this use case ("Unavailability Zones") seems materially different from all the others above. It does not require dynamic cross-cluster service migration (we assume that the service is already running in more than one cluster when the failure occurs). Nor does it necessarily involve cross-cluster service discovery or location affinity. As a result, I propose that we address this use case somewhat independently of the others (although I strongly suspect that it will become substantially easier once we've solved the others).
All of the above (regarding "Unavailability Zones") refers primarily
to already-running user-facing services, and minimizing the impact on
end users of those services becoming unavailable in a given cluster.
What about the people and systems that deploy Kubernetes services
(devops etc)? Should they be automatically shielded from the impact
of the cluster outage? i.e. have their new resource creation requests
automatically diverted to another cluster during the outage? While
this specific requirement seems non-critical (manual fail-over seems
relatively non-arduous, ignoring the user-facing issues above), it
smells a lot like the first three use cases listed above ("Capacity
Overflow, Sensitive Services, Vendor lock-in..."), so if we address
those, we probably get this one free of charge.
## Core Challenges of Cluster Federation
As we saw above, a few common challenges fall out of most of the use
cases considered above, namely:
## Location Affinity
Can the pods comprising a single distributed application be
partitioned across more than one cluster? More generally, how far
apart, in network terms, can a given client and server within a
distributed application reasonably be? A server need not necessarily
be a pod, but could instead be a persistent disk housing data, or some
other stateful network service. What is tolerable is typically
application-dependent, primarily influenced by network bandwidth
consumption, latency requirements and cost sensitivity.
For simplicity, let's assume that all Kubernetes distributed
applications fall into one of three categories with respect to relative
location affinity:
1. **"Strictly Coupled"**: Those applications that strictly cannot be
partitioned between clusters. They simply fail if they are
partitioned. When scheduled, all pods _must_ be scheduled to the
same cluster. To move them, we need to shut the whole distributed
application down (all pods) in one cluster, possibly move some
data, and then bring up all of the pods in another cluster. To
avoid downtime, we might bring up the replacement cluster and
divert traffic there before turning down the original, but the
principle is much the same. In some cases moving the data might be
prohibitively expensive or time-consuming, in which case these
applications may be effectively _immovable_.
1. **"Strictly Decoupled"**: Those applications that can be
indefinitely partitioned across more than one cluster, to no
disadvantage. An embarrassingly parallel YouTube porn detector,
where each pod repeatedly dequeues a video URL from a remote work
queue, downloads and chews on the video for a few hours, and
arrives at a binary verdict, might be one such example. The pods
derive no benefit from being close to each other, or anything else
(other than the source of YouTube videos, which is assumed to be
equally remote from all clusters in this example). Each pod can be
scheduled independently, in any cluster, and moved at any time.
1. **"Preferentially Coupled"**: Somewhere between Coupled and
Decoupled. These applications prefer to have all of their pods
located in the same cluster (e.g. for failure correlation, network
latency or bandwidth cost reasons), but can tolerate being
partitioned for "short" periods of time (for example while
migrating the application from one cluster to another). Most small
to medium sized LAMP stacks with not-very-strict latency goals
probably fall into this category (provided that they use sane
service discovery and reconnect-on-fail, which they need to do
anyway to run effectively, even in a single Kubernetes cluster).
From a fault isolation point of view, there are also opposites of the
above. For example, a master database and its slave replica might
need to be in different availability zones. We'll refer to this as
anti-affinity, although it is largely outside the scope of this
document.
Note that there is somewhat of a continuum with respect to network
cost and quality between any two nodes, ranging from two nodes on the
same L2 network segment (lowest latency and cost, highest bandwidth)
to two nodes on different continents (highest latency and cost, lowest
bandwidth). One interesting point on that continuum relates to
multiple availability zones within a well-connected metro or region
and single cloud provider. Despite being in different data centers,
or areas within a mega data center, network in this case is often very fast
and effectively free or very cheap. For the purposes of this network location
affinity discussion, this case is considered analogous to a single
availability zone. Furthermore, if a given application doesn't fit
cleanly into one of the above, shoe-horn it into the best fit,
defaulting to the "Strictly Coupled and Immovable" bucket if you're
not sure.
And then there's what I'll call _absolute_ location affinity. Some
applications are required to run in bounded geographical or network
topology locations. The reasons for this are typically
political/legislative (data privacy laws etc), or driven by network
proximity to consumers (or data providers) of the application ("most
of our users are in Western Europe, U.S. West Coast" etc).
**Proposal:** First tackle Strictly Decoupled applications (which can
be trivially scheduled, partitioned or moved, one pod at a time).
Then tackle Preferentially Coupled applications (which must be
scheduled in totality in a single cluster, and can be moved, but
ultimately in total, and necessarily within some bounded time).
Leave strictly coupled applications to be manually moved between
clusters as required for the foreseeable future.
## Cross-cluster service discovery
I propose having pods use standard discovery methods used by external
clients of Kubernetes applications (i.e. DNS). DNS might resolve to a
public endpoint in the local or a remote cluster. Other than Strictly
Coupled applications, software should be largely oblivious of which of
the two occurs.
_Aside:_ How do we avoid "tromboning" through an external VIP when DNS
resolves to a public IP on the local cluster? Strictly speaking this
would be an optimization for some cases, and probably only matters to
high-bandwidth, low-latency communications. We could potentially
eliminate the trombone with some kube-proxy magic if necessary. More
detail to be added here, but feel free to shoot down the basic DNS
idea in the mean time. In addition, some applications rely on private
networking between clusters for security (e.g. AWS VPC or more
generally VPN). It should not be necessary to forsake this in
order to use Cluster Federation, for example by being forced to use public
connectivity between clusters.
## Cross-cluster Scheduling
This is closely related to location affinity above, and also discussed
there. The basic idea is that some controller, logically outside of
the basic Kubernetes control plane of the clusters in question, needs
to be able to:
1. Receive "global" resource creation requests.
1. Make policy-based decisions as to which cluster(s) should be used
to fulfill each given resource request. In a simple case, the
request is just redirected to one cluster. In a more complex case,
the request is "demultiplexed" into multiple sub-requests, each to
a different cluster. Knowledge of the (albeit approximate)
available capacity in each cluster will be required by the
controller to sanely split the request. Similarly, knowledge of
the properties of the application (Location Affinity class --
Strictly Coupled, Strictly Decoupled etc, privacy class etc) will
be required. It is also conceivable that knowledge of service
SLAs and monitoring thereof might provide an input into
scheduling/placement algorithms.
1. Multiplex the responses from the individual clusters into an
aggregate response.
There is of course a lot of detail still missing from this section,
including discussion of:
1. admission control
1. initial placement of instances of a new
service vs. scheduling new instances of an existing service in response
to auto-scaling
1. rescheduling pods due to failure (response might be
different depending on if it's failure of a node, rack, or whole AZ)
1. data placement relative to compute capacity,
etc.
## Cross-cluster Migration
Again this is closely related to location affinity discussed above,
and is in some sense an extension of Cross-cluster Scheduling. When
certain events occur, it becomes necessary or desirable for the
cluster federation system to proactively move distributed applications
(either in part or in whole) from one cluster to another. Examples of
such events include:
1. A low capacity event in a cluster (or a cluster failure).
1. A change of scheduling policy ("we no longer use cloud provider X").
1. A change of resource pricing ("cloud provider Y dropped their
prices - let's migrate there").
Strictly Decoupled applications can be trivially moved, in part or in
whole, one pod at a time, to one or more clusters (within applicable
policy constraints, for example "PrivateCloudOnly").
For Preferentially Coupled applications, the federation system must
first locate a single cluster with sufficient capacity to accommodate
the entire application, then reserve that capacity, and incrementally
move the application, one (or more) resources at a time, over to the
new cluster, within some bounded time period (and possibly within a
predefined "maintenance" window). Strictly Coupled applications (with
the exception of those deemed completely immovable) require the
federation system to:
1. start up an entire replica application in the destination cluster
1. copy persistent data to the new application instance (possibly
before starting pods)
1. switch user traffic across
1. tear down the original application instance
It is proposed that support for automated migration of Strictly
Coupled applications be deferred to a later date.
## Other Requirements
These are often left implicit by customers, but are worth calling out explicitly:
1. Software failure isolation between Kubernetes clusters should be
retained as far as is practically possible. The federation system
should not materially increase the failure correlation across
clusters. For this reason the federation control plane software
should ideally be completely independent of the Kubernetes cluster
control software, and look just like any other Kubernetes API
client, with no special treatment. If the federation control plane
software fails catastrophically, the underlying Kubernetes clusters
should remain independently usable.
1. Unified monitoring, alerting and auditing across federated Kubernetes clusters.
1. Unified authentication, authorization and quota management across
clusters (this is in direct conflict with failure isolation above,
so there are some tough trade-offs to be made here).
## Proposed High-Level Architectures
Two distinct potential architectural approaches have emerged from discussions
thus far:
1. An explicitly decoupled and hierarchical architecture, where the
Federation Control Plane sits logically above a set of independent
Kubernetes clusters, each of which is (potentially) unaware of the
other clusters, and of the Federation Control Plane itself (other
than to the extent that it is an API client much like any other).
One possible example of this general architecture is illustrated
below, and will be referred to as the "Decoupled, Hierarchical"
approach.
1. A more monolithic architecture, where a single instance of the
Kubernetes control plane itself manages a single logical cluster
composed of nodes in multiple availability zones and cloud
providers.
A very brief, non-exhaustive list of pro's and con's of the two
approaches follows. (In the interest of full disclosure, the author
prefers the Decoupled Hierarchical model for the reasons stated below).
1. **Failure isolation:** The Decoupled Hierarchical approach provides
better failure isolation than the Monolithic approach, as each
underlying Kubernetes cluster, and the Federation Control Plane,
can operate and fail completely independently of each other. In
particular, their software and configurations can be updated
independently. Such updates are, in our experience, the primary
cause of control-plane failures, in general.
1. **Failure probability:** The Decoupled Hierarchical model incorporates
numerically more independent pieces of software and configuration
than the Monolithic one. But the complexity of each of these
decoupled pieces is arguably better contained in the Decoupled
model (per standard arguments for modular rather than monolithic
software design). Which of the two models presents higher
aggregate complexity and consequent failure probability remains
somewhat of an open question.
1. **Scalability:** Conceptually the Decoupled Hierarchical model wins
here, as each underlying Kubernetes cluster can be scaled
completely independently w.r.t. scheduling, node state management,
monitoring, network connectivity etc. It is even potentially
feasible to stack federations of clusters (i.e. create
federations of federations) should scalability of the independent
Federation Control Plane become an issue (although the author does
not envision this being a problem worth solving in the short
term).
1. **Code complexity:** I think that an argument can be made both ways
here. It depends on whether you prefer to weave the logic for
handling nodes in multiple availability zones and cloud providers
within a single logical cluster into the existing Kubernetes
control plane code base (which was explicitly not designed for
this), or separate it into a decoupled Federation system (with
possible code sharing between the two via shared libraries). The
author prefers the latter because it:
1. Promotes better code modularity and interface design.
1. Allows the code
bases of Kubernetes and the Federation system to progress
largely independently (different sets of developers, different
release schedules etc).
1. **Administration complexity:** Again, I think that this could be argued
both ways. Superficially it would seem that administration of a
single Monolithic multi-zone cluster might be simpler by virtue of
being only "one thing to manage", however in practise each of the
underlying availability zones (and possibly cloud providers) has
its own capacity, pricing, hardware platforms, and possibly
bureaucratic boundaries (e.g. "our EMEA IT department manages those
European clusters"). So explicitly allowing for (but not
mandating) completely independent administration of each
underlying Kubernetes cluster, and the Federation system itself,
in the Decoupled Hierarchical model seems to have real practical
benefits that outweigh the superficial simplicity of the
Monolithic model.
1. **Application development and deployment complexity:** It's not clear
to me that there is any significant difference between the two
models in this regard. Presumably the API exposed by the two
different architectures would look very similar, as would the
behavior of the deployed applications. It has even been suggested
to write the code in such a way that it could be run in either
configuration. It's not clear that this makes sense in practice
though.
1. **Control plane cost overhead:** There is a minimum per-cluster
overhead -- two (possibly virtual) machines, or more for redundant HA
deployments. For deployments of very small Kubernetes
clusters with the Decoupled Hierarchical approach, this cost can
become significant.
### The Decoupled, Hierarchical Approach - Illustrated
![image](federation-high-level-arch.png)
## Cluster Federation API
It is proposed that this look a lot like the existing Kubernetes API
but be explicitly multi-cluster.
+ Clusters become first class objects, which can be registered,
listed, described, deregistered etc via the API.
+ Compute resources can be explicitly requested in specific clusters,
or automatically scheduled to the "best" cluster by the Cluster
Federation control system (by a
pluggable Policy Engine).
+ There is a federated equivalent of a replication controller type (or
perhaps a [deployment](deployment.md)),
which is multicluster-aware, and delegates to cluster-specific
replication controllers/deployments as required (e.g. a federated RC for n
replicas might simply spawn multiple replication controllers in
different clusters to do the hard work).
## Policy Engine and Migration/Replication Controllers
The Policy Engine decides which parts of each application go into each
cluster at any point in time, and stores this desired state in the
Desired Federation State store (an etcd or
similar). Migration/Replication Controllers reconcile this against the
desired states stored in the underlying Kubernetes clusters (by
watching both, and creating or updating the underlying Replication
Controllers and related Services accordingly).
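As a toy illustration of the kind of placement decision the Policy Engine would make (not a proposed algorithm), the sketch below splits a federated replica count across clusters in proportion to invented free-capacity figures; the Migration/Replication Controllers would then create or resize the per-cluster replication controllers to match.

```go
package main

import "fmt"

// splitReplicas divides a federated replication controller's replica count
// across clusters in proportion to their (approximate) free capacity. The
// capacity figures and cluster names below are invented for the example.
func splitReplicas(total int, freeCapacity map[string]int) map[string]int {
	sum := 0
	for _, c := range freeCapacity {
		sum += c
	}
	out := map[string]int{}
	if sum == 0 {
		return out
	}
	assigned := 0
	for name, c := range freeCapacity {
		n := total * c / sum
		out[name] = n
		assigned += n
	}
	// Hand any remainder from integer division to an arbitrary cluster.
	for name := range out {
		out[name] += total - assigned
		break
	}
	return out
}

func main() {
	fmt.Println(splitReplicas(10, map[string]int{"us-east": 300, "eu-west": 100}))
}
```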
## Authentication and Authorization
This should ideally be delegated to some external auth system, shared
by the underlying clusters, to avoid duplication and inconsistency.
Either that, or we end up with multilevel auth. Local readonly
eventually consistent auth slaves in each cluster and in the Cluster
Federation control system
could potentially cache auth, to mitigate an SPOF auth system.
## Data consistency, failure and availability characteristics
The services comprising the Cluster Federation control plane have to run
somewhere. Several options exist here:
* For high availability Cluster Federation deployments, these
services may run in either:
* a dedicated Kubernetes cluster, not co-located in the same
availability zone with any of the federated clusters (for fault
isolation reasons). If that cluster/availability zone, and hence the Federation
system, fails catastrophically, the underlying pods and
applications continue to run correctly, albeit temporarily
without the Federation system.
* across multiple Kubernetes availability zones, probably with
some sort of cross-AZ quorum-based store. This provides
theoretically higher availability, at the cost of some
complexity related to data consistency across multiple
availability zones.
* For simpler, less highly available deployments, just co-locate the
Federation control plane in/on/with one of the underlying
Kubernetes clusters. The downside of this approach is that if
that specific cluster fails, all automated failover and scaling
logic which relies on the federation system will also be
unavailable at the same time (i.e. precisely when it is needed).
But if one of the other federated clusters fails, everything
should work just fine.
There is some further thinking to be done around the data consistency
model upon which the Federation system is based, and its impact
on the detailed semantics, failure and availability
characteristics of the system.
## Proposed Next Steps
Identify concrete applications of each use case and configure a proof
of concept service that exercises the use case. For example, cluster
failure tolerance seems popular, so set up an apache frontend with
replicas in each of three availability zones with either an Amazon Elastic
Load Balancer or Google Cloud Load Balancer pointing at them? What
does the zookeeper config look like for N=3 across 3 AZs -- and how
does each replica find the other replicas and how do clients find
their primary zookeeper replica? And now how do I do a shared, highly
available redis database? Use a few common specific use cases like
this to flesh out the detailed API and semantics of Cluster Federation.

View File

@ -1,132 +1 @@
# Flannel integration with Kubernetes This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/flannel-integration.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/flannel-integration.md)
## Why?
* Networking works out of the box.
* Cloud gateway configuration is regulated by quota.
* Consistent bare metal and cloud experience.
* Lays foundation for integrating with networking backends and vendors.
## How?
Thus:
```
Master | Node1
----------------------------------------------------------------------
{192.168.0.0/16, 256 /24} | docker
| | | restart with podcidr
apiserver <------------------ kubelet (sends podcidr)
| | | here's podcidr, mtu
flannel-server:10253 <------------------ flannel-daemon
Allocates a /24 ------------------> [config iptables, VXLan]
<------------------ [watch subnet leases]
I just allocated ------------------> [config VXLan]
another /24 |
```
## Proposal
Explaining vxlan is out of scope for this document; however, it does take some basic understanding to grok the proposal. Assume some pod wants to communicate across nodes with the above setup. Check the flannel vxlan devices:
```console
node1 $ ip -d link show flannel.1
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1410 qdisc noqueue state UNKNOWN mode DEFAULT
link/ether a2:53:86:b5:5f:c1 brd ff:ff:ff:ff:ff:ff
vxlan
node1 $ ip -d link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP mode DEFAULT qlen 1000
link/ether 42:01:0a:f0:00:04 brd ff:ff:ff:ff:ff:ff
node2 $ ip -d link show flannel.1
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1410 qdisc noqueue state UNKNOWN mode DEFAULT
link/ether 56:71:35:66:4a:d8 brd ff:ff:ff:ff:ff:ff
vxlan
node2 $ ip -d link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP mode DEFAULT qlen 1000
link/ether 42:01:0a:f0:00:03 brd ff:ff:ff:ff:ff:ff
```
Note that we're ignoring cbr0 for the sake of simplicity. Spin up a container on each node. We're using raw docker for this example only because we want control over where the container lands:
```
node1 $ docker run -it radial/busyboxplus:curl /bin/sh
[ root@5ca3c154cde3:/ ]$ ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue
8: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1410 qdisc noqueue
link/ether 02:42:12:10:20:03 brd ff:ff:ff:ff:ff:ff
inet 192.168.32.3/24 scope global eth0
valid_lft forever preferred_lft forever
node2 $ docker run -it radial/busyboxplus:curl /bin/sh
[ root@d8a879a29f5d:/ ]$ ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue
16: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1410 qdisc noqueue
link/ether 02:42:12:10:0e:07 brd ff:ff:ff:ff:ff:ff
inet 192.168.14.7/24 scope global eth0
valid_lft forever preferred_lft forever
[ root@d8a879a29f5d:/ ]$ ping 192.168.32.3
PING 192.168.32.3 (192.168.32.3): 56 data bytes
64 bytes from 192.168.32.3: seq=0 ttl=62 time=1.190 ms
```
__What happened?__:
From 1000 feet:
* vxlan device driver starts up on node1 and creates a udp tunnel endpoint on 8472
* container 192.168.32.3 pings 192.168.14.7
- what's the MAC of 192.168.14.0?
- L2 miss, flannel looks up MAC of subnet
- Stores `192.168.14.0 <-> 56:71:35:66:4a:d8` in neighbor table
- what's tunnel endpoint of this MAC?
- L3 miss, flannel looks up destination VM ip
- Stores `10.240.0.3 <-> 56:71:35:66:4a:d8` in bridge database
* Sends `[56:71:35:66:4a:d8, 10.240.0.3][vxlan: port, vni][02:42:12:10:20:03, 192.168.14.7][icmp]`
__But will it blend?__
Kubernetes integration is fairly straightforward once we understand the pieces involved, and can be prioritized as follows:
* Kubelet understands flannel daemon in client mode, flannel server manages independent etcd store on master, node controller backs off CIDR allocation
* Flannel server consults the Kubernetes master for everything network related
* Flannel daemon works through network plugins in a generic way without bothering the kubelet: needs CNI x Kubernetes standardization
The first is accomplished in this PR, while a timeline for 2. and 3. is TBD. To implement the flannel API we can either run a proxy per node and get rid of the flannel server, or service all requests in the flannel server with something like a goroutine per node:
* `/network/config`: read network configuration and return
* `/network/leases`:
- Post: Return a lease as understood by flannel
- Look up the node by IP
- Store node metadata from the [flannel request](https://github.com/coreos/flannel/blob/master/subnet/subnet.go#L34) in annotations
- Return a [Lease object](https://github.com/coreos/flannel/blob/master/subnet/subnet.go#L40) reflecting the node CIDR
- Get: Handle a watch on leases
* `/network/leases/subnet`:
- Put: This is a request for a lease. If the nodecontroller is allocating CIDRs we can probably just no-op.
* `/network/reservations`: TBD, we can probably use this to accommodate the node controller allocating CIDRs instead of flannel requesting them
The ick-iest part of this implementation is going to be the `GET /network/leases`, i.e. the watch proxy. We can side-step this by waiting for a more generic Kubernetes resource. However, we can also implement it as follows:
* Watch all nodes, ignore heartbeats
* On each change, figure out the lease for the node, construct a [lease watch result](https://github.com/coreos/flannel/blob/0bf263826eab1707be5262703a8092c7d15e0be4/subnet/subnet.go#L72), and send it down the watch with the RV from the node
* Implement a lease list that does a similar translation
I say this is gross without an API object because for each node->lease translation one has to store and retrieve the node metadata sent by flannel (e.g. VTEP) from node annotations. [Reference implementation](https://github.com/bprashanth/kubernetes/blob/network_vxlan/pkg/kubelet/flannel_server.go) and [watch proxy](https://github.com/bprashanth/kubernetes/blob/network_vxlan/pkg/kubelet/watch_proxy.go).
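For illustration, here is a minimal sketch of that node -> lease translation, using simplified stand-in types and a made-up annotation key for the flannel backend data; real code would read the Kubernetes Node object and produce flannel's lease type.

```go
package main

import "fmt"

// vtepAnnotation is a hypothetical key under which the flannel daemon's
// backend data (e.g. the VTEP MAC) would be stored on the Node.
const vtepAnnotation = "flannel.alpha.kubernetes.io/backend-data"

// Node and Lease are simplified stand-ins for the Kubernetes Node object and
// flannel's subnet lease.
type Node struct {
	Name        string
	PodCIDR     string
	PublicIP    string
	Annotations map[string]string
}

type Lease struct {
	Subnet      string // the node's pod CIDR, e.g. 192.168.14.0/24
	PublicIP    string // VM IP used as the VXLAN tunnel endpoint
	BackendData string // opaque metadata (e.g. VTEP MAC) stored by the flannel daemon
}

// nodeToLease is the translation the watch proxy would perform for every node
// event before sending a watch result down to the flannel daemons.
func nodeToLease(n Node) Lease {
	return Lease{
		Subnet:      n.PodCIDR,
		PublicIP:    n.PublicIP,
		BackendData: n.Annotations[vtepAnnotation],
	}
}

func main() {
	n := Node{
		Name:        "node2",
		PodCIDR:     "192.168.14.0/24",
		PublicIP:    "10.240.0.3",
		Annotations: map[string]string{vtepAnnotation: `{"VtepMAC":"56:71:35:66:4a:d8"}`},
	}
	fmt.Printf("%+v\n", nodeToLease(n))
}
```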
# Limitations
* Integration is experimental
* Flannel etcd data is not stored on a persistent disk
* CIDR allocation does *not* flow from Kubernetes down to nodes anymore
# Wishlist
This proposal is really just a call for community help in writing a Kubernetes x flannel backend.
* CNI plugin integration
* Flannel daemon in privileged pod
* Flannel server talks to apiserver, described in proposal above
* HTTPs between flannel daemon/server
* Investigate flannel server running on every node (as done in the reference implementation mentioned above)
* Use flannel reservation mode to support node controller podcidr allocation

View File

@ -1,357 +1 @@
**Table of Contents** This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/garbage-collection.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/garbage-collection.md)
- [Overview](#overview)
- [Cascading deletion with Garbage Collector](#cascading-deletion-with-garbage-collector)
- [Orphaning the descendants with "orphan" finalizer](#orphaning-the-descendants-with-orphan-finalizer)
- [Part I. The finalizer framework](#part-i-the-finalizer-framework)
- [Part II. The "orphan" finalizer](#part-ii-the-orphan-finalizer)
- [Related issues](#related-issues)
- [Orphan adoption](#orphan-adoption)
- [Upgrading a cluster to support cascading deletion](#upgrading-a-cluster-to-support-cascading-deletion)
- [End-to-End Examples](#end-to-end-examples)
- [Life of a Deployment and its descendants](#life-of-a-deployment-and-its-descendants)
- [Open Questions](#open-questions)
- [Considered and Rejected Designs](#considered-and-rejected-designs)
- [1. Tombstone + GC](#1-tombstone--gc)
- [2. Recovering from abnormal cascading deletion](#2-recovering-from-abnormal-cascading-deletion)
# Overview
Currently most cascading deletion logic is implemented on the client side. For example, when deleting a replica set, kubectl uses a reaper to delete the created pods and then deletes the replica set. We plan to move cascading deletion to the server to simplify the client-side logic. In this proposal, we present the garbage collector, which implements cascading deletion for all API resources in a generic way; we also present the finalizer framework, particularly the "orphan" finalizer, to enable flexible alternation between cascading deletion and orphaning.
Goals of the design include:
* Supporting cascading deletion on the server side.
* Centralizing the cascading deletion logic, rather than spreading it across controllers.
* Optionally allowing the dependent objects to be orphaned.
Non-goals include:
* Releasing the name of an object immediately, so it can be reused ASAP.
* Propagating the grace period in cascading deletion.
# Cascading deletion with Garbage Collector
## API Changes
```
type ObjectMeta struct {
...
OwnerReferences []OwnerReference
}
```
**ObjectMeta.OwnerReferences**:
**ObjectMeta.OwnerReferences**:
List of objects depended on by this object. If ***all*** objects in the list have been deleted, this object will be garbage collected. For example, a replica set `R` created by a deployment `D` should have an entry in ObjectMeta.OwnerReferences pointing to `D`, set by the deployment controller when `R` is created. This field can be updated by any client that has the privilege to both update ***and*** delete the object. For safety reasons, we can add validation rules to restrict what resources could be set as owners. For example, Events will likely be banned from being owners.
```
type OwnerReference struct {
// Version of the referent.
APIVersion string
// Kind of the referent.
Kind string
// Name of the referent.
Name string
// UID of the referent.
UID types.UID
}
```
**OwnerReference struct**: OwnerReference contains enough information to let you identify an owning object. Please refer to the inline comments for the meaning of each field. Currently, an owning object must be in the same namespace as the dependent object, so there is no namespace field.
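For illustration, a controller creating a dependent object would populate this field roughly as follows. This is a sketch using simplified local copies of the structs above (plain strings instead of types.UID; the names and UID are hypothetical):

```go
package main

import "fmt"

// OwnerReference mirrors the struct proposed above, with the UID simplified to a string.
type OwnerReference struct {
	APIVersion string
	Kind       string
	Name       string
	UID        string
}

// ObjectMeta is trimmed to the fields relevant here.
type ObjectMeta struct {
	Name            string
	OwnerReferences []OwnerReference
}

func main() {
	// A deployment controller creating replica set R on behalf of deployment D
	// records D as an owner, so the GC can collect R once D is gone.
	r := ObjectMeta{
		Name: "R",
		OwnerReferences: []OwnerReference{{
			APIVersion: "extensions/v1beta1",
			Kind:       "Deployment",
			Name:       "D",
			UID:        "3c1c57fd-example-uid",
		}},
	}
	fmt.Printf("%+v\n", r)
}
```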
## New components: the Garbage Collector
The Garbage Collector is responsible for deleting an object if none of the owners listed in the object's OwnerReferences exist.
The Garbage Collector consists of a scanner, a garbage processor, and a propagator.
* Scanner:
* Uses the discovery API to detect all the resources supported by the system.
* Periodically scans all resources in the system and adds each object to the *Dirty Queue*.
* Garbage Processor:
* Consists of the *Dirty Queue* and workers.
* Each worker:
* Dequeues an item from *Dirty Queue*.
* If the item's OwnerReferences is empty, continues to process the next item in the *Dirty Queue*.
* Otherwise checks each entry in the OwnerReferences:
* If at least one owner exists, do nothing.
* If none of the owners exist, requests the API server to delete the item (a minimal sketch of this worker logic follows the list).
* Propagator:
* The Propagator is for optimization, not for correctness.
* Consists of an *Event Queue*, a single worker, and a DAG of owner-dependent relations.
* The DAG stores only name/uid/orphan triplets, not the entire body of every item.
* Watches for create/update/delete events for all resources, enqueues the events to the *Event Queue*.
* Worker:
* Dequeues an item from the *Event Queue*.
* If the item is a creation or update, then updates the DAG accordingly.
* If the object has an owner and the owner doesn't exist in the DAG yet, then apart from adding the object to the DAG, also enqueues the object to the *Dirty Queue*.
* If the item is a deletion, then removes the object from the DAG, and enqueues all its dependent objects to the *Dirty Queue*.
* The propagator shouldn't need to do any RPCs, so a single worker should be sufficient. This makes locking easier.
* With the Propagator, we *only* need to run the Scanner when starting the GC to populate the DAG and the *Dirty Queue*.
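A minimal sketch of the Garbage Processor worker logic described above, with the owner lookup stubbed out (in the real component it would consult the Propagator's DAG or the API server); types and names are illustrative:

```go
package main

import "fmt"

type OwnerReference struct {
	Kind string
	Name string
	UID  string
}

type Item struct {
	Name            string
	OwnerReferences []OwnerReference
}

// ownerExists would normally be a lookup against the Propagator's DAG or the API
// server; here it is a stub so the control flow below is runnable.
func ownerExists(ref OwnerReference) bool { return false }

// processItem is the per-item logic of a Garbage Processor worker: an object is
// deleted only when it has owner references and none of the owners still exist.
func processItem(it Item) {
	if len(it.OwnerReferences) == 0 {
		return // nothing to do, move on to the next item in the Dirty Queue
	}
	for _, ref := range it.OwnerReferences {
		if ownerExists(ref) {
			return // at least one owner is alive, keep the object
		}
	}
	fmt.Printf("deleting %s: all owners are gone\n", it.Name)
}

func main() {
	processItem(Item{
		Name:            "R1",
		OwnerReferences: []OwnerReference{{Kind: "Deployment", Name: "D1", UID: "uid-d1"}},
	})
}
```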
# Orphaning the descendants with "orphan" finalizer
Users may want to delete an owning object (e.g., a replicaset) while orphaning the dependent object (e.g., pods), that is, leaving the dependent objects untouched. We support such use cases by introducing the "orphan" finalizer. Finalizer is a generic API that has uses other than supporting orphaning, so we first describe the generic finalizer framework, then describe the specific design of the "orphan" finalizer.
## Part I. The finalizer framework
## API changes
```
type ObjectMeta struct {
Finalizers []string
}
```
**ObjectMeta.Finalizers**: List of finalizers that need to run before deleting the object. This list must be empty before the object is deleted from the registry. Each string in the list is an identifier for the responsible component that will remove the entry from the list. If the deletionTimestamp of the object is non-nil, entries in this list can only be removed. For safety reasons, updating finalizers requires special privileges. To enforce the admission rules, we will expose finalizers as a subresource and disallow directly changing finalizers when updating the main resource.
## New components
* Finalizers:
* Like a controller, a finalizer is always running.
* A third party can develop and run their own finalizer in the cluster. A finalizer doesn't need to be registered with the API server.
* Watches for update events that meet two conditions:
1. the updated object has the identifier of the finalizer in ObjectMeta.Finalizers;
2. ObjectMeta.DeletionTimestamp is updated from nil to non-nil.
* Applies the finalizing logic to the object in the update event.
* After the finalizing logic is completed, removes itself from ObjectMeta.Finalizers (a minimal sketch of this loop follows the list).
* The API server deletes the object after the last finalizer removes itself from the ObjectMeta.Finalizers field.
* Because it's possible for the finalizing logic to be applied multiple times (e.g., the finalizer crashes after applying the finalizing logic but before being removed from ObjectMeta.Finalizers), the finalizing logic has to be idempotent.
* If a finalizer fails to act in a timely manner, users with proper privileges can manually remove the finalizer from ObjectMeta.Finalizers. We will provide a kubectl command to do this.
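A minimal sketch of the finalizer loop described in the list above, using a hypothetical finalizer identifier and a simplified object shape; the finalizing logic itself is elided and, as noted, must be idempotent:

```go
package main

import "fmt"

const myFinalizer = "example.com/my-finalizer" // hypothetical finalizer identifier

type Object struct {
	Name              string
	DeletionTimestamp *string
	Finalizers        []string
}

// handleUpdate is the reaction of a finalizer to an update event: it acts only
// when the object is being deleted and still lists this finalizer, runs its
// (idempotent) finalizing logic, then removes its own entry.
func handleUpdate(obj *Object) {
	if obj.DeletionTimestamp == nil || !contains(obj.Finalizers, myFinalizer) {
		return
	}
	fmt.Printf("running finalizing logic for %s\n", obj.Name) // must be idempotent

	obj.Finalizers = remove(obj.Finalizers, myFinalizer)
	// The API server deletes the object once the last finalizer is removed.
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

func remove(list []string, s string) []string {
	out := list[:0]
	for _, v := range list {
		if v != s {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	ts := "2016-05-01T00:00:00Z"
	obj := &Object{Name: "D1", DeletionTimestamp: &ts, Finalizers: []string{myFinalizer}}
	handleUpdate(obj)
	fmt.Printf("remaining finalizers: %v\n", obj.Finalizers)
}
```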
## Changes to existing components
* API server:
* Deletion handler:
* If the `ObjectMeta.Finalizers` of the object being deleted is non-empty, then updates the DeletionTimestamp, but does not delete the object.
* If the `ObjectMeta.Finalizers` is empty and the options.GracePeriod is zero, then deletes the object. If the options.GracePeriod is non-zero, then just updates the DeletionTimestamp.
* Update handler:
* If the update removes the last finalizer, and the DeletionTimestamp is non-nil, and the DeletionGracePeriodSeconds is zero, then deletes the object from the registry.
* If the update removes the last finalizer, and the DeletionTimestamp is non-nil, but the DeletionGracePeriodSeconds is non-zero, then just updates the object.
## Part II. The "orphan" finalizer
## API changes
```
type DeleteOptions struct {
OrphanDependents bool
}
```
**DeleteOptions.OrphanDependents**: allows a user to express whether the dependent objects should be orphaned. It defaults to true, because controllers before release 1.2 expect dependent objects to be orphaned.
## Changes to existing components
* API server:
When handling a deletion request, depending on whether DeleteOptions.OrphanDependents is true, the API server updates the object to add or remove the "orphan" finalizer in the ObjectMeta.Finalizers map (sketched below).
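A sketch of that API-server-side adjustment, under the simplifying assumption that the finalizer is identified by the plain string "orphan" and using a trimmed object shape:

```go
package main

import "fmt"

const orphanFinalizer = "orphan" // illustrative identifier for the "orphan" finalizer

type DeleteOptions struct {
	OrphanDependents bool
}

type Object struct {
	Name       string
	Finalizers []string
}

// reconcileOrphanFinalizer is the API-server-side adjustment described above:
// the "orphan" finalizer is added when orphaning is requested and removed when
// the caller asks for cascading deletion.
func reconcileOrphanFinalizer(obj *Object, opts DeleteOptions) {
	has := false
	for _, f := range obj.Finalizers {
		if f == orphanFinalizer {
			has = true
		}
	}
	switch {
	case opts.OrphanDependents && !has:
		obj.Finalizers = append(obj.Finalizers, orphanFinalizer)
	case !opts.OrphanDependents && has:
		kept := obj.Finalizers[:0]
		for _, f := range obj.Finalizers {
			if f != orphanFinalizer {
				kept = append(kept, f)
			}
		}
		obj.Finalizers = kept
	}
}

func main() {
	d := &Object{Name: "D1"}
	reconcileOrphanFinalizer(d, DeleteOptions{OrphanDependents: true})
	fmt.Printf("finalizers after orphaning delete: %v\n", d.Finalizers)
}
```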
## New components
Adding a fourth component to the Garbage Collector, the "orphan" finalizer:
* Watches for update events as described in [Part I](#part-i-the-finalizer-framework).
* Removes the object in the event from the `OwnerReferences` of its dependents.
* Dependent objects can be found via the DAG kept by the GC, or by relisting the dependent resource and checking the OwnerReferences field of each potential dependent object.
* Also removes any dangling owner references the dependent objects have.
* Finally, removes itself from the `ObjectMeta.Finalizers` of the object.
# Related issues
## Orphan adoption
Controllers are responsible for adopting orphaned dependent resources. To do so, a controller:
* Checks a potential dependent object's OwnerReferences to determine if it is orphaned.
* Fills in the OwnerReferences if the object matches the controller's selector and is orphaned.
There is a potential race between the "orphan" finalizer removing an owner reference and the controllers adding it back during adoption. Imagine this case: a user deletes an owning object and intends to orphan the dependent objects, so the GC removes the owner from the dependent object's OwnerReferences list, but the controller of the owner resource hasn't observed the deletion yet, so it adopts the dependent again and adds the reference back, resulting in the mistaken deletion of the dependent object. This race can be avoided by implementing Status.ObservedGeneration in all resources. Before updating the dependent object's OwnerReferences, the "orphan" finalizer checks Status.ObservedGeneration of the owning object to ensure its controller has already observed the deletion.
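The guard the "orphan" finalizer applies can be expressed very simply; the field names below mirror the proposal, with plain integers standing in for the real generation types:

```go
package main

import "fmt"

type Owner struct {
	Generation         int64 // bumped when the deletion (with orphaning) is requested
	ObservedGeneration int64 // updated by the owner's controller once it sees the change
}

// safeToOrphan implements the race-avoidance check described above: the "orphan"
// finalizer only starts stripping owner references once the owning controller has
// observed the generation at which deletion was requested.
func safeToOrphan(o Owner) bool {
	return o.ObservedGeneration >= o.Generation
}

func main() {
	fmt.Println(safeToOrphan(Owner{Generation: 4, ObservedGeneration: 3})) // false: wait
	fmt.Println(safeToOrphan(Owner{Generation: 4, ObservedGeneration: 4})) // true: proceed
}
```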
## Upgrading a cluster to support cascading deletion
For the master, after upgrading to a version that supports cascading deletion, the OwnerReferences of existing objects remain empty, so the controllers will regard them as orphaned and start the adoption procedures. After the adoptions are done, server-side cascading will be effective for these existing objects.
For nodes, cascading deletion does not affect them.
For kubectl, we will keep kubectl's cascading deletion logic for one more release.
# End-to-End Examples
This section presents an example of all components working together to enforce the cascading deletion or orphaning.
## Life of a Deployment and its descendants
1. User creates a deployment `D1`.
2. The Propagator of the GC observes the creation. It creates an entry of `D1` in the DAG.
3. The deployment controller observes the creation of `D1`. It creates the replicaset `R1`, whose OwnerReferences field contains a reference to `D1`, and has the "orphan" finalizer in its ObjectMeta.Finalizers map.
4. The Propagator of the GC observes the creation of `R1`. It creates an entry of `R1` in the DAG, with `D1` as its owner.
5. The replicaset controller observes the creation of `R1` and creates Pods `P1`~`Pn`, all with `R1` in their OwnerReferences.
6. The Propagator of the GC observes the creation of `P1`~`Pn`. It creates entries for them in the DAG, with `R1` as their owner.
***In case the user wants to cascadingly delete `D1`'s descendants, then***
7. The user deletes the deployment `D1`, with `DeleteOptions.OrphanDependents=false`. The API server checks if `D1` has the "orphan" finalizer in its Finalizers map; if so, it updates `D1` to remove the "orphan" finalizer. Then the API server deletes `D1`.
8. The "orphan" finalizer does *not* take any action, because the observed deletion shows `D1` has an empty Finalizers map.
9. The Propagator of the GC observes the deletion of `D1`. It deletes `D1` from the DAG. It adds its dependent object, replicaset `R1`, to the *dirty queue*.
10. The Garbage Processor of the GC dequeues `R1` from the *dirty queue*. It finds `R1` has an owner reference pointing to `D1`, and `D1` no longer exists, so it requests API server to delete `R1`, with `DeleteOptions.OrphanDependents=false`. (The Garbage Processor should always set this field to false.)
11. The API server updates `R1` to remove the "orphan" finalizer if it's in the `R1`'s Finalizers map. Then the API server deletes `R1`, as `R1` has an empty Finalizers map.
12. The Propagator of the GC observes the deletion of `R1`. It deletes `R1` from the DAG. It adds its dependent objects, Pods `P1`~`Pn`, to the *Dirty Queue*.
13. The Garbage Processor of the GC dequeues `Px` (1 <= x <= n) from the *Dirty Queue*. It finds that `Px` has an owner reference pointing to `D1`, and `D1` no longer exists, so it requests the API server to delete `Px`, with `DeleteOptions.OrphanDependents=false`.
14. API server deletes the Pods.
***In case the user wants to orphan `D1`'s descendants, then***
7. The user deletes the deployment `D1`, with `DeleteOptions.OrphanDependents=true`.
8. The API server first updates `D1`, with DeletionTimestamp=now and DeletionGracePeriodSeconds=0, increments the Generation by 1, and adds the "orphan" finalizer to ObjectMeta.Finalizers if it's not present yet. The API server does not delete `D1`, because its Finalizers map is not empty.
9. The deployment controller observes the update, and acknowledges it by updating `D1`'s ObservedGeneration. The deployment controller won't create more replicasets on `D1`'s behalf.
10. The "orphan" finalizer observes the update, and notes down the Generation. It waits until the ObservedGeneration becomes equal to or greater than the noted Generation. Then it updates `R1` to remove `D1` from its OwnerReferences. At last, it updates `D1`, removing itself from `D1`'s Finalizers map.
11. The API server handles the update of `D1`; because *i)* DeletionTimestamp is non-nil, *ii)* the DeletionGracePeriodSeconds is zero, and *iii)* the last finalizer is removed from the Finalizers map, the API server deletes `D1`.
12. The Propagator of the GC observes the deletion of `D1`. It deletes `D1` from the DAG. It adds its dependent, replicaset `R1`, to the *Dirty Queue*.
13. The Garbage Processor of the GC dequeues `R1` from the *Dirty Queue* and skips it, because its OwnerReferences is empty.
# Open Questions
1. In case an object has multiple owners, some owners are deleted with DeleteOptions.OrphanDependents=true, and some are deleted with DeleteOptions.OrphanDependents=false, what should happen to the object?
The presented design will respect the setting in the deletion request of the last owner.
2. How to propagate the grace period in a cascading deletion? For example, when deleting a ReplicaSet with a grace period of 5s, a user may expect the same grace period to be applied to the deletion of the Pods controlled by the ReplicaSet.
Propagating the grace period in a cascading deletion is a ***non-goal*** of this proposal. Nevertheless, the presented design can be extended to support it. A tentative solution is letting the garbage collector propagate the grace period when deleting dependent objects. To persist the grace period set by the user, the owning object should not be deleted from the registry until all its dependent objects are in the graceful deletion state. This could be ensured by introducing another finalizer, tentatively named the "populating graceful deletion" finalizer. Upon receiving the graceful deletion request, the API server adds this finalizer to the finalizers list of the owning object. Later the GC will remove it when all dependents are in the graceful deletion state.
[#25055](https://github.com/kubernetes/kubernetes/issues/25055) tracks this problem.
3. How can a client know when the cascading deletion is completed?
A tentative solution is introducing a "completing cascading deletion" finalizer, which will be added to the finalizers list of the owning object, and removed by the GC when all dependents are deleted. The user can watch for the deletion event of the owning object to ensure the cascading deletion process has completed.
---
***THE REST IS FOR ARCHIVAL PURPOSES***
---
# Considered and Rejected Designs
# 1. Tombstone + GC
## Reasons of rejection
* It likely would conflict with our plan in the future to use all resources as their own tombstones, once the registry supports multi-object transactions.
* The TTL of the tombstone is hand-wavy; there is no guarantee that the chosen TTL value is long enough.
* This design is essentially the same as the selected design, with the tombstone as an extra element. The benefit the extra complexity buys is that a parent object can be deleted immediately even if the user wants to orphan the children. The benefit doesn't justify the complexity.
## API Changes
```
type DeleteOptions struct {
OrphanChildren bool
}
```
**DeleteOptions.OrphanChildren**: allows a user to express whether the child objects should be orphaned.
```
type ObjectMeta struct {
...
ParentReferences []ObjectReference
}
```
**ObjectMeta.ParentReferences**: links the resource to the parent resources. For example, a replica set `R` created by a deployment `D` should have an entry in ObjectMeta.ParentReferences pointing to `D`. The link should be set when the child object is created. It can be updated after the creation.
```
type Tombstone struct {
unversioned.TypeMeta
ObjectMeta
UID types.UID
}
```
**Tombstone**: a tombstone is created when an object is deleted and the user requires the children to be orphaned.
**Tombstone.UID**: the UID of the original object.
## New components
The only new component is the Garbage Collector, which consists of a scanner, a garbage processor, and a propagator.
* Scanner:
* Uses the discovery API to detect all the resources supported by the system.
* For performance reasons, resources can be marked in the discovery info as not participating in cascading deletion; the GC will then not monitor them.
* Periodically scans all resources in the system and adds each object to the *Dirty Queue*.
* Garbage Processor:
* Consists of the *Dirty Queue* and workers.
* Each worker:
* Dequeues an item from *Dirty Queue*.
* If the item's ParentReferences is empty, continues to process the next item in the *Dirty Queue*.
* Otherwise checks each entry in the ParentReferences:
* If a parent exists, continues to check the next parent.
* If a parent doesn't exist, checks if a tombstone standing for the parent exists.
* If the steps above show that neither a parent nor a tombstone exists, requests the API server to delete the item. That is, the child object will be garbage collected only if ***all*** parents are non-existent and none of them have tombstones.
* Otherwise removes the item's ParentReferences to non-existent parents.
* Propagator:
* The Propagator is for optimization, not for correctness.
* Maintains a DAG of parent-child relations. This DAG stores only name/uid/orphan triplets, not the entire body of every item.
* Consists of an *Event Queue* and a single worker.
* Watches for create/update/delete events for all resources that participate in cascading deletion, and enqueues the events to the *Event Queue*.
* Worker:
* Dequeues an item from the *Event Queue*.
* If the item is a creation or update, then updates the DAG accordingly.
* If the object has a parent and the parent doesn't exist in the DAG yet, then apart from adding the object to the DAG, also enqueues the object to the *Dirty Queue*.
* If the item is a deletion, then removes the object from the DAG, and enqueues all its children to the *Dirty Queue*.
* The propagator shouldn't need to do any RPCs, so a single worker should be sufficient. This makes locking easier.
* With the Propagator, we *only* need to run the Scanner when starting the Propagator to populate the DAG and the *Dirty Queue*.
## Changes to existing components
* Storage: we should add a REST storage for Tombstones. The index should be UID rather than namespace/name.
* API Server: when handling a deletion request, if DeleteOptions.OrphanChildren is true, then the API Server either creates a tombstone with TTL if the tombstone doesn't exist yet, or updates the TTL of the existing tombstone. The API Server deletes the object after the tombstone is created.
* Controllers: when creating child objects, controllers need to fill in their ObjectMeta.ParentReferences field. Objects that don't have a parent should have the namespace object as the parent.
## Comparison with the selected design
The main difference between the two designs is when to update the ParentReferences. In design #1, because a tombstone is created to indicate "orphaning" is desired, the updates to ParentReferences can be deferred until the deletion of the tombstone. In design #2, the updates need to be done before the parent object is deleted from the registry.
* Advantages of "Tombstone + GC" design
* Faster to free the resource name compared to using finalizers. The original object can be deleted to free the resource name once the tombstone is created, rather than waiting for the finalizers to update all children's ObjectMeta.ParentReferences.
* Advantages of "Finalizer Framework + GC"
* The finalizer framework is needed for other purposes as well.
# 2. Recovering from abnormal cascading deletion
## Reasons of rejection
* Not a goal
* Tons of work, not feasible in the near future
In case the garbage collector is mistakenly deleting objects, we should provide a mechanism to stop the garbage collector and restore the objects.
* Stopping the garbage collector
We will add a "--enable-garbage-collector" flag to the controller manager binary to indicate if the garbage collector should be enabled. Admin can stop the garbage collector in a running cluster by restarting the kube-controller-manager with --enable-garbage-collector=false.
* Restoring mistakenly deleted objects
* Guidelines
* The restoration should be implemented as a roll-forward rather than a roll-back, because likely the state of the cluster (e.g., available resources on a node) has changed since the object was deleted.
* Need to archive the complete specs of the deleted objects.
* The content of the archive is sensitive, so access to the archive is subject to the same authorization policy enforced on the original resource.
* States should be stored in etcd. All components should remain stateless.
* A preliminary design
This is a generic design for "undoing a deletion", not specific to undoing cascading deletion.
* Add a `/archive` sub-resource to every resource, it's used to store the spec of the deleted objects.
* Before an object is deleted from the registry, the API server clears fields like DeletionTimestamp, then creates the object in /archive and sets a TTL.
* Add a `kubectl restore` command, which takes a resource/name pair as input, creates the object with the spec stored in the /archive, and deletes the archived object.

@ -1,279 +1 @@
This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/gpu-support.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/gpu-support.md)
<!-- BEGIN MUNGE: GENERATED_TOC -->
- [GPU support](#gpu-support)
- [Objective](#objective)
- [Background](#background)
- [Detailed discussion](#detailed-discussion)
- [Inventory](#inventory)
- [Scheduling](#scheduling)
- [The runtime](#the-runtime)
- [NVIDIA support](#nvidia-support)
- [Event flow](#event-flow)
- [Too complex for now: nvidia-docker](#too-complex-for-now-nvidia-docker)
- [Implementation plan](#implementation-plan)
- [V0](#v0)
- [Scheduling](#scheduling-1)
- [Runtime](#runtime)
- [Other](#other)
- [Future work](#future-work)
- [V1](#v1)
- [V2](#v2)
- [V3](#v3)
- [Undetermined](#undetermined)
- [Security considerations](#security-considerations)
<!-- END MUNGE: GENERATED_TOC -->
# GPU support
Author: @therc
Date: Apr 2016
Status: Design in progress, early implementation of requirements
## Objective
Users should be able to request GPU resources for their workloads, as easily as
for CPU or memory. Kubernetes should keep an inventory of machines with GPU
hardware, schedule containers on appropriate nodes and set up the container
environment with all that's necessary to access the GPU. All of this should
eventually be supported for clusters on either bare metal or cloud providers.
## Background
An increasing number of workloads, such as machine learning and seismic survey
processing, benefit from offloading computations to graphics hardware. While not
as tuned as traditional, dedicated high performance computing systems such as
MPI, a Kubernetes cluster can still be a great environment for organizations
that need a variety of additional, "classic" workloads, such as database, web
serving, etc.
GPU support is hard to provide extensively and will thus take time to tame
completely, because
- different vendors expose the hardware to users in different ways
- some vendors require fairly tight coupling between the kernel driver
controlling the GPU and the libraries/applications that access the hardware
- it adds more resource types (whole GPUs, GPU cores, GPU memory)
- it can introduce new security pitfalls
- for systems with multiple GPUs, affinity matters, similarly to NUMA
considerations for CPUs
- running GPU code in containers is still a relatively novel idea
## Detailed discussion
Currently, this document is mostly focused on the basic use case: run GPU code
on AWS `g2.2xlarge` EC2 machine instances using Docker. It constitutes a narrow
enough scenario that it does not require large amounts of generic code yet. GCE
doesn't support GPUs at all; bare metal systems throw a lot of extra variables
into the mix.
Later sections will outline future work to support a broader set of hardware,
environments and container runtimes.
### Inventory
Before any scheduling can occur, we need to know what's available out there. In
v0, we'll hardcode capacity detected by the kubelet based on a flag,
`--experimental-nvidia-gpu`. This will result in the user-defined resource
`alpha.kubernetes.io/nvidia-gpu` being reported for `NodeCapacity` and
`NodeAllocatable`, as well as exposed as a node label.
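Roughly, the kubelet-side effect of the flag could look like the following sketch; the field shapes are simplified (the real node status uses resource.Quantity values), and only the resource name comes from this proposal:

```go
package main

import "fmt"

// NvidiaGPUResource is the experimental resource name proposed above.
const NvidiaGPUResource = "alpha.kubernetes.io/nvidia-gpu"

// Node is a trimmed stand-in for the node status fields involved.
type Node struct {
	Capacity    map[string]int64
	Allocatable map[string]int64
	Labels      map[string]string
}

// reportGPU is roughly what the kubelet would do when started with
// --experimental-nvidia-gpu: advertise one whole device in capacity and
// allocatable, and mirror it as a node label.
func reportGPU(n *Node, enabled bool) {
	if !enabled {
		return
	}
	n.Capacity[NvidiaGPUResource] = 1
	n.Allocatable[NvidiaGPUResource] = 1
	n.Labels[NvidiaGPUResource] = "1"
}

func main() {
	n := &Node{Capacity: map[string]int64{}, Allocatable: map[string]int64{}, Labels: map[string]string{}}
	reportGPU(n, true)
	fmt.Printf("%+v\n", *n)
}
```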
### Scheduling
GPUs will be visible as first-class resources. In v0, we'll only assign whole
devices; sharing among multiple pods is left to future implementations. It's
probable that GPUs will exacerbate the need for [a rescheduler](rescheduler.md)
or pod priorities, especially if the nodes in a cluster are not homogeneous.
Consider these two cases:
> Only half of the machines have a GPU and they're all busy with other
workloads. The other half of the cluster is doing very little work. A GPU
workload comes, but it can't schedule, because the devices are sitting idle on
nodes that are running something else and the nodes with little load lack the
hardware.
> Some or all the machines have two graphic cards each. A number of jobs get
scheduled, requesting one device per pod. The scheduler puts them all on
different machines, spreading the load, perhaps by design. Then a new job comes
in, requiring two devices per pod, but it can't schedule anywhere, because all
we can find, at most, is one unused device per node.
### The runtime
Once we know where to run the container, it's time to set up its environment. At
a minimum, we'll need to map the host device(s) into the container. Because each
manufacturer exposes different device nodes (`/dev/ati/card0`, `/dev/nvidia0`,
but also the required `/dev/nvidiactl` and `/dev/nvidia-uvm`), some of the logic
needs to be hardware-specific, mapping from a logical device to a list of device
nodes necessary for software to talk to it.
Support binaries and libraries are often versioned along with the kernel module,
so there should be further hooks to project those under `/bin` and some kind of
`/lib` before the application is started. This can be done for Docker with the
use of a versioned [Docker
volume](https://docs.docker.com/engine/tutorials/dockervolumes/) or
with upcoming Kubernetes-specific hooks such as init containers and volume
containers. In v0, images are expected to bundle everything they need.
#### NVIDIA support
The first implementation and testing ground will be for NVIDIA devices, by far
the most common setup.
In v0, the `--experimental-nvidia-gpu` flag will also result in the host devices
(limited to those required to drive the first card, `nvidia0`) to be mapped into
the container by the dockertools library.
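A sketch of that device mapping, with local stand-ins for the container-options and device types (the real integration would go through dockertools and Docker's engine-api Resources.Devices list):

```go
package main

import "fmt"

// nvidiaDevices lists the host device nodes needed to drive the first NVIDIA
// card, as described above; the control devices are required alongside nvidia0.
var nvidiaDevices = []string{"/dev/nvidia0", "/dev/nvidiactl", "/dev/nvidia-uvm"}

// Device and RunContainerOptions are simplified stand-ins for the kubelet types.
type Device struct {
	PathOnHost      string
	PathInContainer string
	Permissions     string
}

type RunContainerOptions struct {
	Devices []Device
}

// addGPUDevices is a sketch of what the dockertools integration would do when a
// pod requests the GPU resource: map each required host device node into the
// container at the same path, with read/write/mknod permissions.
func addGPUDevices(opts *RunContainerOptions) {
	for _, d := range nvidiaDevices {
		opts.Devices = append(opts.Devices, Device{PathOnHost: d, PathInContainer: d, Permissions: "mrw"})
	}
}

func main() {
	var opts RunContainerOptions
	addGPUDevices(&opts)
	fmt.Printf("%+v\n", opts)
}
```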
### Event flow
This is what happens before and after a user schedules a GPU pod.
1. Administrator installs a number of Kubernetes nodes with GPUs. The correct
kernel modules and device nodes under `/dev/` are present.
1. Administrator makes sure the latest CUDA/driver versions are installed.
1. Administrator enables `--experimental-nvidia-gpu` on kubelets
1. Kubelets update node status with information about the GPU device, in addition
to cAdvisor's usual data about CPU/memory/disk
1. User creates a Docker image compiling their application for CUDA, bundling
the necessary libraries. We ignore any versioning requirements expressed in the
image through labels based on [NVIDIA's
conventions](https://github.com/NVIDIA/nvidia-docker/blob/64510511e3fd0d00168eb076623854b0fcf1507d/tools/src/nvidia-docker/utils.go#L13).
1. User creates a pod using the image, requiring
`alpha.kubernetes.io/nvidia-gpu: 1`
1. Scheduler picks a node for the pod
1. The kubelet notices the GPU requirement and maps the three devices. In
Docker's engine-api, this means it'll add them to the Resources.Devices list.
1. Docker runs the container to completion
1. The scheduler notices that the device is available again
### Too complex for now: nvidia-docker
For v0, we discussed at length, but decided to leave aside initially the
[nvidia-docker plugin](https://github.com/NVIDIA/nvidia-docker). The plugin is
an officially supported solution, thus avoiding a lot of new low level code, as
it takes care of functionality such as:
- creating a Docker volume with binaries such as `nvidia-smi` and shared
libraries
- providing HTTP endpoints that monitoring tools can use to collect GPU metrics
- abstracting details such as `/dev` entry names for each device, as well as
control ones like `nvidiactl`
The `nvidia-docker` wrapper also verifies that the CUDA version required by a
given image is supported by the host drivers, through inspection of well-known
image labels, if present. We should try to provide equivalent checks, either
for CUDA or OpenCL.
This is current sample output from `nvidia-docker-plugin`, wrapped for
readability:
```
$ curl -s localhost:3476/docker/cli
--device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia0
--volume-driver=nvidia-docker
--volume=nvidia_driver_352.68:/usr/local/nvidia:ro
```
It runs as a daemon listening for HTTP requests on port 3476. The endpoint above
returns flags that need to be added to the Docker command line in order to
expose GPUs to the containers. There are optional URL arguments to request
specific devices if more than one are present on the system, as well as specific
versions of the support software. An obvious improvement is an additional
endpoint for JSON output.
The unresolved question is whether `nvidia-docker-plugin` would run standalone
as it does today (called over HTTP, perhaps with endpoints for a new Kubernetes
resource API) or whether the relevant code from its `nvidia` package should be
linked directly into kubelet. A partial list of tradeoffs:
| | External binary | Linked in |
|---------------------|---------------------------------------------------------------------------------------------------|--------------------------------------------------------------|
| Use of cgo | Confined to binary | Linked into kubelet, but with lazy binding |
| Expandability | Limited if we run the plugin, increased if the library is used to build a Kubernetes-tailored daemon. | Can reuse the `nvidia` library as we prefer |
| Bloat | None | Larger kubelet, even for systems without GPUs |
| Reliability | Need to handle the binary disappearing at any time | Fewer headaches |
| (Un)Marshalling | Need to talk over JSON | None |
| Administration cost | One more daemon to install, configure and monitor | No extra work required, other than perhaps configuring flags |
| Releases | Potentially on its own schedule | Tied to Kubernetes' |
## Implementation plan
### V0
The first two tracks can progress in parallel.
#### Scheduling
1. Define the new resource `alpha.kubernetes.io/nvidia-gpu` in `pkg/api/types.go`
and co.
1. Plug resource into feasibility checks used by kubelet, scheduler and
schedulercache. Maybe gated behind a flag?
1. Plug resource into resource_helpers.go
1. Plug resource into the limitranger
#### Runtime
1. Add kubelet config parameter to enable the resource
1. Make kubelet's `setNodeStatusMachineInfo` report the resource
1. Add a Devices list to container.RunContainerOptions
1. Use it from DockerManager's runContainer
1. Do the same for rkt (stretch goal)
1. When a pod requests a GPU, add the devices to the container options
#### Other
1. Add new resource to `kubectl describe` output. Optional for non-GPU users?
1. Administrator documentation, with sample scripts
1. User documentation
## Future work
Above all, we need to collect feedback from real users and use that to set
priorities for any of the items below.
### V1
- Perform real detection of the installed hardware
- Figure out a standard way to avoid bundling shared libraries in images
- Support fractional resources so multiple pods can share the same GPU
- Support bare metal setups
- Report resource usage
### V2
- Support multiple GPUs with resource hierarchies and affinities
- Support versioning of resources (e.g. "CUDA v7.5+")
- Build resource plugins into the kubelet?
- Support other device vendors
- Support Azure?
- Support rkt?
### V3
- Support OpenCL (so images can be device-agnostic)
### Undetermined
It makes sense to turn the output of this project (external resource plugins,
etc.) into a more generic abstraction at some point.
## Security considerations
There should be knobs for the cluster administrator to only allow certain users
or roles to schedule GPU workloads. Overcommitting or sharing the same device
across different pods is not considered safe. It should be possible to segregate
such GPU-sharing pods by user, namespace or a combination thereof.

@ -1,8 +1 @@
This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/high-availability.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/high-availability.md)
# High Availability of Scheduling and Controller Components in Kubernetes
This document is deprecated. For more details about running a highly available
cluster master, please see the [admin instructions document](../../docs/admin/high-availability.md).

@ -1,331 +1 @@
This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/image-provenance.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/image-provenance.md)
# Overview
Organizations wish to avoid running "unapproved" images.
The exact nature of "approval" is beyond the scope of Kubernetes, but may include reasons like:
- only run images that are scanned to confirm they do not contain vulnerabilities
- only run images that use a "required" base image
- only run images that contain binaries which were built from peer reviewed, checked-in source
by a trusted compiler toolchain.
- only allow images signed by certain public keys.
- etc...
Goals of the design include:
* Block creation of pods that would cause "unapproved" images to run.
* Make it easy for users or partners to build "image provenance checkers" which check whether images are "approved".
* We expect there will be multiple implementations.
* Allow users to request an "override" of the policy in a convenient way (subject to the override being allowed).
* "overrides" are needed to allow "emergency changes", but need to not happen accidentally, since they may
require tedious after-the-fact justification and affect audit controls.
Non-goals include:
* Encoding image policy into Kubernetes code.
* Implementing objects in core kubernetes which describe complete policies for what images are approved.
* A third-party implementation of an image policy checker could optionally use ThirdPartyResource to store its policy.
* Kubernetes core code dealing with concepts of image layers, build processes, source repositories, etc.
* We expect there will be multiple PaaSes and/or de-facto programming environments, each with different takes on
these concepts. At any rate, Kubernetes is not ready to be opinionated on these concepts.
* Sending more information than strictly needed to a third-party service.
* Information sent by Kubernetes to a third-party service constitutes an API of Kubernetes, and we want to
avoid making these broader than necessary, as it restricts future evolution of Kubernetes, and makes
Kubernetes harder to reason about. Also, excessive information limits cache-ability of decisions. Caching
reduces latency and allows short outages of the backend to be tolerated.
Detailed discussion in [Ensuring only images are from approved sources are run](
https://github.com/kubernetes/kubernetes/issues/22888).
# Implementation
A new admission controller will be added. That will be the only change.
## Admission controller
An `ImagePolicyWebhook` admission controller will be written. The admission controller examines all pod objects which are
created or updated. It can either admit the pod, or reject it. If it is rejected, the request sees a `403 FORBIDDEN`.
The admission controller code will go in `plugin/pkg/admission/imagepolicy`.
There will be a cache of decisions in the admission controller.
If the apiserver cannot reach the webhook backend, it will log a warning and either admit or deny the pod.
A flag will control whether it admits or denies on failure.
The rationale for deny is that an attacker could DoS the backend or wait for it to be down, and then sneak a
bad pod into the system. The rationale for allow here is that, if the cluster admin also does
after-the-fact auditing of what images were run (which we think will be common), this will catch
any bad images run during periods of backend failure. With default-allow, the availability of Kubernetes does
not depend on the availability of the backend.
# Webhook Backend
The admission controller code in that directory does not contain logic to make an admit/reject decision. Instead, it extracts
relevant fields from the Pod creation/update request and sends those fields to a Backend (which we have been loosely calling "WebHooks"
in Kubernetes). The request the admission controller sends to the backend is called a WebHook request to distinguish it from the
request being admission-controlled. The server that accepts the WebHook request from Kubernetes is called the "Backend"
to distinguish it from the WebHook request itself, and from the API server.
The whole system will work similarly to the [Authentication WebHook](
https://github.com/kubernetes/kubernetes/pull/24902
) or the [AuthorizationWebHook](
https://github.com/kubernetes/kubernetes/pull/20347).
The WebHook request can optionally authenticate itself to its backend using a token from a `kubeconfig` file.
The WebHook request and response are JSON, and correspond to the following `go` structures:
```go
// Filename: pkg/apis/imagepolicy.k8s.io/register.go
package imagepolicy
// ImageReview checks if the set of images in a pod are allowed.
type ImageReview struct {
unversioned.TypeMeta
// Spec holds information about the pod being evaluated
Spec ImageReviewSpec
// Status is filled in by the backend and indicates whether the pod should be allowed.
Status ImageReviewStatus
}
// ImageReviewSpec is a description of the pod creation request.
type ImageReviewSpec struct {
// Containers is a list of a subset of the information in each container of the Pod being created.
Containers []ImageReviewContainerSpec
// Annotations is a list of key-value pairs extracted from the Pod's annotations.
// It only includes keys which match the pattern `*.image-policy.k8s.io/*`.
// It is up to each webhook backend to determine how to interpret these annotations, if at all.
Annotations map[string]string
// Namespace is the namespace the pod is being created in.
Namespace string
}
// ImageReviewContainerSpec is a description of a container within the pod creation request.
type ImageReviewContainerSpec struct {
Image string
// In future, we may add command line overrides, exec health check command lines, and so on.
}
// ImageReviewStatus is the result of the image review request.
type ImageReviewStatus struct {
// Allowed indicates that all images were allowed to be run.
Allowed bool
// Reason should be empty unless Allowed is false in which case it
// may contain a short description of what is wrong. Kubernetes
// may truncate excessively long errors when displaying to the user.
Reason string
}
```
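For concreteness, this sketch marshals the kind of payload the admission controller would POST to the backend, using simplified local copies of the structs above (the TypeMeta embedding and status are omitted, and the annotation key is hypothetical):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type ImageReviewContainerSpec struct {
	Image string `json:"image"`
}

type ImageReviewSpec struct {
	Containers  []ImageReviewContainerSpec `json:"containers"`
	Annotations map[string]string          `json:"annotations,omitempty"`
	Namespace   string                     `json:"namespace"`
}

type ImageReview struct {
	Spec ImageReviewSpec `json:"spec"`
}

func main() {
	// The admission controller extracts only the image names, matching
	// annotations and namespace from the pod being created.
	review := ImageReview{Spec: ImageReviewSpec{
		Containers: []ImageReviewContainerSpec{{Image: "myrepo/myimage:v1"}},
		Annotations: map[string]string{
			// Hypothetical key; anything matching *.image-policy.k8s.io/* is forwarded.
			"break-glass.image-policy.k8s.io/ticket": "TICKET-1234",
		},
		Namespace: "default",
	}}
	body, _ := json.MarshalIndent(review, "", "  ")
	fmt.Println(string(body))
}
```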
## Extending with Annotations
All annotations on a Pod that match `*.image-policy.k8s.io/*` are sent to the webhook.
Sending annotations allows users who are aware of the image policy backend to send
extra information to it, and for different backends implementations to accept
different information.
Examples of information you might put here are
- request to "break glass" to override a policy, in case of emergency.
- a ticket number from a ticket system that documents the break-glass request
- provide a hint to the policy server as to the imageID of the image being provided, to save it a lookup
In any case, the annotations are provided by the user and are not validated by Kubernetes in any way. In the future, if an annotation is determined to be widely
useful, we may promote it to a named field of ImageReviewSpec.
In the case of a Pod update, Kubernetes may send the backend either all images in the updated image, or only the ones that
changed, at its discretion.
## Interaction with Controllers
In the case of a Deployment object, no image check is done when the Deployment object is created or updated.
Likewise, no check happens when the Deployment controller creates a ReplicaSet. The check only happens
when the ReplicaSet controller creates a Pod. Checking Pod is necessary since users can directly create pods,
and since third-parties can write their own controllers, which kubernetes might not be aware of or even contain
pod templates.
The ReplicaSet, or other controller, is responsible for recognizing when a 403 has happened
(whether due to user not having permission due to bad image, or some other permission reason)
and throttling itself and surfacing the error in a way that CLIs and UIs can show to the user.
Issue [22298](https://github.com/kubernetes/kubernetes/issues/22298) needs to be resolved to
propagate Pod creation errors up through a stack of controllers.
## Changes in policy over time
The Backend might change the policy over time. For example, yesterday `redis:v1` was allowed, but today `redis:v1` is not allowed
due to a CVE that just came out (fictional scenario). In this scenario:
- a newly created replicaSet will be unable to create Pods.
- updating a deployment will be safe in the sense that it will detect that the new ReplicaSet is not scaling
up and not scale down the old one.
- an existing replicaSet will be unable to create Pods that replace ones which are terminated. If this is due to
slow loss of nodes, then there should be time to react before significant loss of capacity.
- For non-replicated things (size 1 ReplicaSet, StatefulSet), a single node failure may disable it.
- a node rolling update will eventually check for liveness of replacements, and would be throttled
in the case where the image was no longer allowed and so replacements could not be started.
- rapid node restarts will cause existing pod objects to be restarted by kubelet.
- slow node restarts or network partitions will cause the node controller to delete pods and there will be no replacements
It is up to the Backend implementor, and the cluster administrator who decides to use that backend, to decide
whether the Backend should be allowed to change its mind. There is a tradeoff between responsiveness
to changes in policy, versus keeping existing services running. The two models that make sense are:
- never change a policy, unless some external process has ensured no active objects depend on the to-be-forbidden
images.
- change a policy and assume that transition to new image happens faster than the existing pods decay.
## Ubernetes
If two clusters share an image policy backend, then they will have the same policies.
The clusters can pass different tokens to the backend, and the backend can use this to distinguish
between different clusters.
## Image tags and IDs
Image tags are like: `myrepo/myimage:v1`.
Image IDs are like: `myrepo/myimage@sha256:beb6bd6a68f114c1dc2ea4b28db81bdf91de202a9014972bec5e4d9171d90ed`.
You can see image IDs with `docker images --no-trunc`.
The Backend needs to be able to resolve tags to IDs (by talking to the images repo).
If the Backend resolves tags to IDs, there is some risk that the tag-to-ID mapping will be
modified after approval by the Backend, but before Kubelet pulls the image. We will not address this
race condition at this time.
We will wait and see how much demand there is for closing this hole. If the community demands a solution,
we may suggest one of these:
1. Use a backend that refuses to accept images that are specified with tags, and require users to resolve to IDs
prior to creating a pod template.
- [kubectl could be modified to automate this process](https://github.com/kubernetes/kubernetes/issues/1697)
- a CI/CD system or templating system could be used that maps IDs to tags before Deployment modification/creation.
1. Audit logs from kubelets to see image IDs were actually run, to see if any unapproved images slipped through.
1. Monitor tag changes in image repository for suspicious activity, or restrict remapping of tags after initial application.
If none of these works well, we could do the following:
- Image Policy Admission Controller adds a new field to Pod, e.g. `pod.spec.container[i].imageID` (or an annotation),
and kubelet will enforce that both the imageID and image match the image pulled.
Since this adds complexity and interacts with imagePullPolicy, we avoid adding the above feature initially.
### Caching
There will be a cache of decisions in the admission controller.
TTL will be user-controllable, but default to 1 hour for allows and 30s for denies.
The low TTL for denies allows a user to correct a setting on the backend and see the fix
rapidly. It is assumed that denies are infrequent.
Caching permits an RC to scale up services even during short unavailability of the webhook backend.
The ImageReviewSpec is used as the key to the cache.
In the case of a cache miss and timeout talking to the backend, the default is to allow Pod creation.
Keeping services running is more important than a hypothetical threat from an un-verified image.
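A minimal sketch of such a decision cache with the asymmetric TTLs described above; the key format and structure are illustrative, not the actual implementation:

```go
package main

import (
	"fmt"
	"time"
)

type decision struct {
	allowed bool
	expires time.Time
}

// reviewCache keys decisions by a string form of the ImageReviewSpec and applies
// the TTLs discussed above: long for allows, short for denies so a corrected
// backend policy is picked up quickly.
type reviewCache struct {
	allowTTL, denyTTL time.Duration
	entries           map[string]decision
}

func newReviewCache() *reviewCache {
	return &reviewCache{allowTTL: time.Hour, denyTTL: 30 * time.Second, entries: map[string]decision{}}
}

func (c *reviewCache) put(key string, allowed bool) {
	ttl := c.allowTTL
	if !allowed {
		ttl = c.denyTTL
	}
	c.entries[key] = decision{allowed: allowed, expires: time.Now().Add(ttl)}
}

func (c *reviewCache) get(key string) (allowed, ok bool) {
	d, found := c.entries[key]
	if !found || time.Now().After(d.expires) {
		return false, false // miss: caller must ask the backend (or apply the default on timeout)
	}
	return d.allowed, true
}

func main() {
	c := newReviewCache()
	c.put("default/myrepo/myimage:v1", true)
	fmt.Println(c.get("default/myrepo/myimage:v1"))
}
```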
### Post-pod-creation audit
There are several cases where an image not currently allowed might still run. Users wanting a
complete audit solution are advised to also do after-the-fact auditing of what images
ran. This can catch:
- images allowed due to backend not reachable
- images that kept running after policy change (e.g. CVE discovered)
- images started via local files or http option of kubelet
- checking SHA of images allowed by a tag which was remapped
This proposal does not include post-pod-creation audit.
## Alternatives considered
### Admission Control on Controller Objects
We could have done admission control on Deployments, Jobs, ReplicationControllers, and anything else that creates a Pod, directly or indirectly.
This approach is good because it provides immediate feedback to the user that the image is not allowed. However, we do not expect disallowed images
to be used often. And controllers need to be able to surface problems creating pods for a variety of other reasons anyways.
Other good things about this alternative are:
- Fewer calls to Backend, once per controller rather than once per pod creation. Caching in backend should be able to help with this, though.
- End user that created the object is seen, rather than the user of the controller process. This can be fixed by implementing `Impersonate-User` for controllers.
Other problems are:
- Works only with "core" controllers. Need to update the admission controller if we add more "core" controllers. Won't work with "third party controllers", e.g. the way we run open-source distributed systems like hadoop, spark, zookeeper, etc. on kubernetes. Because those controllers don't have config that can be "admission controlled", or if they do, the schema is not known to the admission controller, which would have to "search" for pod templates in json. Yuck.
- How would it work if a user created a pod directly, which is allowed, and is the recommended way to run something at most once?
### Sending User to Backend
We could have sent the username of the pod creator to the backend. The username could be used to allow different users to run
different categories of images. This would require propagating the username from e.g. Deployment creation, through to
Pod creation via, e.g. the `Impersonate-User:` header. This feature is [not ready](https://github.com/kubernetes/kubernetes/issues/27152).
When it is, we will re-evaluate adding user as a field of `ImagePolicyRequest`.
### Enforcement at Docker level
Docker supports plugins which can check any container creation before it happens. For example the [twistlock/authz](https://github.com/twistlock/authz)
Docker plugin can audit the full request sent to the Docker daemon and approve or deny it. This could include checking if the image is allowed.
We reject this option because:
- it requires all nodes to be configured with how to reach the Backend, which complicates node setup.
- it may not work with other runtimes
- propagating error messages back to the user is more difficult
- it requires plumbing additional information about requests to nodes (if we later want to consider `User` in policy).
### Policy Stored in API
We decided to store policy about what SecurityContexts a pod can have in the API, via PodSecurityPolicy.
This is because Pods are a Kubernetes object, and the Policy is very closely tied to the definition of Pods,
and grows in step as the Pods API grows.
For Image policy, the connection is not as strong. To the Kubernetes API, an Image is just a string, and it
does not know any of the image metadata, which lives outside the API.
Image policy may depend on the Dockerfile, the source code, the source repo, the source review tools,
vulnerability databases, and so on. Kubernetes does not have these as built-in concepts or have plans to add
them anytime soon.
### Registry whitelist/blacklist
We considered a whitelist/blacklist of registries and/or repositories. Basically, a prefix match on image strings.
The problem of approving images would be then pushed to a problem of controlling who has access to push to a
trusted registry/repository. That approach is simple for kubernetes. Problems with it are:
- tricky to allow users to share a repository but have different image policies per user or per namespace.
- tricky to do things after image push, such as scan image for vulnerabilities (such as Docker Nautilus), and have those results considered by policy
- tricky to block "older" versions from running, whose interaction with current system may not be well understood.
- how to allow emergency override?
- hard to change policy decision over time.
We still want to use rkt trust, docker content trust, etc for any registries used. We just need additional
image policy checks beyond what trust can provide.
### Send every Request to a Generic Admission Control Backend
Instead of just sending a subset of PodSpec to an Image Provenance backed, we could have sent every object
that is created or updated (or deleted?) to one or more Generic Admission Control Backends.
This might be a good idea, but needs quite a bit more thought. Some questions with that approach are:
It will not be a generic webhook. A generic webhook would need a lot more discussion:
- a generic webhook needs to touch all objects, not just pods. So it won't have a fixed schema. How to express this in our IDL? Harder to write clients
that interpret unstructured data rather than a fixed schema. Harder to version, and to detect errors.
- a generic webhook client needs to ignore kinds it does not care about, or the apiserver needs to know which backends care about which kinds. How
to specify which backends see which requests? Sending all requests, including high-rate requests like events and pod-status updates, might be
too high a rate for some backends.
Additionally, just sending all the fields of just the Pod kind also has problems:
- it exposes our whole API to a webhook backend without giving us (the project) any chance to review or understand how it is being used.
- because we do not know which fields of an object are inspected by the backend, caching of decisions is not effective. Sending fewer fields allows caching.
- sending fewer fields makes it possible to rev the version of the webhook request slower than the version of our internal objects (e.g. pod v2 could still use imageReview v1.)
There are probably lots more reasons.

@ -1,75 +1 @@
This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/initial-resources.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/initial-resources.md)
## Abstract
Initial Resources is a data-driven feature that, based on historical data, tries to estimate the resource usage of a container without Resources specified,
and sets them before the container is run. This document describes the design of the component.
## Motivation
Since we want to make Kubernetes as simple as possible for its users, we don't want to require setting [Resources](../design/resource-qos.md) for a container by its owner.
On the other hand, having Resources filled in is critical for scheduling decisions.
The current solution of setting Resources to a hardcoded value has obvious drawbacks.
We need to implement a component which will set initial Resources to a reasonable value.
## Design
InitialResources component will be implemented as an [admission plugin](../../plugin/pkg/admission/) and invoked right before
[LimitRanger](https://github.com/kubernetes/kubernetes/blob/7c9bbef96ed7f2a192a1318aa312919b861aee00/cluster/gce/config-default.sh#L91).
For every container without Resources specified it will try to predict the amount of resources that should be sufficient for it.
So that a pod without specified resources will be treated as
.
InitialResources will set only the [request](../design/resource-qos.md#requests-and-limits) field (independently for each resource type: cpu, memory) in the first version to avoid killing containers due to OOM (however the container still may be killed if it exceeds requested resources).
To make the component work with LimitRanger, the estimated value will be capped by the min and max possible values if defined.
This will prevent the situation where the pod is rejected due to a too low or too high estimate.
The container won't be marked as managed by this component in any way; however, an appropriate event will be exported.
The predicting algorithm should have very low latency so as not to significantly increase e2e pod startup latency
[#3954](https://github.com/kubernetes/kubernetes/pull/3954).
### Predicting algorithm details
In the first version estimation will be made based on historical data for the Docker image being run in the container (both the name and the tag matter).
CPU/memory usage of each container is exported periodically (by default with 1 minute resolution) to the backend (see more in [Monitoring pipeline](#monitoring-pipeline)).
InitialResources will set the Request for both cpu and memory to the 90th percentile of the first non-empty set of samples, chosen in the following order:
* 7 days same image:tag, assuming there is at least 60 samples (1 hour)
* 30 days same image:tag, assuming there is at least 60 samples (1 hour)
* 30 days same image, assuming there is at least 1 sample
If there is still no data, the default value will be set by LimitRanger. The same parameters will be configurable with appropriate flags.
#### Example
If we have at least 60 samples from image:tag over the past 7 days, we will use the 90th percentile of all of the samples of image:tag over the past 7 days.
Otherwise, if we have at least 60 samples from image:tag over the past 30 days, we will use the 90th percentile of all of the samples of image:tag over the past 30 days.
Otherwise, if we have at least 1 sample from image over the past 30 days, we will use the 90th percentile of all of the samples of image over the past 30 days.
Otherwise we will use the default value.
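A sketch of the fallback selection and percentile computation described above, with the monitoring-backend query stubbed out; the sample units, the nearest-rank percentile method, and the function names are assumptions for illustration:

```go
package main

import (
	"fmt"
	"sort"
)

// source describes one candidate sample set in the fallback order above.
type source struct {
	window     string // e.g. "7d" or "30d"
	exactTag   bool   // whether samples are restricted to the exact image:tag
	minSamples int
}

var fallbackOrder = []source{
	{"7d", true, 60},
	{"30d", true, 60},
	{"30d", false, 1},
}

// percentile90 returns the 90th percentile of the samples (nearest-rank method).
func percentile90(samples []float64) float64 {
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	idx := int(float64(len(sorted))*0.9+0.5) - 1
	if idx < 0 {
		idx = 0
	}
	if idx >= len(sorted) {
		idx = len(sorted) - 1
	}
	return sorted[idx]
}

// estimate picks the first source with enough samples and returns its 90th
// percentile; ok=false means fall back to the LimitRanger default.
// lookup is a stand-in for the monitoring backend query (InfluxDB or GCM).
func estimate(lookup func(source) []float64) (value float64, ok bool) {
	for _, s := range fallbackOrder {
		samples := lookup(s)
		if len(samples) >= s.minSamples {
			return percentile90(samples), true
		}
	}
	return 0, false
}

func main() {
	fake := func(s source) []float64 {
		if s.window == "30d" && !s.exactTag {
			return []float64{100, 120, 250} // e.g. cpu usage in millicores
		}
		return nil
	}
	fmt.Println(estimate(fake))
}
```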
### Monitoring pipeline
In the first version there will be 2 backend options for the predicting algorithm:
* [InfluxDB](../../docs/user-guide/monitoring.md#influxdb-and-grafana) - aggregation will be made in SQL query
* [GCM](../../docs/user-guide/monitoring.md#google-cloud-monitoring) - since GCM is not as powerful as InfluxDB some aggregation will be made on the client side
Both will be hidden under an abstraction layer, so it would be easy to add another option.
The code will be a part of the Initial Resources component so as not to block development; however, in the future it should be a part of Heapster.
## Next steps
The first version will be quite simple, so there are many possible improvements. Some of them seem to have high priority
and should be introduced shortly after the first version is done:
* observe OOMs and then react to them by increasing the estimate
* add the possibility to specify whether an estimate should be made, possibly as an ```InitialResourcesPolicy``` with options: *always*, *if-not-set*, *never*
* add other features to the model, like *namespace*
* remember predefined values for the most popular images like *mysql*, *nginx*, *redis*, etc.
* dry mode, which allows asking the system for a resource recommendation for a container without running it
* add the estimate as an annotation for those containers that already have resources set
* support for other data sources like [Hawkular](http://www.hawkular.org/)

View File

@ -1,159 +1 @@
# Job Controller This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/job.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/job.md)
## Abstract
A proposal for implementing a new controller - Job controller - which will be responsible
for managing pod(s) that require running once to completion even if the machine
the pod is running on fails, in contrast to what ReplicationController currently offers.
Several existing issues and PRs were already created regarding that particular subject:
* Job Controller [#1624](https://github.com/kubernetes/kubernetes/issues/1624)
* New Job resource [#7380](https://github.com/kubernetes/kubernetes/pull/7380)
## Use Cases
1. Be able to start one or several pods tracked as a single entity.
1. Be able to run batch-oriented workloads on Kubernetes.
1. Be able to get the job status.
1. Be able to specify the number of instances performing a job at any one time.
1. Be able to specify the number of successfully finished instances required to finish a job.
## Motivation
Jobs are needed for executing multi-pod computation to completion; a good example
here would be the ability to implement any type of batch-oriented task.
## Implementation
The Job controller is similar to the replication controller in that both manage pods.
This implies the Job controller will follow the same controller framework that replication
controllers have already defined. The biggest difference between a `Job` and a
`ReplicationController` object is the purpose; `ReplicationController`
ensures that a specified number of Pods are running at any one time, whereas
`Job` is responsible for running the desired number of Pods to completion of
a task. This difference will be represented by the `RestartPolicy`, which is
required to always take the value `RestartPolicyNever` or `RestartPolicyOnFailure`.
The new `Job` object will have the following content:
```go
// Job represents the configuration of a single job.
type Job struct {
TypeMeta
ObjectMeta
// Spec is a structure defining the expected behavior of a job.
Spec JobSpec
// Status is a structure describing current status of a job.
Status JobStatus
}
// JobList is a collection of jobs.
type JobList struct {
TypeMeta
ListMeta
Items []Job
}
```
The `JobSpec` structure is defined to contain all the information about how the actual job execution
will look.
```go
// JobSpec describes how the job execution will look like.
type JobSpec struct {
// Parallelism specifies the maximum desired number of pods the job should
// run at any given time. The actual number of pods running in steady state will
// be less than this number when ((.spec.completions - .status.successful) < .spec.parallelism),
// i.e. when the work left to do is less than max parallelism.
Parallelism *int
// Completions specifies the desired number of successfully finished pods the
// job should be run with. Defaults to 1.
Completions *int
// Selector is a label query over pods running a job.
Selector map[string]string
// Template is the object that describes the pod that will be created when
// executing a job.
Template *PodTemplateSpec
}
```
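For illustration only, here is how the two main knobs might be combined, using a stripped-down stand-in for the `JobSpec` type above (the real object also carries `TypeMeta`, `ObjectMeta`, and a pod template). With `Completions = 5` and `Parallelism = 2`, at most two pods run at once; once only one successful completion is still needed, only a single pod would be kept running.

```go
package main

import "fmt"

// Stripped-down stand-in for JobSpec above, just to show how the knobs relate.
type JobSpec struct {
	Parallelism *int
	Completions *int
	Selector    map[string]string
}

func newInt(v int) *int { return &v }

func main() {
	// A job that needs 5 successful completions, running at most 2 pods at once.
	spec := JobSpec{
		Parallelism: newInt(2),
		Completions: newInt(5),
		Selector:    map[string]string{"job": "image-resize"},
	}
	fmt.Printf("parallelism=%d completions=%d selector=%v\n",
		*spec.Parallelism, *spec.Completions, spec.Selector)
}
```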
The `JobStatus` structure is defined to contain information about the pods currently executing
the specified job.
```go
// JobStatus represents the current state of a Job.
type JobStatus struct {
Conditions []JobCondition
// CreationTime represents time when the job was created
CreationTime unversioned.Time
// StartTime represents time when the job was started
StartTime unversioned.Time
// CompletionTime represents time when the job was completed
CompletionTime unversioned.Time
// Active is the number of actively running pods.
Active int
// Successful is the number of pods that successfully completed their job.
Successful int
// Unsuccessful is the number of pod failures; this applies only to jobs
// created with RestartPolicyNever, otherwise this value will always be 0.
Unsuccessful int
}
type JobConditionType string
// These are valid conditions of a job.
const (
// JobComplete means the job has completed its execution.
JobComplete JobConditionType = "Complete"
)
// JobCondition describes current state of a job.
type JobCondition struct {
Type JobConditionType
Status ConditionStatus
LastHeartbeatTime unversioned.Time
LastTransitionTime unversioned.Time
Reason string
Message string
}
```
## Events
Job controller will be emitting the following events:
* JobStart
* JobFinish
## Future evolution
Below are the possible future extensions to the Job controller:
* Be able to limit the execution time for a job, similarly to ActiveDeadlineSeconds for Pods. *now implemented*
* Be able to create a chain of jobs dependent one on another. *will be implemented in a separate type called Workflow*
* Be able to specify the work each of the workers should execute (see type 1 from
[this comment](https://github.com/kubernetes/kubernetes/issues/1624#issuecomment-97622142))
* Be able to inspect Pods running a Job, especially after a Job has finished, e.g.
by providing pointers to Pods in the JobStatus ([see comment](https://github.com/kubernetes/kubernetes/pull/11746/files#r37142628)).
* help users avoid non-unique label selectors ([see this proposal](../../docs/design/selector-generation.md))

View File

@ -1,220 +1 @@
# Kubectl Login Subcommand This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubectl-login.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubectl-login.md)
**Authors**: Eric Chiang (@ericchiang)
## Goals
`kubectl login` is an entrypoint for any user attempting to connect to an
existing server. It should provide a more tailored experience than the existing
`kubectl config`, including config validation, auth challenges, and discovery.
Short term, the subcommand should recognize and attempt to help:
* New users with an empty configuration trying to connect to a server.
* Users with no credentials, by prompting for any required information.
* Fully configured users who want to validate credentials.
* Users trying to switch servers.
* Users trying to reauthenticate as the same user because credentials have expired.
* Users trying to authenticate as a different user against the same server.
Long term `kubectl login` should enable authentication strategies to be
discoverable from a master to avoid the end-user having to know how their
sysadmin configured the Kubernetes cluster.
## Design
The "login" subcommand helps users move towards a fully functional kubeconfig by
evaluating the current state of the kubeconfig and trying to prompt the user for
and validate the necessary information to login to the kubernetes cluster.
This is inspired by similar tools such as:
* [os login](https://docs.openshift.org/latest/cli_reference/get_started_cli.html#basic-setup-and-login)
* [gcloud auth login](https://cloud.google.com/sdk/gcloud/reference/auth/login)
* [aws configure](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html)
The steps taken are:
1. If no cluster configured, prompt user for cluster information.
2. If no user is configured, discover the authentication strategies supported by the API server.
3. Prompt the user for some information based on the authentication strategy they choose.
4. Attempt to login as a user, including authentication challenges such as OAuth2 flows, and display user info.
Importantly, each step is skipped if the existing configuration is validated or
can be supplied without user interaction (refreshing an OAuth token, redeeming
a Kerberos ticket, etc.). Users with fully configured kubeconfigs will only see
the user they're logged in as, useful for opaque credentials such as X509 certs
or bearer tokens.
The command differs from `kubectl config` by:
* Communicating with the API server to determine if the user is supplying valid credentials.
* Validating input and being opinionated about the input it asks for.
* Triggering authentication challenges, for example:
* Basic auth: Actually try to communicate with the API server.
* OpenID Connect: Create an OAuth2 redirect.
However `kubectl login` should still be seen as a supplement to, not a
replacement for, `kubectl config` by helping validate any kubeconfig generated
by the latter command.
## Credential validation
When clusters utilize authorization plugins, access decisions are based on the
correct configuration of an auth-N plugin, an auth-Z plugin, and client side
credentials. Being rejected then raises several questions. Is the user's
kubeconfig misconfigured? Is the authorization plugin setup wrong? Is the user
authenticating as a different user than the one they assume?
To help `kubectl login` diagnose misconfigured credentials, responses from the
API server to authenticated requests SHOULD include the `Authentication-Info`
header as defined in [RFC 7615](https://tools.ietf.org/html/rfc7615). The value
will hold name value pairs for `username` and `uid`. Since usernames and IDs
can be arbitrary strings, these values will be escaped using the `quoted-string`
format noted in the RFC.
```
HTTP/1.1 200 OK
Authentication-Info: username="janedoe@example.com", uid="123456"
```
If the user successfully authenticates this header will be set, regardless of
auth-Z decisions. For example a 401 Unauthorized (user didn't provide valid
credentials) would lack this header, while a 403 Forbidden response would
contain it.
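As a rough sketch (not actual kubectl code) of how a client could consume that header, the snippet below pulls out the `username` and `uid` pairs; a production parser would also unescape `\"` and `\\` inside the quoted-string values.

```go
package main

import (
	"fmt"
	"regexp"
)

// pair matches name="quoted value" fragments of an Authentication-Info header.
var pair = regexp.MustCompile(`(\w+)="((?:[^"\\]|\\.)*)"`)

func parseAuthInfo(header string) map[string]string {
	out := map[string]string{}
	for _, m := range pair.FindAllStringSubmatch(header, -1) {
		out[m[1]] = m[2]
	}
	return out
}

func main() {
	h := `username="janedoe@example.com", uid="123456"`
	info := parseAuthInfo(h)
	fmt.Printf("logged in as %q (uid %s)\n", info["username"], info["uid"])
}
```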
## Authentication discovery
A long term goal of `kubectl login` is to facilitate a customized experience
for clusters configured with different auth providers. This will require some
way for the API server to indicate to `kubectl` how a user is expected to
login.
Currently, this document doesn't propose a specific implementation for
discovery. While it'd be preferable to utilize an existing standard (such as the
`WWW-Authenticate` HTTP header), discovery may require a solution custom to the
API server, such as an additional discovery endpoint with a custom type.
## Use in non-interactive session
For the initial implementation, if `kubectl login` requires prompting and is
called from a non-interactive session (determined by whether the session is using a
TTY), it errors out, recommending `kubectl config` instead. In future
updates `kubectl login` may include options for non-interactive sessions so
auth strategies which require custom behavior not built into `kubectl config`,
such as the exchanges in Kerberos or OpenID Connect, can be triggered from
scripts.
## Examples
If kubeconfig isn't configured, `kubectl login` will attempt to fully configure
and validate the client's credentials.
```
$ kubectl login
Cluster URL []: https://172.17.4.99:443
Cluster CA [(defaults to host certs)]: ${PWD}/ssl/ca.pem
Cluster Name ["cluster-1"]:
The kubernetes server supports the following methods:
1. Bearer token
2. Username and password
3. Keystone
4. OpenID Connect
5. TLS client certificate
Enter login method [1]: 4
Logging in using OpenID Connect.
Issuer ["valuefromdiscovery"]: https://accounts.google.com
Issuer CA [(defaults to host certs)]:
Scopes ["profile email"]:
Client ID []: client@localhost:foobar
Client Secret []: *****
Open the following address in a browser.
https://accounts.google.com/o/oauth2/v2/auth?redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scopes=openid%20email&access_type=offline&...
Enter security code: ****
Logged in as "janedoe@gmail.com"
```
Human readable names are provided by a combination of the auth providers
understood by `kubectl login` and the authenticator discovery. For instance,
Keystone uses basic auth credentials in the same way as a static user file, but
if the discovery indicates that the Keystone plugin is being used it should be
presented to the user differently.
Users with configured credentials will simply authenticate against the API server and see
who they are. Running this command again simply validates the user's credentials.
```
$ kubectl login
Logged in as "janedoe@gmail.com"
```
Users who are halfway through the flow will start where they left off. For
instance, if a user has configured the cluster field but not the user field, they will
be prompted for credentials.
```
$ kubectl login
No auth type configured. The kubernetes server supports the following methods:
1. Bearer token
2. Username and password
3. Keystone
4. OpenID Connect
5. TLS client certificate
Enter login method [1]: 2
Logging in with basic auth. Enter the following fields.
Username: janedoe
Password: ****
Logged in as "janedoe@gmail.com"
```
Users who wish to switch servers can provide the `--switch-cluster` flag which
will prompt the user for new cluster details and switch the current context. It
behaves identically to `kubectl login` when a cluster is not set.
```
$ kubectl login --switch-cluster
# ...
```
Switching users goes through a similar flow attempting to prompt the user for
new credentials to the same server.
```
$ kubectl login --switch-user
# ...
```
## Work to do
Phase 1:
* Provide a simple dialog for configuring authentication.
* Kubectl can trigger authentication actions such as triggering OAuth2 redirects.
* Validation of user credentials through the `Authentication-Info` header.
Phase 2:
* Update proposal with auth provider discovery mechanism.
* Customize dialog using discovery data.
Further improvements will require adding more authentication providers and
adapting existing plugins to take advantage of challenge-based authentication.

View File

@ -1,106 +1 @@
# Kubelet Authentication / Authorization This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubelet-auth.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubelet-auth.md)
Author: Jordan Liggitt (jliggitt@redhat.com)
## Overview
The kubelet exposes endpoints which give access to data of varying sensitivity,
and allow performing operations of varying power on the node and within containers.
There is no built-in way to limit or subdivide access to those endpoints,
so deployers must secure the kubelet API using external, ad-hoc methods.
This document proposes a method for authenticating and authorizing access
to the kubelet API, using interfaces and methods that complement the existing
authentication and authorization used by the API server.
## Preliminaries
This proposal assumes the existence of:
* a functioning API server
* the SubjectAccessReview and TokenReview APIs
It also assumes each node is additionally provisioned with the following information:
1. Location of the API server
2. Any CA certificates necessary to trust the API server's TLS certificate
3. Client credentials authorized to make SubjectAccessReview and TokenReview API calls
## API Changes
None
## Kubelet Authentication
Enable starting the kubelet with one or more of the following authentication methods:
* x509 client certificate
* bearer token
* anonymous (current default)
For backwards compatibility, the default is to enable anonymous authentication.
### x509 client certificate
Add a new `--client-ca-file=[file]` option to the kubelet.
When started with this option, the kubelet authenticates incoming requests using x509
client certificates, validated against the root certificates in the provided bundle.
The kubelet will reuse the x509 authenticator already used by the API server.
The master API server can already be started with `--kubelet-client-certificate` and
`--kubelet-client-key` options in order to make authenticated requests to the kubelet.
### Bearer token
Add a new `--authentication-token-webhook=[true|false]` option to the kubelet.
When true, the kubelet authenticates incoming requests with bearer tokens by making
`TokenReview` API calls to the API server.
The kubelet will reuse the webhook authenticator already used by the API server, configured
to call the API server using the connection information already provided to the kubelet.
To improve performance of repeated requests with the same bearer token, the
`--authentication-token-webhook-cache-ttl` option already supported by the API server
would also be supported by the kubelet.
### Anonymous
Add a new `--anonymous-auth=[true|false]` option to the kubelet.
When true, requests to the secure port that are not rejected by other configured
authentication methods are treated as anonymous requests, and given a username
of `system:anonymous` and a group of `system:unauthenticated`.
## Kubelet Authorization
Add a new `--authorization-mode` option to the kubelet, specifying one of the following modes:
* `Webhook`
* `AlwaysAllow` (current default)
For backwards compatibility, the authorization mode defaults to `AlwaysAllow`.
### Webhook
Webhook mode converts the request to authorization attributes, and makes a `SubjectAccessReview`
API call to check if the authenticated subject is allowed to make a request with those attributes.
This enables authorization policy to be centrally managed by the authorizer configured for the API server.
The kubelet will reuse the webhook authorizer already used by the API server, configured
to call the API server using the connection information already provided to the kubelet.
To improve performance of repeated requests with the same authenticated subject and request attributes,
the same webhook authorizer caching options supported by the API server would be supported:
* `--authorization-webhook-cache-authorized-ttl`
* `--authorization-webhook-cache-unauthorized-ttl`
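For illustration, the sketch below shows the general shape of a `SubjectAccessReview` body such a webhook authorizer could send. The hand-rolled struct definitions, the mapping of a hypothetical kubelet request to `verb`/`resource`/`subresource`, and the chosen API version are assumptions for this example rather than an authoritative schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hand-rolled stand-ins for the review types; field names mimic the
// authorization API but should not be taken as the exact schema.
type resourceAttributes struct {
	Verb        string `json:"verb"`
	Resource    string `json:"resource"`
	Subresource string `json:"subresource,omitempty"`
}

type subjectAccessReviewSpec struct {
	User               string              `json:"user"`
	ResourceAttributes *resourceAttributes `json:"resourceAttributes,omitempty"`
}

type subjectAccessReview struct {
	APIVersion string                  `json:"apiVersion"`
	Kind       string                  `json:"kind"`
	Spec       subjectAccessReviewSpec `json:"spec"`
}

func main() {
	// Hypothetical mapping of an incoming kubelet request to attributes.
	review := subjectAccessReview{
		APIVersion: "authorization.k8s.io/v1beta1",
		Kind:       "SubjectAccessReview",
		Spec: subjectAccessReviewSpec{
			User: "system:anonymous",
			ResourceAttributes: &resourceAttributes{
				Verb:        "get",
				Resource:    "nodes",
				Subresource: "stats",
			},
		},
	}
	b, _ := json.MarshalIndent(review, "", "  ")
	fmt.Println(string(b))
}
```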
### AlwaysAllow
This mode allows any authenticated request.
## Future Work
* Add support for CRL revocation for x509 client certificate authentication (http://issue.k8s.io/18982)

View File

@ -1,269 +1 @@
This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubelet-cri-logging.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubelet-cri-logging.md)
# CRI: Log management for container stdout/stderr streams
## Goals and non-goals
Container Runtime Interface (CRI) is an ongoing project to allow container
runtimes to integrate with kubernetes via a newly-defined API. The goal of this
proposal is to define how a container's *stdout/stderr* log streams should be
handled in CRI.
The explicit non-goal is to define how (non-stdout/stderr) application logs
should be handled. Collecting and managing arbitrary application logs is a
long-standing issue [1] in kubernetes and is worth a proposal of its own. Even
though this proposal does not touch upon these logs, the direction of
this proposal is aligned with one of the most-discussed solutions, logging
volumes [1], for general logging management.
*In this proposal, “logs” refer to the stdout/stderr streams of the
containers, unless specified otherwise.*
Previous CRI logging issues:
- Tracking issue: https://github.com/kubernetes/kubernetes/issues/30709
- Proposal (by @tmrtfs): https://github.com/kubernetes/kubernetes/pull/33111
The scope of this proposal is narrower than the #33111 proposal, and hopefully
this will encourage a more focused discussion.
## Background
Below is a brief overview of logging in kubernetes with docker, which is the
only container runtime with fully functional integration today.
**Log lifecycle and management**
Docker supports various logging drivers (e.g., syslog, journal, and json-file),
and allows users to configure the driver by passing flags to the docker daemon
at startup. Kubernetes defaults to the "json-file" logging driver, in which
docker writes the stdout/stderr streams to a file in the json format as shown
below.
```
{“log”: “The actual log line”, “stream”: “stderr”, “time”: “2016-10-05T00:00:30.082640485Z”}
```
Docker deletes the log files when the container is removed, and a cron-job (or
systemd timer-based job) on the node is responsible for rotating the logs (using
`logrotate`). To preserve the logs for introspection and debuggability, kubelet
keeps the terminated container until the pod object has been deleted from the
apiserver.
**Container log retrieval**
The kubernetes CLI tool, kubectl, allows users to access the container logs
using the [`kubectl logs`](http://kubernetes.io/docs/user-guide/kubectl/kubectl_logs/) command.
`kubectl logs` supports flags such as `--since` that require understanding of
the format and the metadata (i.e., timestamps) of the logs. In the current
implementation, kubelet calls `docker logs` with parameters to return the log
content. As of now, docker only supports `log` operations for the “journal” and
“json-file” drivers [2]. In other words, *the support of `kubectl logs` is not
universal in all kubernetes deployments*.
**Cluster logging support**
In a production cluster, logs are usually collected, aggregated, and shipped to
a remote store where advanced analysis/search/archiving functions are
supported. In kubernetes, the default cluster-addons includes a per-node log
collection daemon, `fluentd`. To facilitate the log collection, kubelet creates
symbolic links to all the docker container logs under `/var/log/containers`
with pod and container metadata embedded in the filename.
```
/var/log/containers/<pod_name>_<pod_namespace>_<container_name>-<container_id>.log
```
The fluentd daemon watches the `/var/log/containers/` directory and extracts the
metadata associated with the log from the path. Note that this integration
requires kubelet to know where the container runtime stores the logs, and will
not be directly applicable to CRI.
## Requirements
1. **Provide ways for CRI-compliant runtimes to support all existing logging
features, i.e., `kubectl logs`.**
2. **Allow kubelet to manage the lifecycle of the logs to pave the way for
better disk management in the future.** This implies that the lifecycle
of containers and their logs need to be decoupled.
3. **Allow log collectors to easily integrate with Kubernetes across
different container runtimes while preserving efficient storage and
retrieval.**
Requirement (1) provides opportunities for runtimes to continue supporting
`kubectl logs --since` and related features. Note that even though such
features are only supported today for a limited set of log drivers, this is an
important usability tool for a fresh, basic kubernetes cluster, and should not
be overlooked. Requirement (2) stems from the fact that disk is managed by
kubelet as a node-level resource (not per-pod) today, hence it is difficult to
delegate to the runtime by enforcing per-pod disk quota policy. In addition,
container disk quota is not well supported yet, and such limitation may not
even be well-perceived by users. Requirement (3) is crucial to the kubernetes'
extensibility and usability across all deployments.
## Proposed solution
This proposal intends to satisfy the requirements by
1. Enforce where the container logs should be stored on the host
filesystem. Both kubelet and the log collector can interact with
the log files directly.
2. Ask the runtime to decorate the logs in a format that kubelet understands.
**Log directories and structures**
Kubelet will be configured with a root directory (e.g., `/var/log/pods` or
`/var/lib/kubelet/logs/`) to store all container logs. Below is an example of a
path to the log of a container in a pod.
```
/var/log/pods/<podUID>/<containerName>_<instance#>.log
```
In CRI, this is implemented by setting the pod-level log directory when
creating the pod sandbox, and passing the relative container log path
when creating a container.
```
PodSandboxConfig.LogDirectory: /var/log/pods/<podUID>/
ContainerConfig.LogPath: <containerName>_<instance#>.log
```
Because kubelet determines where the logs are stored and can access them
directly, this meets requirement (2). As for requirement (3), the log collector
can easily extract basic pod metadata (e.g., pod UID, container name) from
the paths, and watch the directory for any changes. In the future, we can
extend this by maintaining a metadata file in the pod directory.
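As a sketch of how little runtime-specific knowledge a collector would need, the snippet below recovers the pod UID, container name, and instance number purely from the proposed path layout; the helper and its error handling are illustrative, not part of any real collector.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// parseLogPath recovers metadata from a path shaped like
// /var/log/pods/<podUID>/<containerName>_<instance#>.log.
func parseLogPath(p string) (podUID, container, instance string, err error) {
	dir, file := filepath.Split(p)
	podUID = filepath.Base(filepath.Clean(dir))
	name := strings.TrimSuffix(file, ".log")
	i := strings.LastIndex(name, "_")
	if i < 0 {
		return "", "", "", fmt.Errorf("unexpected log file name: %q", file)
	}
	return podUID, name[:i], name[i+1:], nil
}

func main() {
	uid, container, instance, err := parseLogPath("/var/log/pods/7d8f3b1a/nginx_0.log")
	if err != nil {
		panic(err)
	}
	fmt.Println(uid, container, instance) // 7d8f3b1a nginx 0
}
```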
**Log format**
The runtime should decorate each log entry with an RFC 3339Nano timestamp
prefix and the stream type (i.e., "stdout" or "stderr"), and end each entry with a newline.
```
2016-10-06T00:17:09.669794202Z stdout The content of the log entry 1
2016-10-06T00:17:10.113242941Z stderr The content of the log entry 2
```
With this knowledge, kubelet can parse the logs and serve them for `kubectl
logs` requests. This meets requirement (1). Note that the format is deliberately kept
simple to provide only the information necessary to serve the requests.
We do not intend for kubelet to host various logging plugins. It is also worth
mentioning again that the scope of this proposal is restricted to stdout/stderr
streams of the container, and we impose no restriction on the logging format of
arbitrary container logs.
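A small sketch of parsing one such log line into its timestamp, stream, and content; this is illustrative only and not kubelet code.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// logEntry holds one decoded line of the form
// "<RFC3339Nano timestamp> <stream> <content>".
type logEntry struct {
	Timestamp time.Time
	Stream    string // "stdout" or "stderr"
	Content   string
}

func parseLine(line string) (logEntry, error) {
	parts := strings.SplitN(line, " ", 3)
	if len(parts) != 3 {
		return logEntry{}, fmt.Errorf("malformed log line: %q", line)
	}
	ts, err := time.Parse(time.RFC3339Nano, parts[0])
	if err != nil {
		return logEntry{}, err
	}
	return logEntry{Timestamp: ts, Stream: parts[1], Content: parts[2]}, nil
}

func main() {
	e, err := parseLine("2016-10-06T00:17:09.669794202Z stdout The content of the log entry 1")
	if err != nil {
		panic(err)
	}
	fmt.Println(e.Stream, e.Timestamp.Format(time.RFC3339Nano), e.Content)
}
```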
**Who should rotate the logs?**
We assume that a separate task (e.g., a cron job) will be configured on the node
to rotate the logs periodically, similar to today's implementation.
We do not rule out the possibility of letting kubelet or a per-node daemon
(`DaemonSet`) take up the responsibility, or even declaring the rotation policy
in the kubernetes API as part of the `PodSpec`, but it is beyond the scope of
this proposal.
**What about non-supported log formats?**
If a runtime chooses to store logs in non-supported formats, it essentially
opts out of `kubectl logs` features, which is backed by kubelet today. It is
assumed that the user can rely on the advanced, cluster logging infrastructure
to examine the logs.
It is also possible that in the future, `kubectl logs` can contact the cluster
logging infrastructure directly to serve logs [1a]. Note that this does not
eliminate the need to store the logs on the node locally for reliability.
**How can existing runtimes (docker/rkt) comply with the logging requirements?**
In the short term, the ongoing docker-CRI integration [3] will support the
proposed solution only partially by (1) creating symbolic links for kubelet
to access, but not manage, the logs, and (2) adding support for the json format in
kubelet. A more sophisticated solution that either involves using a custom
plugin or launching a separate process to copy and decorate the log will be
considered as a mid-term solution.
For rkt, implementation will rely on providing external file-descriptors for
stdout/stderr to applications via systemd [4]. Those streams are currently
managed by a journald sidecar, which collects stream outputs and stores them
in the journal file of the pod. This will be replaced by a custom sidecar which
can produce logs in the format expected by this specification and can handle
clients attaching as well.
## Alternatives
There are ad-hoc solutions/discussions that address one or two of the
requirements, but no comprehensive solution for CRI specifically has been
proposed so far (with the exception of @tmrtfs's proposal
[#33111](https://github.com/kubernetes/kubernetes/pull/33111), which has a much
wider scope). It has come up in discussions that kubelet could delegate all the
logging management to the runtime to allow maximum flexibility. However, it is
difficult for this approach to meet either requirement (1) or (2) without
defining a complex logging API.
There are also possibilities to implement the current proposal by imposing the
log file paths, while leveraging the runtime to access and/or manage logs. This
requires the runtime to expose knobs in CRI to retrieve, remove, and examine
the disk usage of logs. The upside of this approach is that kubelet need not
mandate the logging format, assuming runtime already includes plugins for
various logging formats. Unfortunately, this is not true for existing runtimes
such as docker, which supports log retrieval only for a very limited number of
log drivers [2]. On the other hand, the downside is that we would be enforcing
more requirements on the runtime through log storage location on the host, and
a potentially premature logging API that may change as the disk management
evolves.
## References
[1] Log management issues:
- a. https://github.com/kubernetes/kubernetes/issues/17183
- b. https://github.com/kubernetes/kubernetes/issues/24677
- c. https://github.com/kubernetes/kubernetes/pull/13010
[2] Docker logging drivers:
- https://docs.docker.com/engine/admin/logging/overview/
[3] Docker CRI integration:
- https://github.com/kubernetes/kubernetes/issues/31459
[4] rkt support: https://github.com/systemd/systemd/pull/4179

View File

@ -1,462 +1 @@
# Kubelet - Eviction Policy This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubelet-eviction.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubelet-eviction.md)
**Authors**: Derek Carr (@derekwaynecarr), Vishnu Kannan (@vishh)
**Status**: Proposed (memory evictions WIP)
This document presents a specification for how the `kubelet` evicts pods when compute resources are too low.
## Goals
The node needs a mechanism to preserve stability when available compute resources are low.
This is especially important when dealing with incompressible compute resources such
as memory or disk. If either resource is exhausted, the node would become unstable.
The `kubelet` has some support for influencing system behavior in response to a system OOM by
having the system OOM killer see higher OOM score adjust scores for containers that have consumed
the largest amount of memory relative to their request. System OOM events are very compute
intensive, and can stall the node until the OOM killing process has completed. In addition,
the system is prone to return to an unstable state since the containers that are killed due to OOM
are either restarted or a new pod is scheduled on to the node.
Instead, we would prefer a system where the `kubelet` can pro-actively monitor for
and prevent total starvation of a compute resource, and in cases where it
appears imminent, pro-actively fail one or more pods, so the workload can get
moved and scheduled elsewhere when/if its backing controller creates a new pod.
## Scope of proposal
This proposal defines a pod eviction policy for reclaiming compute resources.
As of now, memory and disk based evictions are supported.
The proposal focuses on a simple default eviction strategy
intended to cover the broadest class of user workloads.
## Eviction Signals
The `kubelet` will support the ability to trigger eviction decisions on the following signals.
| Eviction Signal | Description |
|------------------|---------------------------------------------------------------------------------|
| memory.available | memory.available := node.status.capacity[memory] - node.stats.memory.workingSet |
| nodefs.available | nodefs.available := node.stats.fs.available |
| nodefs.inodesFree | nodefs.inodesFree := node.stats.fs.inodesFree |
| imagefs.available | imagefs.available := node.stats.runtime.imagefs.available |
| imagefs.inodesFree | imagefs.inodesFree := node.stats.runtime.imagefs.inodesFree |
Each of the above signals supports either a literal or percentage-based value. The percentage-based value
is calculated relative to the total capacity associated with each signal.
`kubelet` supports only two filesystem partitions.
1. The `nodefs` filesystem that kubelet uses for volumes, daemon logs, etc.
1. The `imagefs` filesystem that container runtimes use for storing images and container writable layers.
`imagefs` is optional. `kubelet` auto-discovers these filesystems using cAdvisor.
`kubelet` does not care about any other filesystems. Any other types of configurations are not currently supported by the kubelet. For example, it is *not OK* to store volumes and logs in a dedicated `imagefs`.
## Eviction Thresholds
The `kubelet` will support the ability to specify eviction thresholds.
An eviction threshold is of the following form:
`<eviction-signal><operator><quantity | int%>`
* valid `eviction-signal` tokens are as defined above.
* valid `operator` tokens are `<`
* valid `quantity` tokens must match the quantity representation used by Kubernetes
* an eviction threshold can be expressed as a percentage if it ends with the `%` token.
If threshold criteria are met, the `kubelet` will take pro-active action to attempt
to reclaim the starved compute resource associated with the eviction signal.
The `kubelet` will support soft and hard eviction thresholds.
For example, if a node has `10Gi` of memory, and the desire is to induce eviction
if available memory falls below `1Gi`, an eviction signal can be specified as either
of the following (but not both).
* `memory.available<10%`
* `memory.available<1Gi`
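A minimal sketch of parsing a threshold expression of this form; quantities are kept as plain strings here, whereas the kubelet would use its resource quantity machinery, so treat the helper as illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// threshold captures the grammar <eviction-signal><operator><quantity|int%>,
// where the only valid operator is "<".
type threshold struct {
	Signal     string
	Value      string
	Percentage bool
}

func parseThreshold(expr string) (threshold, error) {
	parts := strings.SplitN(expr, "<", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return threshold{}, fmt.Errorf("invalid eviction threshold %q", expr)
	}
	return threshold{
		Signal:     parts[0],
		Value:      strings.TrimSuffix(parts[1], "%"),
		Percentage: strings.HasSuffix(parts[1], "%"),
	}, nil
}

func main() {
	for _, e := range []string{"memory.available<1Gi", "memory.available<10%"} {
		t, err := parseThreshold(e)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%+v\n", t)
	}
}
```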
### Soft Eviction Thresholds
A soft eviction threshold pairs an eviction threshold with a required
administrator specified grace period. No action is taken by the `kubelet`
to reclaim resources associated with the eviction signal until that grace
period has been exceeded. If no grace period is provided, the `kubelet` will
error on startup.
In addition, if a soft eviction threshold has been met, an operator can
specify a maximum allowed pod termination grace period to use when evicting
pods from the node. If specified, the `kubelet` will use the lesser value among
the `pod.Spec.TerminationGracePeriodSeconds` and the max allowed grace period.
If not specified, the `kubelet` will kill pods immediately with no graceful
termination.
To configure soft eviction thresholds, the following flags will be supported:
```
--eviction-soft="": A set of eviction thresholds (e.g. memory.available<1.5Gi) that if met over a corresponding grace period would trigger a pod eviction.
--eviction-soft-grace-period="": A set of eviction grace periods (e.g. memory.available=1m30s) that correspond to how long a soft eviction threshold must hold before triggering a pod eviction.
--eviction-max-pod-grace-period="0": Maximum allowed grace period (in seconds) to use when terminating pods in response to a soft eviction threshold being met.
```
### Hard Eviction Thresholds
A hard eviction threshold has no grace period, and if observed, the `kubelet`
will take immediate action to reclaim the associated starved resource. If a
hard eviction threshold is met, the `kubelet` will kill the pod immediately
with no graceful termination.
To configure hard eviction thresholds, the following flag will be supported:
```
--eviction-hard="": A set of eviction thresholds (e.g. memory.available<1Gi) that if met would trigger a pod eviction.
```
## Eviction Monitoring Interval
The `kubelet` will initially evaluate eviction thresholds at the same
housekeeping interval as `cAdvisor` housekeeping.
In Kubernetes 1.2, this was defaulted to `10s`.
It is a goal to shrink the monitoring interval to a much shorter window.
This may require changes to `cAdvisor` to let alternate housekeeping intervals
be specified for selected data (https://github.com/google/cadvisor/issues/1247)
For the purposes of this proposal, we expect the monitoring interval to be no
more than `10s` to know when a threshold has been triggered, but we will strive
to reduce that latency time permitting.
## Node Conditions
The `kubelet` will support a node condition that corresponds to each eviction signal.
If a hard eviction threshold has been met, or a soft eviction threshold has been met
independent of its associated grace period, the `kubelet` will report a condition that
reflects the node is under pressure.
The following node conditions are defined that correspond to the specified eviction signal.
| Node Condition | Eviction Signal | Description |
|----------------|------------------|------------------------------------------------------------------|
| MemoryPressure | memory.available | Available memory on the node has satisfied an eviction threshold |
| DiskPressure | nodefs.available, nodefs.inodesFree, imagefs.available, or imagefs.inodesFree | Available disk space and inodes on either the node's root filesystem or image filesystem has satisfied an eviction threshold |
The `kubelet` will continue to report node status updates at the frequency specified by
`--node-status-update-frequency` which defaults to `10s`.
### Oscillation of node conditions
If a node is oscillating above and below a soft eviction threshold, but not exceeding
its associated grace period, it would cause the corresponding node condition to
constantly oscillate between true and false, and could cause poor scheduling decisions
as a consequence.
To protect against this oscillation, the following flag is defined to control how
long the `kubelet` must wait before transitioning out of a pressure condition.
```
--eviction-pressure-transition-period=5m0s: Duration for which the kubelet has to wait
before transitioning out of an eviction pressure condition.
```
The `kubelet` would ensure that it has not observed an eviction threshold being met
for the specified pressure condition for the period specified before toggling the
condition back to `false`.
## Eviction scenarios
### Memory
Let's assume the operator started the `kubelet` with the following:
```
--eviction-hard="memory.available<100Mi"
--eviction-soft="memory.available<300Mi"
--eviction-soft-grace-period="memory.available=30s"
```
The `kubelet` will run a sync loop that looks at the available memory
on the node as reported from `cAdvisor` by calculating (capacity - workingSet).
If available memory is observed to drop below 100Mi, the `kubelet` will immediately
initiate eviction. If available memory is observed as falling below `300Mi`,
it will record when that signal was observed internally in a cache. If at the next
sync, that criteria was no longer satisfied, the cache is cleared for that
signal. If that signal is observed as being satisfied for longer than the
specified period, the `kubelet` will initiate eviction to attempt to
reclaim the resource that has met its eviction threshold.
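The per-sync decision for this scenario can be sketched as follows, with the thresholds and grace period hard-coded to the example flag values above; the `observedAt` argument plays the role of the kubelet's internal cache of when the soft signal was first seen. This is a simplified illustration, not the actual eviction manager code.

```go
package main

import (
	"fmt"
	"time"
)

const (
	hardMiB         = 100              // --eviction-hard="memory.available<100Mi"
	softMiB         = 300              // --eviction-soft="memory.available<300Mi"
	softGracePeriod = 30 * time.Second // --eviction-soft-grace-period
)

// shouldEvict is called once per sync with the observed available memory.
func shouldEvict(availableMiB int, observedAt *time.Time, now time.Time) bool {
	if availableMiB < hardMiB {
		return true // hard threshold: act immediately
	}
	if availableMiB >= softMiB {
		*observedAt = time.Time{} // soft signal cleared: reset the cache
		return false
	}
	if observedAt.IsZero() {
		*observedAt = now // first observation of the soft signal
		return false
	}
	return now.Sub(*observedAt) > softGracePeriod
}

func main() {
	var seen time.Time
	start := time.Now()
	fmt.Println(shouldEvict(250, &seen, start))                     // false: grace period starts
	fmt.Println(shouldEvict(250, &seen, start.Add(45*time.Second))) // true: signal held for >30s
}
```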
### Disk
Let's assume the operator started the `kubelet` with the following:
```
--eviction-hard="nodefs.available<1Gi,nodefs.inodesFree<1,imagefs.available<10Gi,imagefs.inodesFree<10"
--eviction-soft="nodefs.available<1.5Gi,nodefs.inodesFree<10,imagefs.available<20Gi,imagefs.inodesFree<100"
--eviction-soft-grace-period="nodefs.available=1m,imagefs.available=2m"
```
The `kubelet` will run a sync loop that looks at the available disk
on the node's supported partitions as reported from `cAdvisor`.
If available disk space on the node's primary filesystem is observed to drop below 1Gi
or the free inodes on the node's primary filesystem is less than 1,
the `kubelet` will immediately initiate eviction.
If available disk space on the node's image filesystem is observed to drop below 10Gi
or the free inodes on the node's primary image filesystem is less than 10,
the `kubelet` will immediately initiate eviction.
If available disk space on the node's primary filesystem is observed as falling below `1.5Gi`,
or if the free inodes on the node's primary filesystem is less than 10,
or if available disk space on the node's image filesystem is observed as falling below `20Gi`,
or if the free inodes on the node's image filesystem is less than 100,
it will record when that signal was observed internally in a cache. If at the next
sync, that criterion was no longer satisfied, the cache is cleared for that
signal. If that signal is observed as being satisfied for longer than the
specified period, the `kubelet` will initiate eviction to attempt to
reclaim the resource that has met its eviction threshold.
## Eviction of Pods
If an eviction threshold has been met, the `kubelet` will initiate the
process of evicting pods until it has observed the signal has gone below
its defined threshold.
The eviction sequence works as follows:
* for each monitoring interval, if eviction thresholds have been met
* find candidate pod
* fail the pod
* block until pod is terminated on node
If a pod is not terminated because a container does not happen to die
(e.g., processes stuck in disk IO), the `kubelet` may select
an additional pod to fail instead. The `kubelet` will invoke the `KillPod`
operation exposed on the runtime interface. If an error is returned,
the `kubelet` will select a subsequent pod.
## Eviction Strategy
The `kubelet` will implement a default eviction strategy oriented around
the pod quality of service class.
It will target pods that are the largest consumers of the starved compute
resource relative to their scheduling request. It ranks pods within a
quality of service tier in the following order.
* `BestEffort` pods that consume the most of the starved resource are failed
first.
* `Burstable` pods that consume the greatest amount of the starved resource
relative to their request for that resource are killed first. If no pod
has exceeded its request, the strategy targets the largest consumer of the
starved resource.
* `Guaranteed` pods that consume the greatest amount of the starved resource
relative to their request are killed first. If no pod has exceeded its request,
the strategy targets the largest consumer of the starved resource.
A guaranteed pod is guaranteed to never be evicted because of another pod's
resource consumption. That said, guarantees are only as good as the underlying
foundation they are built upon. If a system daemon
(i.e. `kubelet`, `docker`, `journald`, etc.) is consuming more resources than
were reserved via `system-reserved` or `kube-reserved` allocations, and the node
only has guaranteed pod(s) remaining, then the node must choose to evict a
guaranteed pod in order to preserve node stability, and to limit the impact
of the unexpected consumption to other guaranteed pod(s).
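As an illustration of the ranking within a single QoS tier, the sketch below orders `Burstable` pods by how far their memory usage exceeds their request, which is one plausible reading of "relative to their request"; `BestEffort` pods would simply be ranked by absolute usage. The types and numbers are made up for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// podUsage is a made-up view of one pod's memory request and current usage.
type podUsage struct {
	Name    string
	Request int64 // bytes requested
	Usage   int64 // bytes in use
}

// rank orders pods so that the one exceeding its request by the largest
// amount comes first and is therefore the first eviction candidate.
func rank(pods []podUsage) {
	sort.Slice(pods, func(i, j int) bool {
		return pods[i].Usage-pods[i].Request > pods[j].Usage-pods[j].Request
	})
}

func main() {
	pods := []podUsage{
		{"web", 256 << 20, 300 << 20},   // 44Mi over its request
		{"batch", 128 << 20, 400 << 20}, // 272Mi over its request
	}
	rank(pods)
	fmt.Println("evict first:", pods[0].Name) // batch
}
```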
## Disk based evictions
### With Imagefs
If `nodefs` filesystem has met eviction thresholds, `kubelet` will free up disk space in the following order:
1. Delete logs
1. Evict Pods if required.
If `imagefs` filesystem has met eviction thresholds, `kubelet` will free up disk space in the following order:
1. Delete unused images
1. Evict Pods if required.
### Without Imagefs
If `nodefs` filesystem has met eviction thresholds, `kubelet` will free up disk space in the following order:
1. Delete logs
1. Delete unused images
1. Evict Pods if required.
Let's explore the different options for freeing up disk space.
### Delete logs of dead pods/containers
As of today, logs are tied to a container's lifetime. `kubelet` keeps dead containers around,
to provide access to logs.
In the future, if we store logs of dead containers outside of the container itself, then
`kubelet` can delete these logs to free up disk space.
Once the lifetimes of containers and logs are split, kubelet can support more user-friendly policies
around log evictions. `kubelet` can delete logs of the oldest containers first.
Since logs from the first and the most recent incarnations of a container are the most important for most applications,
kubelet can try to preserve these logs and aggressively delete logs from other container incarnations.
Until logs are split from the container's lifetime, `kubelet` can delete dead containers to free up disk space.
### Delete unused images
`kubelet` performs image garbage collection based on thresholds today. It uses a high and a low watermark.
Whenever disk usage exceeds the high watermark, it removes images until the low watermark is reached.
`kubelet` employs an LRU policy when it comes to deleting images.
The existing policy will be replaced with a much simpler policy.
Images will be deleted based on eviction thresholds. If kubelet can delete logs and keep disk space availability
above eviction thresholds, then kubelet will not delete any images.
If `kubelet` decides to delete unused images, it will delete *all* unused images.
### Evict pods
There is no ability to specify disk limits for pods/containers today.
Disk is a best effort resource. When necessary, `kubelet` can evict pods one at a time.
`kubelet` will follow the [Eviction Strategy](#eviction-strategy) mentioned above for making eviction decisions.
`kubelet` will evict the pod that will free up the maximum amount of disk space on the filesystem that has hit eviction thresholds.
Within each QoS bucket, `kubelet` will sort pods according to their disk usage.
`kubelet` will sort pods in each bucket as follows:
#### Without Imagefs
If `nodefs` is triggering evictions, `kubelet` will sort pods based on their total disk usage
- local volumes + logs & writable layer of all its containers.
#### With Imagefs
If `nodefs` is triggering evictions, `kubelet` will sort pods based on the usage on `nodefs`
- local volumes + logs of all its containers.
If `imagefs` is triggering evictions, `kubelet` will sort pods based on the writable layer usage of all its containers.
## Minimum eviction reclaim
In certain scenarios, eviction of pods could result in reclamation of only a small amount of resources. This can result in
`kubelet` hitting eviction thresholds in repeated succession. In addition to that, evicting to reclaim a resource like `disk`
is time consuming.
To mitigate these issues, `kubelet` will have a per-resource `minimum-reclaim`. Whenever `kubelet` observes
resource pressure, `kubelet` will attempt to reclaim at least `minimum-reclaim` amount of resource.
Following are the flags through which `minimum-reclaim` can be configured for each evictable resource:
`--eviction-minimum-reclaim="memory.available=0Mi,nodefs.available=500Mi,imagefs.available=2Gi"`
The default `eviction-minimum-reclaim` is `0` for all resources.
## Deprecation of existing features
`kubelet` has been freeing up disk space on demand to keep the node stable. As part of this proposal,
some of the existing features/flags around freeing disk space will be deprecated in favor of this proposal.
| Existing Flag | New Flag | Rationale |
| ------------- | -------- | --------- |
| `--image-gc-high-threshold` | `--eviction-hard` or `eviction-soft` | existing eviction signals can capture image garbage collection |
| `--image-gc-low-threshold` | `--eviction-minimum-reclaim` | eviction reclaims achieve the same behavior |
| `--maximum-dead-containers` | | deprecated once old logs are stored outside of container's context |
| `--maximum-dead-containers-per-container` | | deprecated once old logs are stored outside of container's context |
| `--minimum-container-ttl-duration` | | deprecated once old logs are stored outside of container's context |
| `--low-diskspace-threshold-mb` | `--eviction-hard` or `eviction-soft` | this use case is better handled by this proposal |
| `--outofdisk-transition-frequency` | `--eviction-pressure-transition-period` | make the flag generic to suit all compute resources |
## Kubelet Admission Control
### Feasibility checks during kubelet admission
#### Memory
The `kubelet` will reject `BestEffort` pods if any of the memory
eviction thresholds have been exceeded independent of the configured
grace period.
Let's assume the operator started the `kubelet` with the following:
```
--eviction-soft="memory.available<256Mi"
--eviction-soft-grace-period="memory.available=30s"
```
If the `kubelet` sees that it has less than `256Mi` of memory available
on the node, but the `kubelet` has not yet initiated eviction since the
grace period criteria has not yet been met, the `kubelet` will still immediately
fail any incoming best effort pods.
The reasoning for this decision is the expectation that the incoming pod is
likely to further starve the particular compute resource and the `kubelet` should
return to a steady state before accepting new workloads.
#### Disk
The `kubelet` will reject all pods if any of the disk eviction thresholds have been met.
Let's assume the operator started the `kubelet` with the following:
```
--eviction-soft="nodefs.available<1500Mi"
--eviction-soft-grace-period="nodefs.available=30s"
```
If the `kubelet` sees that it has less than `1500Mi` of disk available
on the node, but the `kubelet` has not yet initiated eviction since the
grace period criteria has not yet been met, the `kubelet` will still immediately
fail any incoming pods.
The rationale for failing **all** pods instead of just best effort is that disk is currently
a best effort resource for all QoS classes.
Kubelet will apply the same policy even if there is a dedicated `image` filesystem.
## Scheduler
The node will report a condition when a compute resource is under pressure. The
scheduler should view that condition as a signal to dissuade placing additional
best effort pods on the node.
In this case, the `MemoryPressure` condition if true should dissuade the scheduler
from placing new best effort pods on the node since they will be rejected by the `kubelet` in admission.
On the other hand, the `DiskPressure` condition if true should dissuade the scheduler from
placing **any** new pods on the node since they will be rejected by the `kubelet` in admission.
## Best Practices
### DaemonSet
It is never desired for a `kubelet` to evict a pod that was derived from
a `DaemonSet` since the pod will immediately be recreated and rescheduled
back to the same node.
At the moment, the `kubelet` has no ability to distinguish a pod created
from `DaemonSet` versus any other object. If/when that information is
available, the `kubelet` could pro-actively filter those pods from the
candidate set of pods provided to the eviction strategy.
In general, it should be strongly recommended that `DaemonSet` not
create `BestEffort` pods to avoid being identified as a candidate pod
for eviction. Instead, `DaemonSet`s should ideally include `Guaranteed` pods only.
## Known issues
### kubelet may evict more pods than needed
Pod eviction may evict more pods than needed due to the stats collection timing gap. This can be mitigated by adding
the ability to get root container stats on an on-demand basis (https://github.com/google/cadvisor/issues/1247) in the future.
### How kubelet ranks pods for eviction in response to inode exhaustion
At this time, it is not possible to know how many inodes were consumed by a particular container. If the `kubelet` observes
inode exhaustion, it will evict pods by ranking them by quality of service. The following issue has been opened in cadvisor
to track per container inode consumption (https://github.com/google/cadvisor/issues/1422) which would allow us to rank pods
by inode consumption. For example, this would let us identify a container that created large numbers of 0 byte files, and evict
that pod over others.

View File

@ -1,45 +1 @@
Kubelet HyperContainer Container Runtime This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubelet-hypercontainer-runtime.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubelet-hypercontainer-runtime.md)
=======================================
Authors: Pengfei Ni (@feiskyer), Harry Zhang (@resouer)
## Abstract
This proposal aims to support [HyperContainer](http://hypercontainer.io) container
runtime in Kubelet.
## Motivation
HyperContainer is a Hypervisor-agnostic Container Engine that allows you to run Docker images using
hypervisors (KVM, Xen, etc.). By running containers within separate VM instances, it offers
hardware-enforced isolation, which is required in multi-tenant environments.
## Goals
1. Complete pod/container/image lifecycle management with HyperContainer.
2. Setup network by network plugins.
3. 100% Pass node e2e tests.
4. Easy to deploy for both local dev/test and production clusters.
## Design
The HyperContainer runtime will make use of the kubelet Container Runtime Interface. [Frakti](https://github.com/kubernetes/frakti) implements the CRI interface and exposes
a local endpoint to Kubelet. Frakti communicates with [hyperd](https://github.com/hyperhq/hyperd)
via its gRPC API to manage the lifecycle of sandboxes, containers and images.
![frakti](https://cloud.githubusercontent.com/assets/676637/18940978/6e3e5384-863f-11e6-9132-b638d862fd09.png)
## Limitations
Since pods run directly inside a hypervisor, the host network is not supported by the HyperContainer
runtime.
## Development
The HyperContainer runtime is maintained at <https://github.com/kubernetes/frakti>.

View File

@ -1,103 +1 @@
Next generation rkt runtime integration This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubelet-rkt-runtime.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubelet-rkt-runtime.md)
=======================================
Authors: Euan Kemp (@euank), Yifan Gu (@yifan-gu)
## Abstract
This proposal describes the design and road path for integrating rkt with kubelet with the new container runtime interface.
## Background
Currently, the Kubernetes project supports rkt as a container runtime via an implementation under [pkg/kubelet/rkt package](https://github.com/kubernetes/kubernetes/tree/v1.5.0-alpha.0/pkg/kubelet/rkt).
This implementation, for historical reasons, has required implementing a large amount of logic shared by the original Docker implementation.
In order to make additional container runtime integrations easier, more clearly defined, and more consistent, a new [Container Runtime Interface](https://github.com/kubernetes/kubernetes/blob/v1.5.0-alpha.0/pkg/kubelet/api/v1alpha1/runtime/api.proto) (CRI) is being designed.
The existing runtimes, in order to both prove the correctness of the interface and reduce maintenance burden, are incentivized to move to this interface.
This document proposes how the rkt runtime integration will transition to using the CRI.
## Goals
### Full-featured
The CRI integration must work as well as the existing integration in terms of features.
Until that's the case, the existing integration will continue to be maintained.
### Easy to Deploy
The new integration should not be any more difficult to deploy and configure than the existing integration.
### Easy to Develop
This iteration should be as easy to work on and iterate on as the original one.
It will be available in an initial usable form quickly in order to validate the CRI.
## Design
In order to fulfill the above goals, the rkt CRI integration will make the following choices:
### Remain in-process with Kubelet
The current rkt container runtime integration is able to be deployed simply by deploying the kubelet binary.
This is, in no small part, to make it *Easy to Deploy*.
Remaining in-process also helps this integration not regress on performance, one axis of being *Full-Featured*.
### Communicate through gRPC
Although the kubelet and rktlet will be compiled together, the runtime and kubelet will still communicate through a gRPC interface for better API abstraction.
For the near term, they will talk over a unix socket until we implement a custom gRPC connection that skips the network stack.
### Developed as a Separate Repository
Brian Grant's discussion on splitting the Kubernetes project into [separate repos](https://github.com/kubernetes/kubernetes/issues/24343) is a compelling argument for why it makes sense to split this work into a separate repo.
In order to be *Easy to Develop*, this iteration will be maintained as a separate repository, and re-vendored back in.
This choice will also allow better long-term growth in terms of better issue-management, testing pipelines, and so on.
Unfortunately, in the short term, it's possible that some aspects of this will also cause pain, and it's very difficult to weigh each side correctly.
### Exec the rkt binary (initially)
While significant work on the rkt [api-service](https://coreos.com/rkt/docs/latest/subcommands/api-service.html) has been made,
it has also been a source of problems and additional complexity,
and was never transitioned to entirely.
In addition, the rkt cli has historically been the primary interface to the rkt runtime.
The initial integration will execute the rkt binary directly for app creation/start/stop/removal, as well as image pulling/removal.
The creation of the pod sandbox is also done via the rkt command line, but it will run under `systemd-run` so it is monitored by the init process.
In the future, some of these decisions are expected to be changed such that rkt is vendored as a library dependency for all operations, and other init systems will be supported as well.
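The following is a minimal Go sketch of the pattern described above: launching the pod sandbox under `systemd-run` so the init process supervises it. The exact rkt subcommand and flags are placeholders; the real rktlet builds the argument list from the CRI request.
```go
package main

import (
	"fmt"
	"os/exec"
)

// createSandbox wraps a rkt invocation in systemd-run so that systemd
// supervises the sandbox process as a transient unit.
func createSandbox(unitName string, rktArgs ...string) error {
	args := append([]string{"--unit=" + unitName, "--slice=machine", "rkt"}, rktArgs...)
	out, err := exec.Command("systemd-run", args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("systemd-run failed: %v: %s", err, out)
	}
	return nil
}

func main() {
	// Hypothetical invocation; real argument construction lives in rktlet.
	if err := createSandbox("k8s-example-pod", "run", "--net=default", "docker://busybox"); err != nil {
		fmt.Println(err)
	}
}
```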
## Roadmap and Milestones
1. rktlet integrates with the kubelet to support the basic pod/container lifecycle (pod creation, container creation/start/stop, pod stop/removal) [[Done]](https://github.com/kubernetes-incubator/rktlet/issues/9)
2. rktlet integrates with the kubelet to support more advanced features:
- Support kubelet networking, host network
- Support mount / volumes [[#33526]](https://github.com/kubernetes/kubernetes/issues/33526)
- Support exposing ports
- Support privileged containers
- Support selinux options [[#33139]](https://github.com/kubernetes/kubernetes/issues/33139)
- Support attach [[#29579]](https://github.com/kubernetes/kubernetes/issues/29579)
- Support exec [[#29579]](https://github.com/kubernetes/kubernetes/issues/29579)
- Support logging [[#33111]](https://github.com/kubernetes/kubernetes/pull/33111)
3. rktlet integrates with the kubelet and passes 100% of e2e and node e2e tests, with the nspawn stage1.
4. rktlet integrates with the kubelet and passes 100% of e2e and node e2e tests, with the kvm stage1.
5. Revendor rktlet into `pkg/kubelet/rktshim`, and start deprecating the `pkg/kubelet/rkt` package.
6. Eventually replace the current `pkg/kubelet/rkt` package.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/kubelet-rkt-runtime.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,407 +1 @@
# Kubelet and systemd interaction This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubelet-systemd.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubelet-systemd.md)
**Author**: Derek Carr (@derekwaynecarr)
**Status**: Proposed
## Motivation
Many Linux distributions have either adopted, or plan to adopt `systemd` as their init system.
This document describes how the node should be configured, and a set of enhancements that should
be made to the `kubelet` to better integrate with these distributions independent of container
runtime.
## Scope of proposal
This proposal does not account for running the `kubelet` in a container.
## Background on systemd
To help understand this proposal, we first provide a brief summary of `systemd` behavior.
### systemd units
`systemd` manages a hierarchy of `slice`, `scope`, and `service` units.
* `service` - application on the server that is launched by `systemd`; how it should start/stop;
when it should be started; under what circumstances it should be restarted; and any resource
controls that should be applied to it.
* `scope` - a process or group of processes which are not launched by `systemd` itself (i.e. forked);
like a service, resource controls may be applied to it.
* `slice` - organizes a hierarchy in which `scope` and `service` units are placed. A `slice` may
contain `slice`, `scope`, or `service` units; processes are attached to `service` and `scope`
units only, not to `slices`. The hierarchy is intended to be unified, meaning a process may
only belong to a single leaf node.
### cgroup hierarchy: split versus unified hierarchies
Classical `cgroup` hierarchies were split per resource group controller, and a process could
exist in different parts of the hierarchy.
For example, a process `p1` could exist in each of the following at the same time:
* `/sys/fs/cgroup/cpu/important/`
* `/sys/fs/cgroup/memory/unimportant/`
* `/sys/fs/cgroup/cpuacct/unimportant/`
In addition, controllers for one resource group could depend on another in ways that were not
always obvious.
For example, the `cpu` controller depends on the `cpuacct` controller yet they were treated
separately.
Many found it confusing for a single process to belong to different nodes in the `cgroup` hierarchy
across controllers.
The Kernel direction for `cgroup` support is to move toward a unified `cgroup` hierarchy, where the
per-controller hierarchies are eliminated in favor of hierarchies like the following:
* `/sys/fs/cgroup/important/`
* `/sys/fs/cgroup/unimportant/`
In a unified hierarchy, a process may only belong to a single node in the `cgroup` tree.
### cgroupfs single writer
The Kernel direction for `cgroup` management is to promote a single-writer model rather than
allowing multiple processes to independently write to parts of the file-system.
In distributions that run `systemd` as their init system, the cgroup tree is managed by `systemd`
by default since it implicitly interacts with the cgroup tree when starting units. Manual changes
made by other cgroup managers to the cgroup tree are not guaranteed to be preserved unless `systemd`
is made aware. `systemd` can be told to ignore sections of the cgroup tree by configuring the unit
to have the `Delegate=` option.
See: http://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#Delegate=
### cgroup management with systemd and container runtimes
A `slice` corresponds to an inner-node in the `cgroup` file-system hierarchy.
For example, the `system.slice` is represented as follows:
`/sys/fs/cgroup/<controller>/system.slice`
A `slice` is nested in the hierarchy by its naming convention.
For example, the `system-foo.slice` is represented as follows:
`/sys/fs/cgroup/<controller>/system.slice/system-foo.slice/`
A `service` or `scope` corresponds to leaf nodes in the `cgroup` file-system hierarchy managed by
`systemd`. Services and scopes can have child nodes managed outside of `systemd` if they have been
delegated with the `Delegate=` option.
For example, if the `docker.service` is associated with the `system.slice`, it is
represented as follows:
`/sys/fs/cgroup/<controller>/system.slice/docker.service/`
To demonstrate the use of `scope` units using the `docker` container runtime, if a
user launches a container via `docker run -m 100M busybox`, a `scope` will be created
because the process was not launched by `systemd` itself. The `scope` is parented by
the `slice` associated with the launching daemon.
For example:
`/sys/fs/cgroup/<controller>/system.slice/docker-<container-id>.scope`
`systemd` defines a set of slices. By default, service and scope units are placed in
`system.slice`, virtual machines and containers registered with `systemd-machined` are
found in `machine.slice`, and user sessions handled by `systemd-logind` in `user.slice`.
## Node Configuration on systemd
### kubelet cgroup driver
The `kubelet` reads and writes to the `cgroup` tree during bootstrapping
of the node. In the future, it will write to the `cgroup` tree to satisfy other
purposes around quality of service, etc.
The `kubelet` must cooperate with `systemd` in order to ensure proper function of the
system. The bootstrapping requirements for a `systemd` system are different than one
without it.
The `kubelet` will accept a new flag to control how it interacts with the `cgroup` tree.
* `--cgroup-driver=` - cgroup driver used by the kubelet. `cgroupfs` or `systemd`.
The `kubelet` should default `--cgroup-driver` to `systemd` on `systemd` distributions.
The `kubelet` should associate node bootstrapping semantics to the configured
`cgroup driver`.
### Node allocatable
The proposal makes no changes to the definition as presented here:
https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/node-allocatable.md
The node will report a set of allocatable compute resources defined as follows:
`[Allocatable] = [Node Capacity] - [Kube-Reserved] - [System-Reserved]`
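As a trivial illustration of the formula above, the sketch below computes allocatable for a single resource expressed as an integer quantity (e.g. millicores or bytes); the function name and the clamping to zero are assumptions, not kubelet code.
```go
package main

import "fmt"

// allocatable applies [Allocatable] = [Node Capacity] - [Kube-Reserved] - [System-Reserved]
// for one resource quantity; a negative result is clamped to zero.
func allocatable(capacity, kubeReserved, systemReserved int64) int64 {
	a := capacity - kubeReserved - systemReserved
	if a < 0 {
		return 0
	}
	return a
}

func main() {
	// e.g. 4000 millicores capacity, 200m kube-reserved, 300m system-reserved -> 3500m allocatable.
	fmt.Println(allocatable(4000, 200, 300))
}
```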
### Node capacity
The `kubelet` will continue to interface with `cAdvisor` to determine node capacity.
### System reserved
The node may set aside a set of designated resources for non-Kubernetes components.
The `kubelet` accepts the following flags that support this feature:
* `--system-reserved=` - A set of `ResourceName`=`ResourceQuantity` pairs that
describe resources reserved for host daemons.
* `--system-container=` - Optional resource-only container in which to place all
non-kernel processes that are not already in a container. Empty for no container.
Rolling back the flag requires a reboot. (Default: "").
The current meaning of `system-container` is inadequate on `systemd` environments.
The `kubelet` should use the flag to know the location that has the processes that
are associated with `system-reserved`, but it should not modify the cgroups of
existing processes on the system during bootstrapping of the node. This is
because `systemd` is the `cgroup manager` on the host and it has not delegated
authority to the `kubelet` to change how it manages `units`.
The following describes the type of things that can happen if this does not change:
https://bugzilla.redhat.com/show_bug.cgi?id=1202859
As a result, the `kubelet` needs to distinguish placement of non-kernel processes
based on the cgroup driver, and only do its current behavior when not on `systemd`.
The flag should be modified as follows:
* `--system-container=` - Name of resource-only container that holds all
non-kernel processes whose resource consumption is accounted under
system-reserved. The default value is cgroup driver specific. systemd
defaults to system, cgroupfs defines no default. Rolling back the flag
requires a reboot.
The `kubelet` will error if the defined `--system-container` does not exist
on `systemd` environments. It will verify that the appropriate `cpu` and `memory`
controllers are enabled.
### Kubernetes reserved
The node may set aside a set of resources for Kubernetes components:
* `--kube-reserved=` - A set of `ResourceName`=`ResourceQuantity` pairs that
describe resources reserved for Kubernetes components.
The `kubelet` does not enforce `--kube-reserved` at this time, but the ability
to distinguish the static reservation from observed usage is important for node accounting.
This proposal asserts that `kubernetes.slice` is the default slice associated with
the `kubelet` and `kube-proxy` service units defined in the project. Keeping it
separate from `system.slice` allows for accounting to be distinguished separately.
The `kubelet` will detect its `cgroup` to track `kube-reserved` observed usage on `systemd`.
If the `kubelet` detects that it is a child of the `system-container` based on the observed
`cgroup` hierarchy, it will warn.
If the `kubelet` is launched directly from a terminal, its most likely destination will
be in a `scope` that is a child of `user.slice` as follows:
`/sys/fs/cgroup/<controller>/user.slice/user-1000.slice/session-1.scope`
In this context, the parent `scope` is what will be used to facilitate local developer
debugging scenarios for tracking `kube-reserved` usage.
The `kubelet` has the following flag:
* `--resource-container="/kubelet":` Absolute name of the resource-only container to create
and run the Kubelet in (Default: /kubelet).
This flag will not be supported on `systemd` environments since the init system has already
spawned the process and placed it in the corresponding container associated with its unit.
### Kubernetes container runtime reserved
This proposal asserts that the reservation of compute resources for any associated
container runtime daemons is tracked by the operator under the `system-reserved` or
`kubernetes-reserved` values and any enforced limits are set by the
operator specific to the container runtime.
**Docker**
If the `kubelet` is configured with the `container-runtime` set to `docker`, the
`kubelet` will detect the `cgroup` associated with the `docker` daemon and use that
to do local node accounting. If an operator wants to impose runtime limits on the
`docker` daemon to control resource usage, the operator should set those explicitly in
the `service` unit that launches `docker`. The `kubelet` will not set any limits itself
at this time and will assume whatever budget was set aside for `docker` was included in
either `--kube-reserved` or `--system-reserved` reservations.
Many OS distributions package `docker` by default, and it will often belong to the
`system.slice` hierarchy, and therefore operators will need to budget it for there
by default unless they explicitly move it.
**rkt**
rkt has no client/server daemon, and therefore has no explicit requirements on container-runtime
reservation.
### kubelet cgroup enforcement
The `kubelet` does not enforce the `system-reserved` or `kube-reserved` values by default.
The `kubelet` should support an additional flag to turn on enforcement:
* `--system-reserved-enforce=false` - Optional flag that if true tells the `kubelet`
to enforce the `system-reserved` constraints defined (if any)
* `--kube-reserved-enforce=false` - Optional flag that if true tells the `kubelet`
to enforce the `kube-reserved` constraints defined (if any)
Usage of this flag requires that end-user containers are launched in a separate part
of cgroup hierarchy via `cgroup-root`.
If this flag is enabled, the `kubelet` will continually validate that the configured
resource constraints are applied on the associated `cgroup`.
### kubelet cgroup-root behavior under systemd
The `kubelet` supports a `cgroup-root` flag which is the optional root `cgroup` to use for pods.
This flag should be treated as a pass-through to the underlying configured container runtime.
If enforcement is enabled via the flags above, this flag warrants special consideration by the operator depending
on how the node was configured. For example, if the container runtime is `docker` and it is using
the `systemd` cgroup driver, then `docker` will take the daemon-wide default and launch containers
in the same slice associated with the `docker.service`. By default, this would mean `system.slice`,
which could cause end-user pods to be launched in the same part of the cgroup hierarchy as system daemons.
In those environments, it is recommended that `cgroup-root` is configured to be a subtree of `machine.slice`.
### Proposed cgroup hierarchy
```
$ROOT
|
+- system.slice
|  |
|  +- sshd.service
|  +- docker.service (optional)
|  +- ...
|
+- kubernetes.slice
|  |
|  +- kubelet.service
|  +- docker.service (optional)
|
+- machine.slice (container runtime specific)
|  |
|  +- docker-<container-id>.scope
|
+- user.slice
   +- ...
```
* `system.slice` corresponds to `--system-reserved`, and contains any services the
operator brought to the node as normal configuration.
* `kubernetes.slice` corresponds to the `--kube-reserved`, and contains kube specific
daemons.
* `machine.slice` should parent all end-user containers on the system and serve as the
root of the end-user cluster workloads run on the system.
* `user.slice` is not explicitly tracked by the `kubelet`, but it is possible that users open `ssh`
sessions to the node and launch actions directly. Any resource accounting
reserved for those actions should be part of `system-reserved`.
The container runtime daemon, `docker` in this outline, must be accounted for in either
`system.slice` or `kubernetes.slice`.
Going forward, the container hierarchy should not be rooted
more than 2 layers below the root, as deeper nesting has historically caused issues with node performance
in other `cgroup`-aware systems (https://bugzilla.redhat.com/show_bug.cgi?id=850718). It
is anticipated that the `kubelet` will parent containers based on quality of service
in the future. In that environment, those changes will be relative to the configured
`cgroup-root`.
### Linux Kernel Parameters
The `kubelet` will set the following:
* `sysctl -w vm.overcommit_memory=1`
* `sysctl -w vm.panic_on_oom=0`
* `sysctl -w kernel.panic=10`
* `sysctl -w kernel.panic_on_oops=1`
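A minimal Go sketch of how the parameters above could be applied by writing under `/proc/sys`; the helper name and error handling are illustrative, not the kubelet's actual implementation.
```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// setSysctl writes a value under /proc/sys, e.g. "vm.overcommit_memory" -> /proc/sys/vm/overcommit_memory.
func setSysctl(name, value string) error {
	path := filepath.Join("/proc/sys", strings.Replace(name, ".", "/", -1))
	return os.WriteFile(path, []byte(value), 0644)
}

func main() {
	params := map[string]string{
		"vm.overcommit_memory": "1",
		"vm.panic_on_oom":      "0",
		"kernel.panic":         "10",
		"kernel.panic_on_oops": "1",
	}
	for name, value := range params {
		if err := setSysctl(name, value); err != nil {
			fmt.Fprintf(os.Stderr, "failed to set %s: %v\n", name, err)
		}
	}
}
```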
### OOM Score Adjustment
The `kubelet` at bootstrapping will set the `oom_score_adj` value for Kubernetes
daemons, and any dependent container-runtime daemons.
If `container-runtime` is set to `docker`, then the `kubelet` will set the daemon's `oom_score_adj` to `-999`.
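A companion sketch of the adjustment itself: the score is applied by writing to `/proc/<pid>/oom_score_adj`. The value `-999` matches the docker case above; applying it to the current process's own PID is shown only as an example.
```go
package main

import (
	"fmt"
	"os"
)

// setOOMScoreAdj writes the given score to /proc/<pid>/oom_score_adj.
func setOOMScoreAdj(pid, score int) error {
	path := fmt.Sprintf("/proc/%d/oom_score_adj", pid)
	return os.WriteFile(path, []byte(fmt.Sprintf("%d", score)), 0644)
}

func main() {
	// Example: protect this process (a stand-in for a runtime daemon) from the OOM killer.
	if err := setOOMScoreAdj(os.Getpid(), -999); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```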
## Implementation concerns
### kubelet block-level architecture
```
+-----------+      +-----------+      +-----------+
|           |      |           |      |    Pod    |
|   Node    <------+ Container <------+ Lifecycle |
|  Manager  |      |  Manager  |      |  Manager  |
|           +------>           |      |           |
+-----+-----+      +-----+-----+      +-----------+
      |                  |
      |            +-----+-----------+
      |            |                 |
+-----v------------v-----+     +-----v------+
|    cgroups library     |     |  container |
|                        |     |  runtimes  |
+-----------+------------+     +-----+------+
            |                        |
            +-----------+------------+
                        |
            +-----------v-----------+
            |      Linux Kernel     |
            +-----------------------+
```
The `kubelet` should move to an architecture that resembles the above diagram:
* The `kubelet` should not interface directly with the `cgroup` file-system, but instead
should use a common `cgroups library` that has the proper abstraction in place to
work with either `cgroupfs` or `systemd`. The `kubelet` should just use `libcontainer`
abstractions to facilitate this requirement. The `libcontainer` abstractions as
currently defined only support an `Apply(pid)` pattern, and we need to separate that
abstraction to allow a cgroup to be created and then later joined.
* The existing `ContainerManager` should separate node bootstrapping into a separate
`NodeManager` that is dependent on the configured `cgroup-driver`.
* The `kubelet` flags for cgroup paths will be converted internally by the cgroup library,
i.e. `/foo/bar` will convert to `foo-bar.slice`, as sketched below.
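A small, assumed sketch of the conversion mentioned in the last bullet; the escaping rules systemd applies to unusual characters are omitted.
```go
package main

import (
	"fmt"
	"strings"
)

// pathToSlice converts a cgroupfs-style path such as "/foo/bar" into the
// systemd slice name "foo-bar.slice"; "/" maps to the root slice "-.slice".
func pathToSlice(path string) string {
	trimmed := strings.Trim(path, "/")
	if trimmed == "" {
		return "-.slice"
	}
	return strings.Replace(trimmed, "/", "-", -1) + ".slice"
}

func main() {
	fmt.Println(pathToSlice("/foo/bar")) // foo-bar.slice
}
```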
### kubelet accounting for end-user pods
This proposal re-enforces that it is inappropriate at this time to depend on `--cgroup-root` as the
primary mechanism to distinguish and account for end-user pod compute resource usage.
Instead, the `kubelet` can and should sum the usage of each running `pod` on the node to account for
end-user pod usage separate from system-reserved and kubernetes-reserved accounting via `cAdvisor`.
## Known issues
### Docker runtime support for --cgroup-parent
Docker versions <= 1.0.9 did not have proper support for the `--cgroup-parent` flag on `systemd`. This
was fixed in this PR (https://github.com/docker/docker/pull/18612). As a result, it's expected
that containers launched by the `docker` daemon may continue to go in the default `system.slice` and
appear to be counted under system-reserved node usage accounting.
If operators run with later versions of `docker`, they can avoid this issue via the use of `cgroup-root`
flag on the `kubelet`, but this proposal makes no requirement on operators to do that at this time, and
this can be revisited if/when the project adopts docker 1.10.
Some OS distributions will fix this bug in versions of docker <= 1.0.9, so operators should
be aware of how their version of `docker` was packaged when using this feature.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/kubelet-systemd.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,243 +1 @@
# Kubelet TLS bootstrap This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubelet-tls-bootstrap.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubelet-tls-bootstrap.md)
Author: George Tankersley (george.tankersley@coreos.com)
## Preface
This document describes a method for a kubelet to bootstrap itself
into a TLS-secured cluster. Crucially, it automates the provision and
distribution of signed certificates.
## Overview
When a kubelet runs for the first time, it must be given TLS assets
or generate them itself. In the first case, this is a burden on the cluster
admin and a significant logistical barrier to secure Kubernetes rollouts. In
the second, the kubelet must self-sign its certificate and forfeits many of the
advantages of a PKI system. Instead, we propose that the kubelet generate a
private key and a CSR for submission to a cluster-level certificate signing
process.
## Preliminaries
We assume the existence of a functioning control plane. The
apiserver should be configured for TLS initially or possess the ability to
generate valid TLS credentials for itself. If secret information is passed in
the request (e.g. auth tokens supplied with the request or included in
ExtraInfo) then all communications from the node to the apiserver must take
place over a verified TLS connection.
Each node is additionally provisioned with the following information:
1. Location of the apiserver
2. Any CA certificates necessary to trust the apiserver's TLS certificate
3. Access tokens (if needed) to communicate with the CSR endpoint
These should not change often and are thus simple to include in a static
provisioning script.
## API Changes
### CertificateSigningRequest Object
We introduce a new API object to represent PKCS#10 certificate signing
requests. It will be accessible under:
`/apis/certificates/v1beta1/certificatesigningrequests/mycsr`
It will have the following structure:
```go
// Describes a certificate signing request
type CertificateSigningRequest struct {
    unversioned.TypeMeta `json:",inline"`
    api.ObjectMeta       `json:"metadata,omitempty"`
    // The certificate request itself and any additional information.
    Spec CertificateSigningRequestSpec `json:"spec,omitempty"`
    // Derived information about the request.
    Status CertificateSigningRequestStatus `json:"status,omitempty"`
}
// This information is immutable after the request is created.
type CertificateSigningRequestSpec struct {
    // Base64-encoded PKCS#10 CSR data
    Request string `json:"request"`
    // Any extra information the node wishes to send with the request.
    ExtraInfo []string `json:"extrainfo,omitempty"`
}
// This information is derived from the request by Kubernetes and cannot be
// modified by users. All information is optional since it might not be
// available in the underlying request. This is intended to aid approval
// decisions.
type CertificateSigningRequestStatus struct {
    // Information about the requesting user (if relevant)
    // See user.Info interface for details
    Username string   `json:"username,omitempty"`
    UID      string   `json:"uid,omitempty"`
    Groups   []string `json:"groups,omitempty"`
    // Fingerprint of the public key in request
    Fingerprint string `json:"fingerprint,omitempty"`
    // Subject fields from the request
    Subject internal.Subject `json:"subject,omitempty"`
    // DNS SANs from the request
    Hostnames []string `json:"hostnames,omitempty"`
    // IP SANs from the request
    IPAddresses []string `json:"ipaddresses,omitempty"`
    Conditions []CertificateSigningRequestCondition `json:"conditions,omitempty"`
}
type RequestConditionType string
// These are the possible states for a certificate request.
const (
    Approved RequestConditionType = "Approved"
    Denied   RequestConditionType = "Denied"
)
type CertificateSigningRequestCondition struct {
    // request approval state, currently Approved or Denied.
    Type RequestConditionType `json:"type"`
    // brief reason for the request state
    Reason string `json:"reason,omitempty"`
    // human readable message with details about the request state
    Message string `json:"message,omitempty"`
    // If request was approved, the controller will place the issued certificate here.
    Certificate []byte `json:"certificate,omitempty"`
}
type CertificateSigningRequestList struct {
    unversioned.TypeMeta `json:",inline"`
    unversioned.ListMeta `json:"metadata,omitempty"`
    Items []CertificateSigningRequest `json:"items,omitempty"`
}
```
We also introduce CertificateSigningRequestList to allow listing all the CSRs in the cluster:
```go
type CertificateSigningRequestList struct {
    api.TypeMeta
    api.ListMeta
    Items []CertificateSigningRequest
}
```
## Certificate Request Process
### Node initialization
When the kubelet executes, it checks a location on disk for TLS assets
(currently `/var/run/kubernetes/kubelet.{key,crt}` by default). If it finds
them, it proceeds. If there are no TLS assets, the kubelet generates a keypair
and self-signed certificate. We propose the following optional behavior:
1. Generate a keypair
2. Generate a CSR for that keypair with CN set to the hostname (or
`--hostname-override` value) and DNS/IP SANs supplied with whatever values
the host knows for itself.
3. Post the CSR to the CSR API endpoint.
4. Set a watch on the CSR object to be notified of approval or rejection.
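Steps 1 and 2 above can be sketched with Go's standard `crypto` packages; the SAN values and output handling below are placeholders, and posting the CSR to the API plus watching for approval are omitted.
```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"net"
	"os"
)

func main() {
	host, _ := os.Hostname()

	// Step 1: generate a keypair.
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}

	// Step 2: generate a CSR with CN set to the hostname and DNS/IP SANs.
	template := x509.CertificateRequest{
		Subject:     pkix.Name{CommonName: host},
		DNSNames:    []string{host},
		IPAddresses: []net.IP{net.ParseIP("10.0.0.5")}, // placeholder node IP
	}
	der, err := x509.CreateCertificateRequest(rand.Reader, &template, key)
	if err != nil {
		panic(err)
	}

	// The encoded CSR is what would be placed in Spec.Request before posting.
	fmt.Printf("%s", pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE REQUEST", Bytes: der}))
}
```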
### Controller response
The apiserver persists the CertificateSigningRequests and exposes the List of
all CSRs for an administrator to approve or reject.
A new certificate controller watches for certificate requests. It must first
validate the signature on each CSR and add `Condition=Denied` on
any requests with invalid signatures (with Reason and Message indicating
such). For valid requests, the controller will derive the information in
`CertificateSigningRequestStatus` and update that object. The controller should
watch for updates to the approval condition of any CertificateSigningRequest.
When a request is approved (signified by Conditions containing only Approved)
the controller should generate and sign a certificate based on that CSR, then
update the condition with the certificate data using the `/approval`
subresource.
### Manual CSR approval
An administrator using `kubectl` or another API client can query the
CertificateSigningRequestList and update the approval condition of
CertificateSigningRequests. The default state is empty, indicating that there
has been no decision so far. A state of "Approved" indicates that the admin has
approved the request and the certificate controller should issue the
certificate. A state of "Denied" indicates that admin has denied the
request. An admin may also supply Reason and Message fields to explain the
rejection.
## kube-apiserver support
The apiserver will present the new endpoints mentioned above and support the
relevant object types.
## kube-controller-manager support
To handle certificate issuance, the controller-manager will need access to CA
signing assets. This could be as simple as a private key and a config file or
as complex as a PKCS#11 client and supplementary policy system. For now, we
will add flags for a signing key, a certificate, and a basic policy file.
## kubectl support
To support manual CSR inspection and approval, we will add support for listing,
inspecting, and approving or denying CertificateSigningRequests to kubectl. The
interaction will be similar to
[salt-key](https://docs.saltstack.com/en/latest/ref/cli/salt-key.html).
Specifically, the admin will have the ability to retrieve the full list of
pending CSRs, inspect their contents, and set their approval conditions to one
of:
1. **Approved** if the controller should issue the cert
2. **Denied** if the controller should not issue the cert
The suggested command for listing is `kubectl get csrs`. The approve/deny
interactions can be accomplished with normal updates, but would be more
conveniently accessed by direct subresource updates. We leave this for future
updates to kubectl.
## Security Considerations
### Endpoint Access Control
The ability to post CSRs to the signing endpoint should be controlled. As a
simple solution we propose that each node be provisioned with an auth token
(possibly static across the cluster) that is scoped via ABAC to only allow
access to the CSR endpoint.
### Expiration & Revocation
The node is responsible for monitoring its own certificate expiration date.
When the certificate is close to expiration, the kubelet should begin repeating
this flow until it successfully obtains a new certificate. If the expiring
certificate has not been revoked and the previous certificate request is still
approved, then it may do so using the same keypair unless the cluster policy
(see "Future Work") requires fresh keys.
Revocation is for the most part an unhandled problem in Go, requiring each
application to produce its own logic around a variety of parsing functions. For
now, our suggested best practice is to issue only short-lived certificates. In
the future it may make sense to add CRL support to the apiserver's client cert
auth.
## Future Work
- revocation UI in kubectl and CRL support at the apiserver
- supplemental policy (e.g. cluster CA only issues 30-day certs for hostnames *.k8s.example.com, each new cert must have fresh keys, ...)
- fully automated provisioning (using a handshake protocol or external list of authorized machines)
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/kubelet-tls-bootstrap.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,157 +1 @@
# Kubemark proposal This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubemark.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubemark.md)
## Goal of this document
This document describes the design of Kubemark - a system that allows performance testing of a Kubernetes cluster. It describes the
assumptions, the high-level design, and discusses possible solutions for lower-level problems. It is supposed to be a starting point for more
detailed discussion.
## Current state and objective
Currently performance testing happens on live clusters of up to 100 Nodes. It takes quite a while to start such a cluster or to push
updates to all Nodes, and it uses quite a lot of resources. At this scale the amount of wasted time and used resources is still acceptable.
In the next quarter or two we're targeting a 1000-Node cluster, which will push it way beyond the acceptable level. Additionally, we want to
enable people without many resources to run scalability tests on bigger clusters than they can afford at a given time. Having the ability to
cheaply run scalability tests will enable us to run some set of them on "normal" test clusters, which in turn would mean the ability to run
them on every PR.
This means that we need a system that will allow for realistic performance testing on a (much) smaller number of “real” machines. The first
assumption we make is that Nodes are independent, i.e. the number of existing Nodes does not impact the performance of a single Node. This is not
entirely true, as the number of Nodes can increase the latency of various components on the Master machine, which in turn may increase the latency of Node
operations, but we're not interested in measuring this effect here. Instead we want to measure how the number of Nodes and the load imposed by
Node daemons affect the performance of Master components.
## Kubemark architecture overview
The high-level idea behind Kubemark is to write a library that allows running artificial "Hollow" Nodes that will be able to simulate the
behavior of a real Kubelet and KubeProxy in a single, lightweight binary. Hollow components will need to correctly respond to Controllers
(via API server), and preferably, in the fullness of time, be able to replay previously recorded real traffic (this is out of scope for the
initial version). To teach Hollow components to replay recorded traffic, they will need to store data specifying when a given Pod/Container
should die (e.g. observed lifetime). Such data can be extracted e.g. from etcd Raft logs, or it can be reconstructed from Events. In the
initial version we only want them to be able to fool Master components and put some configurable (in what way TBD) load on them.
When we have the Hollow Node ready, we'll be able to test the performance of Master components by creating a real Master Node, with API server,
Controllers, etcd and whatnot, and creating a number of Hollow Nodes that will register with the running Master.
To make Kubemark easier to maintain as the system evolves, Hollow components will reuse real "production" code for Kubelet and KubeProxy, but
will mock all the backends with no-op or very simple mocks. We believe that this approach is better in the long run than writing a special
"performance-test-aimed" separate version of them. This may take more time to create an initial version, but we think the maintenance cost will
be noticeably smaller.
### Option 1
For the initial version we will teach Master components to use the port number to identify Kubelet/KubeProxy. This will allow running those
components on non-default ports, and at the same time will allow running multiple Hollow Nodes on a single machine. During setup we will
generate credentials for cluster communication and pass them to HollowKubelet/HollowProxy to use. Master will treat all HollowNodes as
normal ones.
![Kubemark architecture diagram for option 1](Kubemark_architecture.png?raw=true "Kubemark architecture overview")
*Kubemark architecture diagram for option 1*
### Option 2
As a second (equivalent) option we will run Kubemark on top of 'real' Kubernetes cluster, where both Master and Hollow Nodes will be Pods.
In this option we'll be able to use Kubernetes mechanisms to streamline setup, e.g. by using Kubernetes networking to ensure unique IPs for
Hollow Nodes, or using Secrets to distribute Kubelet credentials. The downside of this configuration is that it's likely that some noise
will appear in Kubemark results from either CPU/Memory pressure from other things running on Nodes (e.g. FluentD, or Kubelet) or running
cluster over an overlay network. We believe that it'll be possible to turn off cluster monitoring for Kubemark runs, so that the impact
of real Node daemons will be minimized, but we don't know what will be the impact of using higher level networking stack. Running a
comparison will be an interesting test in itself.
### Discussion
Before taking a closer look at steps necessary to set up a minimal Hollow cluster it's hard to tell which approach will be simpler. It's
quite possible that the initial version will end up as hybrid between running the Hollow cluster directly on top of VMs and running the
Hollow cluster on top of a Kubernetes cluster that is running on top of VMs. E.g. running Nodes as Pods in Kubernetes cluster and Master
directly on top of VM.
## Things to simulate
In real Kubernetes on a single Node we run two daemons that communicate with Master in some way: Kubelet and KubeProxy.
### KubeProxy
As a replacement for KubeProxy we'll use HollowProxy, which will be a real KubeProxy with injected no-op mocks everywhere it makes sense.
### Kubelet
As a replacement for Kubelet we'll use HollowKubelet, which will be a real Kubelet with injected no-op or simple mocks everywhere it makes
sense.
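A hypothetical Go sketch of the no-op mock pattern described above; the `containerRuntime` interface here is a stand-in and is far smaller than the real runtime interface the Kubelet uses.
```go
package main

import "fmt"

// containerRuntime stands in for the runtime backend the Kubelet drives.
type containerRuntime interface {
	RunPod(name string) error
	KillPod(name string) error
}

// noopRuntime pretends to manage pods, so the "production" Kubelet code
// paths can run without starting any real containers.
type noopRuntime struct{}

func (noopRuntime) RunPod(name string) error  { fmt.Println("hollow: running", name); return nil }
func (noopRuntime) KillPod(name string) error { fmt.Println("hollow: killing", name); return nil }

func main() {
	var rt containerRuntime = noopRuntime{}
	_ = rt.RunPod("nginx")
	_ = rt.KillPod("nginx")
}
```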
Kubelet also exposes a cadvisor endpoint, which is scraped by Heapster, and a healthz endpoint read by supervisord, and we have FluentD running as a
Pod on each Node that exports logs to Elasticsearch (or Google Cloud Logging). Both Heapster and Elasticsearch are running in Pods in the
cluster, so they do not add any load on the Master components by themselves. There can be other systems that scrape Heapster through the proxy running
on the Master, which adds additional load, but they're not part of the default setup, so in the first version we won't simulate this behavior.
In the first version we'll assume that all started Pods will run indefinitely if not explicitly deleted. In the future we can add a model
of short-running batch jobs, but in the initial version we'll assume only serving-like Pods.
### Heapster
In addition to system components we run Heapster as a part of cluster monitoring setup. Heapster currently watches Events, Pods and Nodes
through the API server. In the test setup we can use real heapster for watching API server, with mocked out piece that scrapes cAdvisor
data from Kubelets.
### Elasticsearch and Fluentd
Similarly to Heapster, Elasticsearch runs outside the Master machine but generates some traffic on it. The Fluentd “daemon” running on the Master
periodically sends Docker logs it gathered to the Elasticsearch running on one of the Nodes. In the initial version we omit Elasticsearch,
as it produces only a constant small load on Master Node that does not change with the size of the cluster.
## Necessary work
There are three more or less independent things that need to be worked on:
- HollowNode implementation, creating a library/binary that will be able to listen to Watches and respond in a correct fashion with Status
updates. This also involves creation of a CloudProvider that can produce such Hollow Nodes, or making sure that HollowNodes can correctly
self-register with a no-provider Master.
- Kubemark setup, including figuring out the networking model, the number of Hollow Nodes that will be allowed to run on a single “machine”, writing
setup/run/teardown scripts (in [option 1](#option-1)), or figuring out how to run Master and Hollow Nodes on top of Kubernetes
(in [option 2](#option-2))
- Creating a Player component that will send requests to the API server putting a load on a cluster. This involves creating a way to
specify desired workload. This task is
very well isolated from the rest, as it is about sending requests to the real API server. Because of that we can discuss requirements
separately.
## Concerns
Network performance most likely won't be a problem for the initial version if running directly on VMs rather than on top of a Kubernetes
cluster, as Kubemark will be running on a standard networking stack (no cloud-provider software routes or overlay network is needed, as we
don't need custom routing between Pods). Similarly we don't think that running Kubemark on Kubernetes virtualized cluster networking will
cause a noticeable performance impact, but it requires testing.
On the other hand, when adding additional features it may turn out that we need to simulate the Kubernetes Pod network. In such a case, when running
'pure' Kubemark we may try one of the following:
- running overlay network like Flannel or OVS instead of using cloud providers routes,
- write simple network multiplexer to multiplex communications from the Hollow Kubelets/KubeProxies on the machine.
In the case of Kubemark on Kubernetes it may turn out that we run into a problem with adding yet another layer of network virtualization, but we
don't need to solve this problem now.
## Work plan
- Teach/make sure that Master can talk to multiple Kubelets on the same Machine [option 1](#option-1):
- make sure that Master can talk to a Kubelet on non-default port,
- make sure that Master can talk to all Kubelets on different ports,
- Write HollowNode library:
- new HollowProxy,
- new HollowKubelet,
- new HollowNode combining the two,
- make sure that Master can talk to two HollowKubelets running on the same machine
- Make sure that we can run Hollow cluster on top of Kubernetes [option 2](#option-2)
- Write a player that will automatically put some predefined load on the Master, <- this is the moment when it's possible to play with it, and it is useful by itself for
scalability tests. Alternatively we can just use the current density/load tests,
- Benchmark our machines - see how many Watch clients we can have before everything explodes,
- See how many HollowNodes we can run on a single machine by attaching them to the real master <- this is the moment it starts to be useful
- Update kube-up/kube-down scripts to enable creating “HollowClusters”/write new scripts/something, integrate HollowCluster with Elasticsearch/Heapster equivalents,
- Allow passing custom configuration to the Player
## Future work
In the future we want to add following capabilities to the Kubemark system:
- replaying real traffic reconstructed from the recorded Events stream,
- simulating scraping things running on Nodes through Master proxy.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/kubemark.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,161 +1 @@
# Kubernetes Local Cluster Experience This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/local-cluster-ux.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/local-cluster-ux.md)
This proposal attempts to improve the existing local cluster experience for kubernetes.
The current local cluster experience is sub-par and often not functional.
There are several options to setup a local cluster (docker, vagrant, linux processes, etc) and we do not test any of them continuously.
Here are some highlighted issues:
- Docker based solution breaks with docker upgrades, does not support DNS, and many kubelet features are not functional yet inside a container.
- Vagrant based solutions are too heavy and have mostly failed on OS X.
- Local linux cluster is poorly documented and is undiscoverable.
From an end user's perspective, what matters is running a Kubernetes cluster. Users care less about *how* a cluster is set up locally and more about what they can do with a functional cluster.
## Primary Goals
From a high level, the goal is to make it easy for a new user to run a Kubernetes cluster and play with curated examples that require the least amount of knowledge about Kubernetes.
These examples will use only kubectl, and only a subset of the available Kubernetes features will be exposed.
- Works across multiple OSes - OS X, Linux and Windows primarily.
- Single command setup and teardown UX.
- Unified UX across OSes
- Minimal dependencies on third party software.
- Minimal resource overhead.
- Eliminate any other alternatives to local cluster deployment.
## Secondary Goals
- Enable developers to use the local cluster for kubernetes development.
## Non Goals
- Simplifying kubernetes production deployment experience. [Kube-deploy](https://github.com/kubernetes/kube-deploy) is attempting to tackle this problem.
- Supporting all possible deployment configurations of Kubernetes like various types of storage, networking, etc.
## Local cluster requirements
- Includes all the master components & DNS (Apiserver, scheduler, controller manager, etcd and kube dns)
- Basic auth
- Service accounts should be setup
- Kubectl should be auto-configured to use the local cluster
- Tested & maintained as part of Kubernetes core
## Existing solutions
Following are some of the existing solutions that attempt to simplify local cluster deployments.
### [Spread](https://github.com/redspread/spread)
Spread's UX is great!
It is adapted from monokube and includes DNS as well.
It satisfies almost all the requirements, except that it requires docker to be pre-installed.
It has a loose dependency on docker.
New releases of docker might break this setup.
### [Kmachine](https://github.com/skippbox/kmachine)
Kmachine is adapted from docker-machine.
It exposes the entire docker-machine CLI.
It is possible to repurpose Kmachine to meet all our requirements.
### [Monokube](https://github.com/polvi/monokube)
Single binary that runs all kube master components.
Does not include DNS.
This is only a part of the overall local cluster solution.
### Vagrant
The kube-up.sh script included in Kubernetes release supports a few Vagrant based local cluster deployments.
kube-up.sh is not user friendly.
It typically takes a long time for the cluster to be set up using vagrant, and it is often unsuccessful on OS X.
The [Core OS single machine guide](https://coreos.com/kubernetes/docs/latest/kubernetes-on-vagrant-single.html) uses Vagrant as well and it just works.
Since we are targeting a single command install/teardown experience, vagrant needs to be an implementation detail and not be exposed to our users.
## Proposed Solution
To avoid exposing users to third party software and external dependencies, we will build a toolbox that will be shipped with all the dependencies including all kubernetes components, hypervisor, base image, kubectl, etc.
*Note: Docker provides a [similar toolbox](https://www.docker.com/products/docker-toolbox).*
This "Localkube" tool will be referred to as "Minikube" in this proposal to avoid ambiguity against Spread's existing ["localkube"](https://github.com/redspread/localkube).
The final name of this tool is TBD. Suggestions are welcome!
Minikube will provide a unified CLI to interact with the local cluster.
The CLI will support only a few operations:
- **Start** - creates & starts a local cluster along with setting up kubectl & networking (if necessary)
- **Stop** - suspends the local cluster & preserves cluster state
- **Delete** - deletes the local cluster completely
- **Upgrade** - upgrades internal components to the latest available version (upgrades are not guaranteed to preserve cluster state)
For running and managing the kubernetes components themselves, we can re-use [Spread's localkube](https://github.com/redspread/localkube).
Localkube is a self-contained go binary that includes all the master components including DNS and runs them using multiple go threads.
Each Kubernetes release will include a localkube binary that has been tested exhaustively.
To support Windows and OS X, minikube will use [libmachine](https://github.com/docker/machine/tree/master/libmachine) internally to create and destroy virtual machines.
Minikube will be shipped with a hypervisor (virtualbox) in the case of OS X.
Minikube will include a base image that will be well tested.
In the case of Linux, since the cluster can be run locally, we ideally want to avoid setting up a VM.
Since docker is the only fully supported runtime as of Kubernetes v1.2, we can initially use docker to run and manage localkube.
There is risk of being incompatible with the existing version of docker.
By using a VM, we can avoid such incompatibility issues though.
Feedback from the community will be helpful here.
If the goal is to run outside of a VM, we can have minikube prompt the user if docker is unavailable or version is incompatible.
Alternatives to docker for running the localkube core includes using [rkt](https://coreos.com/rkt/docs/latest/), setting up systemd services, or a System V Init script depending on the distro.
To summarize the pipeline is as follows:
##### OS X / Windows
minikube -> libmachine -> virtualbox/hyper V -> linux VM -> localkube
##### Linux
minikube -> docker -> localkube
### Alternatives considered
#### Bring your own docker
##### Pros
- Kubernetes users will probably already have it
- No extra work for us
- Only one VM/daemon, we can just reuse the existing one
##### Cons
- Not designed to be wrapped, may be unstable
- Might make configuring networking difficult on OS X and Windows
- Versioning and updates will be challenging. We can mitigate some of this with testing at HEAD, but we'll inevitably hit situations where it's infeasible to work with multiple versions of docker.
- There are lots of different ways to install docker, networking might be challenging if we try to support many paths.
#### Vagrant
##### Pros
- We control the entire experience
- Networking might be easier to build
- Docker can't break us since we'll include a pinned version of Docker
- Easier to support rkt or hyper in the future
- Would let us run some things outside of containers (kubelet, maybe ingress/load balancers)
##### Cons
- More work
- Extra resources (if the user is also running docker-machine)
- Confusing if there are two docker daemons (images built in one can't be run in another)
- Always needs a VM, even on Linux
- Requires installing and possibly understanding Vagrant.
## Releases & Distribution
- Minikube will be released independent of Kubernetes core in order to facilitate fixing of issues that are outside of Kubernetes core.
- The latest version of Minikube is guaranteed to support the latest release of Kubernetes, including documentation.
- The Google Cloud SDK will package minikube and provide utilities for configuring kubectl to use it, but will not in any other way wrap minikube.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/local-cluster-ux.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,532 +1 @@
# Kubernetes for multiple platforms This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/multi-platform.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/multi-platform.md)
**Author**: Lucas Käldström ([@luxas](https://github.com/luxas))
**Status** (25th of August 2016): Some parts are already implemented; but still there quite a lot of work to be done.
## Abstract
We obviously want Kubernetes to run on as many platforms as possible, in order to make Kubernetes an even more powerful system.
This is a proposal that explains what should be done in order to achieve a true cross-platform container management system.
Kubernetes is written in Go, and Go code is portable across platforms.
Docker and rkt are also written in Go, and it's already possible to use them on various platforms.
When it's possible to run containers on a specific architecture, people also want to use Kubernetes to manage the containers.
In this proposal, a `platform` is defined as `operating system/architecture` or `${GOOS}/${GOARCH}` in Go terms.
The following platforms are proposed to be built for in a Kubernetes release:
- linux/amd64
- linux/arm (GOARM=6 initially, but we probably have to bump this to GOARM=7 because most other ARM platforms are ARMv7)
- linux/arm64
- linux/ppc64le
If there's interest in running Kubernetes on `linux/s390x` too, it won't require many changes to the source now that we've laid the groundwork for a multi-platform Kubernetes.
There is also work going on with porting Kubernetes to Windows (`windows/amd64`). See [this issue](https://github.com/kubernetes/kubernetes/issues/22623) for more details.
But note that when porting to a new OS like Windows, a lot of OS-specific changes have to be implemented before the cross-compiling, releasing, and other concerns this document describes apply.
## Motivation
Then the question probably is: Why?
In fact, making it possible to run Kubernetes on other platforms will enable people to create customized and highly-optimized solutions that exactly fit their hardware needs.
Example: [Paypal validates arm64 for real-time data analysis](http://www.datacenterdynamics.com/content-tracks/servers-storage/paypal-successfully-tests-arm-based-servers/93835.fullarticle)
Also, by bringing other platforms to the Kubernetes party, a healthy competition between platforms can/will take place.
Every platform obviously has both pros and cons. By adding the option to make clusters of mixed platforms, the end user may take advantage of the good sides of every platform.
## Use Cases
For a large enterprise where computing power is the king, one may imagine the following combinations:
- `linux/amd64`: For running most of the general-purpose computing tasks, cluster addons, etc.
- `linux/ppc64le`: For running highly-optimized software; especially massive compute tasks
- `windows/amd64`: For running services that are only compatible on windows; e.g. business applications written in C# .NET
For a mid-sized business where efficiency is most important, these could be combinations:
- `linux/amd64`: For running most of the general-purpose computing tasks, plus tasks that require very high single-core performance.
- `linux/arm64`: For running webservices and high-density tasks => the cluster could autoscale in a way that `linux/amd64` machines could hibernate at night in order to minimize power usage.
For a small business or university, arm is often sufficient:
- `linux/arm`: Draws very little power, and can run web sites and app backends efficiently on Scaleway for example.
And last but not least; Raspberry Pi's should be used for [education at universities](http://kubecloud.io/) and are great for **demoing Kubernetes' features at conferences.**
## Main proposal
### Release binaries for all platforms
First and foremost, binaries have to be released for all platforms.
This affects the build-release tools. Fortunately, this is quite straightforward to implement, once you understand how Go cross-compilation works.
Since Kubernetes' release and build jobs run on `linux/amd64`, binaries have to be cross-compiled and Docker images should be cross-built.
Builds should be run in a Docker container in order to get reproducible builds; and `gcc` should be installed for all platforms inside that image (`kube-cross`)
All released binaries should be uploaded to `https://storage.googleapis.com/kubernetes-release/release/${version}/bin/${os}/${arch}/${binary}`
This is a fairly long topic. If you're interested how to cross-compile, see [details about cross-compilation](#cross-compilation-details)
### Support all platforms in a "run everywhere" deployment
The easiest way of running Kubernetes on another architecture at the time of writing is probably by using the docker-multinode deployment. Of course, you may choose whatever deployment you want, the binaries are easily downloadable from the URL above.
[docker-multinode](https://github.com/kubernetes/kube-deploy/tree/master/docker-multinode) is intended to be a "kick-the-tires" multi-platform solution with Docker as the only real dependency (but it's not production ready)
But when we (`sig-cluster-lifecycle`) have standardized the deployments to about three and made them production ready; at least one deployment should support **all platforms**.
### Set up build and e2e CIs
#### Build CI
Kubernetes should always enforce that all binaries are compiling.
**On every PR, `make release` has to be run** in order to require the code proposed to be merged to be compatible with all architectures.
For more information, see [conflicts](#conflicts)
#### e2e CI
To ensure all functionality really is working on all other platforms, the community should be able to set up a CI.
To be able to do that, all the test-specific images have to be ported to multiple architectures, and the test images should preferably be manifest lists.
If the test images aren't manifest lists, the test code should automatically choose the right image based on the image naming.
IBM volunteered to run continuously running e2e tests for `linux/ppc64le`.
Still, it's hard to set up such a CI (even on `linux/amd64`), but that work belongs to `kubernetes/test-infra` proposals.
When it's possible to test Kubernetes using Kubernetes; volunteers should be given access to publish their results on `k8s-testgrid.appspot.com`.
### Official support level
When all e2e tests are passing for a given platform; the platform should be officially supported by the Kubernetes team.
At the time of writing, `amd64` is in the officially supported category.
When a platform is building and it's possible to set up a cluster with the core functionality, the platform is supported on a "best-effort" and experimental basis.
At the time of writing, `arm`, `arm64` and `ppc64le` are in the experimental category; the e2e tests aren't cross-platform yet.
### Docker image naming and manifest lists
#### Docker manifest lists
Here's a good article about how the "manifest list" in the Docker image [manifest spec v2](https://github.com/docker/distribution/pull/1068) works: [A step towards multi-platform Docker images](https://integratedcode.us/2016/04/22/a-step-towards-multi-platform-docker-images/)
A short summary: A manifest list is a list of Docker images with a single name (e.g. `busybox`), that holds layers for multiple platforms _when it's stored in a registry_.
When the image is pulled by a client (`docker pull busybox`), only layers for the target platforms are downloaded.
Right now we have to write `busybox-${ARCH}` for example instead, but that leads to extra scripting and unnecessary logic.
For reference see [docker/docker#24739](https://github.com/docker/docker/issues/24739) and [appc/docker2aci#193](https://github.com/appc/docker2aci/issues/193)
#### Image naming
There has been quite a lot of debate about how we should name non-amd64 Docker images that are pushed to `gcr.io`. See [#23059](https://github.com/kubernetes/kubernetes/pull/23059) and [#23009](https://github.com/kubernetes/kubernetes/pull/23009).
This means that the naming `gcr.io/google_containers/${binary}:${version}` should contain a _manifest list_ for future tags.
The manifest list thereby becomes a wrapper that is pointing to the `-${arch}` images.
This requires `docker-1.10` or newer, which probably means Kubernetes v1.4 and higher.
TL;DR;
- `${binary}-${arch}:${version}` images should be pushed for all platforms
- `${binary}:${version}` images should point to the `-${arch}`-specific ones, and docker will then download the right image.
### Components should expose their platform
It should be possible to run clusters with mixed platforms smoothly. After all, bringing heterogeneous machines together into a single unit (a cluster) is one of Kubernetes' greatest strengths. And since Kubernetes' components communicate over HTTP, two binaries of different architectures may talk to each other normally.
The crucial thing here is that the components that handle platform-specific tasks (e.g. the kubelet) should expose their platform. In the kubelet case, we've initially solved it by exposing the labels `beta.kubernetes.io/{os,arch}` on every node. This way a user may run binaries for different platforms on a multi-platform cluster, but it still requires manual work to apply the label to every manifest.
Also, [the apiserver now exposes](https://github.com/kubernetes/kubernetes/pull/19905) its platform at `GET /version`. But note that the value exposed at `/version` is only the apiserver's platform; there might be kubelets of various other platforms.
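As a rough sketch (not part of this proposal's implementation), a client could read the apiserver's platform from `GET /version` like this, assuming the endpoint is reachable without authentication, e.g. through `kubectl proxy` on its default port:
```go
package main
import (
	"encoding/json"
	"fmt"
	"net/http"
)
// versionInfo mirrors the fields of interest in the JSON served at /version.
type versionInfo struct {
	GitVersion string `json:"gitVersion"`
	Platform   string `json:"platform"` // e.g. "linux/amd64"
}
func main() {
	resp, err := http.Get("http://127.0.0.1:8001/version")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	var v versionInfo
	if err := json.NewDecoder(resp.Body).Decode(&v); err != nil {
		panic(err)
	}
	// Note: this is only the apiserver's platform; kubelets may run on other platforms.
	fmt.Printf("apiserver %s runs on %s\n", v.GitVersion, v.Platform)
}
```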
### Standardize all image Makefiles to follow the same pattern
All Makefiles should push for all platforms when doing `make push`, and build for all platforms when doing `make build`.
Under the hood, they should compile binaries in a container for reproducibility, and use QEMU for emulating Dockerfile `RUN` commands if necessary.
### Remove linux/amd64 hard-codings from the codebase
All places where `linux/amd64` is hardcoded in the codebase should be rewritten.
#### Make kubelet automatically use the right pause image
The `pause` container is used for connecting containers into Pods. It's a binary that just sleeps forever.
When Kubernetes starts up a Pod, it first starts a `pause` container, and lets all "real" containers join the same network by setting `--net=${pause_container_id}`.
So in order to start Kubernetes Pods on any other architecture, an ever-sleeping image has to exist for that architecture.
Fortunately, `kubelet` has the `--pod-infra-container-image` option, and it has been used when running Kubernetes on other platforms.
But relying on the deployment setup to specify the right image for the platform isn't great; the kubelet should be smarter than that.
This specific problem has been fixed in [#23059](https://github.com/kubernetes/kubernetes/pull/23059).
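A minimal sketch of the idea, assuming the image follows the `-${arch}` naming convention described above (the exact default image name and tag are illustrative, not the kubelet's real defaults):
```go
package main
import (
	"fmt"
	"runtime"
)
// defaultPauseImage is used when --pod-infra-container-image isn't set.
// The registry and tag are illustrative placeholders.
func defaultPauseImage() string {
	return fmt.Sprintf("gcr.io/google_containers/pause-%s:3.0", runtime.GOARCH)
}
func main() {
	fmt.Println(defaultPauseImage()) // e.g. gcr.io/google_containers/pause-amd64:3.0
}
```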
#### Vendored packages
Here are two common problems that a vendored package might have when trying to add/update it:
- Including constants combined with build tags
```go
// +build linux,amd64
const AnAmd64OnlyConstant = 123
```
- Relying on platform-specific syscalls (e.g. `syscall.Dup2`)
If someone tries to add or update a dependency that has one of these problems, the CI will catch it and block the PR until the author has updated the vendored repo and fixed the problem.
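As an illustration of the usual fix, platform-specific syscalls can be isolated in files guarded by build tags. The package and helper below are hypothetical; the point is that `syscall.Dup2` isn't available on `linux/arm64`, where `syscall.Dup3` with zero flags is the equivalent:
```go
// dup_other.go
// +build linux,!arm64

package fdutil
import "syscall"
func dupFD(oldfd, newfd int) error {
	return syscall.Dup2(oldfd, newfd)
}
```
```go
// dup_arm64.go
// +build linux,arm64

package fdutil
import "syscall"
func dupFD(oldfd, newfd int) error {
	// arm64 has no dup2 syscall; dup3 with flags=0 behaves the same.
	return syscall.Dup3(oldfd, newfd, 0)
}
```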
### kubectl should be released for all platforms that are relevant
kubectl is released for more platforms than the proposed server platforms; if you want to check out an up-to-date list of them, [see here](../../hack/lib/golang.sh).
kubectl is trivial to cross-compile, so if there's interest in adding a new platform for it, it may be as easy as appending the platform to the list linked above.
### Addons
Addons like dns, heapster and ingress play a big role in a working Kubernetes cluster, and we should aim to be able to deploy these addons on multiple platforms too.
`kube-dns`, `dashboard` and `addon-manager` are the most important images, and they are already ported for multiple platforms.
These addons should also be converted to multiple platforms:
- heapster, influxdb + grafana
- nginx-ingress
- elasticsearch, fluentd + kibana
- registry
### Conflicts
What should we do if there's a conflict between keeping e.g. `linux/ppc64le` builds vs. merging a release blocker?
In fact, we faced this problem while this proposal was being written; in [#25243](https://github.com/kubernetes/kubernetes/pull/25243). It is quite obvious that the release blocker is of higher priority.
However, before temporarily [deactivating builds](https://github.com/kubernetes/kubernetes/commit/2c9b83f291e3e506acc3c08cd10652c255f86f79), the author of the breaking PR should first try to fix the problem. If it turns out being really hard to solve, builds for the affected platform may be deactivated and a P1 issue should be made to activate them again.
## Cross-compilation details (for reference)
### Go language details
Go 1.5 introduced many changes. To name a few that are relevant to Kubernetes:
- C was eliminated from the tree (it was earlier used for the bootstrap runtime).
- All processors are used by default, which means we should be able to remove [lines like this one](https://github.com/kubernetes/kubernetes/blob/v1.2.0/cmd/kubelet/kubelet.go#L37)
- The garbage collector became more efficient (but also [confused our latency test](https://github.com/golang/go/issues/14396)).
- `linux/arm64` and `linux/ppc64le` were added as new ports.
- The `GO15VENDOREXPERIMENT` was started. We switched from `Godeps/_workspace` to the native `vendor/` in [this PR](https://github.com/kubernetes/kubernetes/pull/24242).
- It's not required to pre-build the whole standard library `std` when cross-compiling. [Details](#prebuilding-the-standard-library-std)
- Builds are approximately twice as slow as earlier. That affects the CI. [Details](#releasing)
- The native Go DNS resolver will suffice in most situations. This makes static linking much easier.
All release notes for Go 1.5 [are here](https://golang.org/doc/go1.5)
Go 1.6 didn't introduce as many changes as Go 1.5 did, but here are some of note:
- It should perform a little bit better than Go 1.5.
- `linux/mips64` and `linux/mips64le` were added as new ports.
- Go < 1.6.2 for `ppc64le` had [bugs in it](https://github.com/kubernetes/kubernetes/issues/24922).
All release notes for Go 1.6 [are here](https://golang.org/doc/go1.6)
In Kubernetes 1.2, the only supported Go version was `1.4.2`, so `linux/arm` was the only possible extra architecture: [#19769](https://github.com/kubernetes/kubernetes/pull/19769).
In Kubernetes 1.3, [we upgraded to Go 1.6](https://github.com/kubernetes/kubernetes/pull/22149), which made it possible to build Kubernetes for even more architectures [#23931](https://github.com/kubernetes/kubernetes/pull/23931).
#### The `sync/atomic` bug on 32-bit platforms
From https://golang.org/pkg/sync/atomic/#pkg-note-BUG:
> On both ARM and x86-32, it is the caller's responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically. The first word in a global variable or in an allocated struct or slice can be relied upon to be 64-bit aligned.
`etcd` has had [issues](https://github.com/coreos/etcd/issues/2308) with this. See [how to fix it here](https://github.com/coreos/etcd/pull/3249).
```go
// 32-bit-atomic-bug.go
package main
import "sync/atomic"
type a struct {
b chan struct{}
c int64
}
func main(){
d := a{}
atomic.StoreInt64(&d.c, 10 * 1000 * 1000 * 1000)
}
```
```console
$ GOARCH=386 go build 32-bit-atomic-bug.go
$ file 32-bit-atomic-bug
32-bit-atomic-bug: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, not stripped
$ ./32-bit-atomic-bug
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x808cd9b]
goroutine 1 [running]:
panic(0x8098de0, 0x1830a038)
/usr/local/go/src/runtime/panic.go:481 +0x326
sync/atomic.StoreUint64(0x1830e0f4, 0x540be400, 0x2)
/usr/local/go/src/sync/atomic/asm_386.s:190 +0xb
main.main()
/tmp/32-bit-atomic-bug.go:11 +0x4b
```
This means that, to be safe, all structs should keep their atomically accessed `int64` and `uint64` fields at the top of the struct. If we moved `a.c` to the top of the `a` struct above, the operation would succeed.
The bug affects 32-bit platforms when a `(u)int64` field is accessed by an `atomic` method.
It would be great to write a tool that checks that all atomically accessed fields are aligned at the top of the struct, but it's hard: [coreos/etcd#5027](https://github.com/coreos/etcd/issues/5027).
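For reference, a sketch of the safe layout for the example above, with the atomically accessed field moved to the front of the struct:
```go
package main
import "sync/atomic"
// a keeps its atomically accessed int64 first, so it is guaranteed to be
// 64-bit aligned even on 32-bit platforms.
type a struct {
	c int64 // accessed with sync/atomic
	b chan struct{}
}
func main() {
	d := a{}
	atomic.StoreInt64(&d.c, 10*1000*1000*1000) // no longer panics with GOARCH=386
}
```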
## Prebuilding the Go standard library (`std`)
There is a great blog post [describing this](https://medium.com/@rakyll/go-1-5-cross-compilation-488092ba44ec#.5jcd0owem).
Before Go 1.5, the whole Go project had to be cross-compiled from source for **all** platforms that _might_ be used, and that was quite a slow process:
```console
# From build-tools/build-image/cross/Dockerfile when we used Go 1.4
$ cd /usr/src/go/src
$ for platform in ${PLATFORMS}; do GOOS=${platform%/*} GOARCH=${platform##*/} ./make.bash --no-clean; done
```
With Go 1.5+, cross-compiling the Go repository isn't required anymore. Go will automatically cross-compile the `std` packages that are being used by the code that is being compiled, _and throw it away after the compilation_.
If you cross-compile multiple times, Go will build parts of `std`, throw it away, compile parts of it again, throw that away and so on.
However, there is an easy way of cross-compiling all `std` packages in advance with Go 1.5+:
```console
# From build-tools/build-image/cross/Dockerfile when we're using Go 1.5+
$ for platform in ${PLATFORMS}; do GOOS=${platform%/*} GOARCH=${platform##*/} go install std; done
```
### Static cross-compilation
Static compilation with Go 1.5+ is dead easy:
```go
// main.go
package main
import "fmt"
func main() {
fmt.Println("Hello Kubernetes!")
}
```
```console
$ go build main.go
$ file main
main: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
$ GOOS=linux GOARCH=arm go build main.go
$ file main
main: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, not stripped
```
The only thing you have to do is change the `GOARCH` and `GOOS` variables. Here's a list of valid values for [GOOS/GOARCH](https://golang.org/doc/install/source#environment)
#### Static compilation with `net`
Consider this:
```go
// main-with-net.go
package main
import "net"
import "fmt"
func main() {
fmt.Println(net.ParseIP("10.0.0.10").String())
}
```
```console
$ go build main-with-net.go
$ file main-with-net
main-with-net: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2, not stripped
$ GOOS=linux GOARCH=arm go build main-with-net.go
$ file main-with-net
main-with-net: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, not stripped
```
Wait, what? Just because we included `net` from the `std` package, the binary defaults to being dynamically linked when the target platform equals the host platform?
Let's take a look at `go env` to get a clue why this happens:
```console
$ go env
GOARCH="amd64"
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/go"
GOROOT="/usr/local/go"
GO15VENDOREXPERIMENT="1"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
```
See the `CGO_ENABLED=1` at the end? That's where compilation for the host and cross-compilation differ. By default, Go will link statically if no `cgo` code is involved. `net` is one of the packages that prefers `cgo`, but doesn't depend on it.
When cross-compiling on the other hand, `CGO_ENABLED` is set to `0` by default.
To always be safe, run this when compiling statically:
```console
$ CGO_ENABLED=0 go build -a -installsuffix cgo main-with-net.go
$ file main-with-net
main-with-net: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
```
See [golang/go#9344](https://github.com/golang/go/issues/9344) for more details.
### Dynamic cross-compilation
In order to dynamically compile a go binary with `cgo`, we need `gcc` installed at build time.
The only Kubernetes binary that uses C code is the `kubelet`, or in fact `cAdvisor` on which `kubelet` depends. `hyperkube` is also dynamically linked as long as `kubelet` is. We should aim to make `kubelet` statically linked.
The normal `x86_64-linux-gnu` gcc can't cross-compile binaries, so we have to install gcc cross-compilers for every platform. We do this in the [`kube-cross`](../../build-tools/build-image/cross/Dockerfile) image,
and depend on the [`emdebian.org` repository](https://wiki.debian.org/CrossToolchains). Depending on `emdebian` isn't ideal, so we should consider using the latest `gcc` cross-compiler packages from the `ubuntu` main repositories in the future.
Here's an example when cross-compiling plain C code:
```c
// main.c
#include <stdio.h>
int main()
{
printf("Hello Kubernetes!\n");
}
```
```console
$ arm-linux-gnueabi-gcc -o main-c main.c
$ file main-c
main-c: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked,
interpreter /lib/ld-linux.so.3, for GNU/Linux 2.6.32, not stripped
```
And here's an example when cross-compiling `go` and `c`:
```go
// main-cgo.go
package main
/*
char* sayhello(void) { return "Hello Kubernetes!"; }
*/
import "C"
import "fmt"
func main() {
fmt.Println(C.GoString(C.sayhello()))
}
```
```console
$ CGO_ENABLED=1 CC=arm-linux-gnueabi-gcc GOOS=linux GOARCH=arm go build main-cgo.go
$ file main-cgo
./main-cgo: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked,
interpreter /lib/ld-linux.so.3, for GNU/Linux 2.6.32, not stripped
```
The bad thing about dynamic compilation is that it adds an unnecessary dependency on `glibc` _at runtime_.
### Static compilation with CGO code
Lastly, it's even possible to cross-compile `cgo` code _statically_:
```console
$ CGO_ENABLED=1 CC=arm-linux-gnueabi-gcc GOARCH=arm go build -ldflags '-extldflags "-static"' main-cgo.go
$ file main-cgo
./main-cgo: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked,
for GNU/Linux 2.6.32, not stripped
```
This is especially useful if we want to include the binary in a container.
If the binary is statically compiled, we may use `busybox` or even `scratch` as the base image.
This should be the preferred way of compiling binaries that strictly require C code to be a part of it.
#### GOARM
32-bit ARM comes in two main flavours: ARMv5 and ARMv7. Go has the `GOARM` environment variable that controls which version of ARM Go should target. Here's a table of all ARM versions and how they play together:
ARM Version | GOARCH | GOARM | GCC package | No. of bits
----------- | ------ | ----- | ----------- | -----------
ARMv5 | arm | 5 | armel | 32-bit
ARMv6 | arm | 6 | - | 32-bit
ARMv7 | arm | 7 | armhf | 32-bit
ARMv8 | arm64 | - | aarch64 | 64-bit
The compatibility between the versions is pretty straightforward: ARMv5 binaries may run on ARMv7 hosts, but not vice versa.
## Cross-building docker images for linux
After binaries have been cross-compiled, they should be distributed in some manner.
The default and maybe the most intuitive way of doing this is by packaging it in a docker image.
### Trivial Dockerfile
All `Dockerfile` commands except for `RUN` work for any architecture without any modification.
The base image has to be switched to an arch-specific one, but apart from that, a cross-built image is only a `docker build` away.
```Dockerfile
FROM armel/busybox
ENV kubernetes=true
COPY kube-apiserver /usr/local/bin/
CMD ["/usr/local/bin/kube-apiserver"]
```
```console
$ file kube-apiserver
kube-apiserver: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, not stripped
$ docker build -t gcr.io/google_containers/kube-apiserver-arm:v1.x.y .
Step 1 : FROM armel/busybox
---> 9bb1e6d4f824
Step 2 : ENV kubernetes true
---> Running in 8a1bfcb220ac
---> e4ef9f34236e
Removing intermediate container 8a1bfcb220ac
Step 3 : COPY kube-apiserver /usr/local/bin/
---> 3f0c4633e5ac
Removing intermediate container b75a054ab53c
Step 4 : CMD /usr/local/bin/kube-apiserver
---> Running in 4e6fe931a0a5
---> 28f50e58c909
Removing intermediate container 4e6fe931a0a5
Successfully built 28f50e58c909
```
### Complex Dockerfile
However, in the most cases, `RUN` statements are needed when building the image.
The `RUN` statement invokes `/bin/sh` inside the container, but in this example, `/bin/sh` is an ARM binary, which can't execute on an `amd64` processor.
#### QEMU to the rescue
Here's a way to run ARM Docker images on an amd64 host by using `qemu`:
```console
# Register other architectures' magic numbers in the binfmt_misc kernel module, so it's possible to run foreign binaries
$ docker run --rm --privileged multiarch/qemu-user-static:register --reset
# Download qemu 2.5.0
$ curl -sSL https://github.com/multiarch/qemu-user-static/releases/download/v2.5.0/x86_64_qemu-arm-static.tar.xz \
| tar -xJ
# Run a foreign docker image, and inject the amd64 qemu binary for translating all syscalls
$ docker run -it -v $(pwd)/qemu-arm-static:/usr/bin/qemu-arm-static armel/busybox /bin/sh
# Now we're inside an ARM container although we're running on an amd64 host
$ uname -a
Linux 0a7da80f1665 4.2.0-25-generic #30-Ubuntu SMP Mon Jan 18 12:31:50 UTC 2016 armv7l GNU/Linux
```
Here, a Linux kernel module called `binfmt_misc` registers the "magic numbers" of the other architectures in the kernel, so the kernel can detect which architecture a binary was built for and prepend the call with `/usr/bin/qemu-(arm|aarch64|ppc64le)-static`. For example, `/usr/bin/qemu-arm-static` is a statically linked `amd64` binary that translates all ARM syscalls to `amd64` syscalls.
The multiarch guys have done a great job here; you may find the source for this and other images on [GitHub](https://github.com/multiarch)
## Implementation
## History
32-bit ARM (`linux/arm`) was the first platform Kubernetes was ported to, and luxas' project [`Kubernetes on ARM`](https://github.com/luxas/kubernetes-on-arm) (released on GitHub the 31st of September 2015)
served as a way of running Kubernetes on ARM devices easily.
On the 30th of November 2015, a tracking issue about making Kubernetes run on ARM was opened: [#17981](https://github.com/kubernetes/kubernetes/issues/17981). It later shifted focus to how to make Kubernetes a more platform-independent system.
On the 27th of April 2016, Kubernetes `v1.3.0-alpha.3` was released, and it became the first release that was able to run the [docker getting started guide](http://kubernetes.io/docs/getting-started-guides/docker/) on `linux/amd64`, `linux/arm`, `linux/arm64` and `linux/ppc64le` without any modification.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/multi-platform.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,138 +1 @@
# Multi-Scheduler in Kubernetes This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/multiple-schedulers.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/multiple-schedulers.md)
**Status**: Design & Implementation in progress.
> Contact @HaiyangDING for questions & suggestions.
## Motivation
In the current Kubernetes design, there is only one default scheduler in a Kubernetes cluster.
However, it is common that multiple types of workload, such as traditional batch, DAG batch, streaming and user-facing production services,
are running in the same cluster and they need to be scheduled in different ways. For example, in
[Omega](http://research.google.com/pubs/pub41684.html) batch workload and service workload are scheduled by two types of schedulers:
the batch workload is scheduled by a scheduler which looks at the current usage of the cluster to improve the resource usage rate
and the service workload is scheduled by another one which considers the reserved resources in the
cluster and many other constraints since their performance must meet some higher SLOs.
[Mesos](http://mesos.apache.org/) has done great work to support multiple schedulers by building a
two-level scheduling structure. This proposal describes how Kubernetes is going to support multiple schedulers
so that users are able to run their own scheduler(s) to enable the customized scheduling
behavior they need. As previously discussed in [#11793](https://github.com/kubernetes/kubernetes/issues/11793),
[#9920](https://github.com/kubernetes/kubernetes/issues/9920) and [#11470](https://github.com/kubernetes/kubernetes/issues/11470),
the design of the multiple scheduler should be generic and includes adding a scheduler name annotation to separate the pods.
It is worth mentioning that the proposal does not address the question of how the scheduler name annotation gets
set although it is reasonable to anticipate that it would be set by a component like admission controller/initializer,
as the doc currently does.
Before going into the details of this proposal, below is a list of methods to extend the scheduler:
- Write your own scheduler and run it along with Kubernetes native scheduler. This is going to be detailed in this proposal
- Use the callout approach such as the one implemented in [#13580](https://github.com/kubernetes/kubernetes/issues/13580)
- Recompile the scheduler with a new policy
- Restart the scheduler with a new [scheduler policy config file](../../examples/scheduler-policy-config.json)
- Or maybe in future dynamically link a new policy into the running scheduler
## Challenges in multiple schedulers
- Separating the pods
Each pod should be scheduled by only one scheduler. As for implementation, a pod should
have an additional field to tell by which scheduler it wants to be scheduled. Besides,
each scheduler, including the default one, should have a unique logic of how to add unscheduled
pods to its to-be-scheduled pod queue. Details will be explained in later sections.
- Dealing with conflicts
Different schedulers are essentially separated processes. When all schedulers try to schedule
their pods onto the nodes, there might be conflicts.
One example of such a conflict is resource racing: Suppose there is a `pod1` scheduled by
`my-scheduler` with a 1 CPU *request*, and a `pod2` scheduled by `kube-scheduler` (the k8s native
scheduler, acting as default scheduler) with a 2 CPU *request*, while `node-a` only has 2.5
free CPUs. If both schedulers try to put their pods on `node-a`, then one of them would eventually
fail when the Kubelet on `node-a` performs the create action, due to insufficient CPU resources.
This conflict is complex to deal with in the api-server and etcd. Our current solution is to let the Kubelet
do the conflict check; if a conflict happens, affected pods are put back to their scheduler
and wait to be scheduled again. Implementation details are in later sections.
## Where to start: initial design
We definitely want the multi-scheduler design to be a generic mechanism. The following lists the changes
we want to make in the first step.
- Add an annotation in pod template: `scheduler.alpha.kubernetes.io/name: scheduler-name`, this is used to
separate pods between schedulers. `scheduler-name` should match one of the schedulers' `scheduler-name`
- Add a `scheduler-name` to each scheduler. It is done by hardcoding it or passing it as a command-line argument. The
Kubernetes native scheduler (now `kube-scheduler` process) would have the name as `kube-scheduler`
- The `scheduler-name` plays an important part in separating the pods between different schedulers.
Pods are statically dispatched to different schedulers based on `scheduler.alpha.kubernetes.io/name: scheduler-name`
annotation and there should not be any conflicts between different schedulers handling their pods, i.e. one pod must
NOT be claimed by more than one scheduler. To be specific, a scheduler can add a pod to its queue if and only if:
1. The pod has no nodeName, **AND**
2. The `scheduler-name` specified in the pod's annotation `scheduler.alpha.kubernetes.io/name: scheduler-name`
matches the `scheduler-name` of the scheduler.
The only one exception is the default scheduler. Any pod that has no `scheduler.alpha.kubernetes.io/name: scheduler-name`
annotation is assumed to be handled by the "default scheduler". In the first version of the multi-scheduler feature,
the default scheduler would be the Kubernetes built-in scheduler with `scheduler-name` as `kube-scheduler`.
The Kubernetes built-in scheduler will claim any pod which has no `scheduler.alpha.kubernetes.io/name: scheduler-name`
annotation or which has `scheduler.alpha.kubernetes.io/name: kube-scheduler`. In the future, it may be possible to
change which scheduler is the default for a given cluster.
- Dealing with conflicts. All schedulers must use predicate functions that are at least as strict as
the ones that Kubelet applies when deciding whether to accept a pod, otherwise Kubelet and scheduler
may get into an infinite loop where Kubelet keeps rejecting a pod and scheduler keeps re-scheduling
it back to the same node. To make it easier for people who write new schedulers to obey this rule, we will
create a library containing the predicates Kubelet uses. (See issue [#12744](https://github.com/kubernetes/kubernetes/issues/12744).)
In summary, in the initial version of this multi-scheduler design, we will achieve the following:
- If a pod has the annotation `scheduler.alpha.kubernetes.io/name: kube-scheduler` or the user does not explicitly
set this annotation in the template, it will be picked up by the default scheduler
- If the annotation is set and refers to a valid `scheduler-name`, it will be picked up by the scheduler of
specified `scheduler-name`
- If the annotation is set but refers to an invalid `scheduler-name`, the pod will not be picked up by any scheduler.
The pod will remain PENDING.
### An example
```yaml
kind: Pod
apiVersion: v1
metadata:
name: pod-abc
labels:
foo: bar
annotations:
scheduler.alpha.kubernetes.io/name: my-scheduler
```
This pod will be scheduled by "my-scheduler" and ignored by "kube-scheduler". If there is no running scheduler
of name "my-scheduler", the pod will never be scheduled.
## Next steps
1. Use admission controller to add and verify the annotation, and do some modification if necessary. For example, the
admission controller might add the scheduler annotation based on the namespace of the pod, and/or identify if
there are conflicting rules, and/or set a default value for the scheduler annotation, and/or reject pods on
which the client has set a scheduler annotation that does not correspond to a running scheduler.
2. Dynamically launching scheduler(s) and registering them with the admission controller (as an external call). This also
requires some work on authorization and authentication to control what schedulers can write the /binding
subresource of which pods.
3. Optimize the behavior of priority functions in the multi-scheduler scenario. In the case where multiple schedulers have
the same predicate and priority functions (for example, when using multiple schedulers for parallelism rather than to
customize the scheduling policies), all schedulers would tend to pick the same node as "best" when scheduling identical
pods and therefore would be likely to conflict on the Kubelet. To solve this problem, we can pass
an optional flag such as `--randomize-node-selection=N` to scheduler, setting this flag would cause the scheduler to pick
randomly among the top N nodes instead of the one with the highest score.
## Other issues/discussions related to scheduler design
- [#13580](https://github.com/kubernetes/kubernetes/pull/13580): scheduler extension
- [#17097](https://github.com/kubernetes/kubernetes/issues/17097): policy config file in pod template
- [#16845](https://github.com/kubernetes/kubernetes/issues/16845): scheduling groups of pods
- [#17208](https://github.com/kubernetes/kubernetes/issues/17208): guide to writing a new scheduler
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/multiple-schedulers.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,304 +1 @@
# NetworkPolicy This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/network-policy.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/network-policy.md)
## Abstract
A proposal for implementing a new resource - NetworkPolicy - which
will enable definition of ingress policies for selections of pods.
The design for this proposal has been created by, and discussed
extensively within the Kubernetes networking SIG. It has been implemented
and tested using Kubernetes API extensions by various networking solutions already.
In this design, users can create various NetworkPolicy objects which select groups of pods and
define how those pods should be allowed to communicate with each other. The
implementation of that policy at the network layer is left up to the
chosen networking solution.
> Note that this proposal does not yet include egress / cidr-based policy, which is still actively undergoing discussion in the SIG. These are expected to augment this proposal in a backwards compatible way.
## Implementation
The implementation in Kubernetes consists of:
- A v1beta1 NetworkPolicy API object
- A structure on the `Namespace` object to control policy, to be developed as an annotation for now.
### Namespace changes
The following objects will be defined on a Namespace Spec.
>NOTE: In v1beta1 the Namespace changes will be implemented as an annotation.
```go
type IngressIsolationPolicy string
const (
// Deny all ingress traffic to pods in this namespace. Ingress means
// any incoming traffic to pods, whether that be from other pods within this namespace
// or any source outside of this namespace.
DefaultDeny IngressIsolationPolicy = "DefaultDeny"
)
// Standard NamespaceSpec object, modified to include a new
// NamespaceNetworkPolicy field.
type NamespaceSpec struct {
// This is a pointer so that it can be left undefined.
NetworkPolicy *NamespaceNetworkPolicy `json:"networkPolicy,omitempty"`
}
type NamespaceNetworkPolicy struct {
// Ingress configuration for this namespace. This config is
// applied to all pods within this namespace. For now, only
// ingress is supported. This field is optional - if not
// defined, then the cluster default for ingress is applied.
Ingress *NamespaceIngressPolicy `json:"ingress,omitempty"`
}
// Configuration for ingress to pods within this namespace.
// For now, this only supports specifying an isolation policy.
type NamespaceIngressPolicy struct {
// The isolation policy to apply to pods in this namespace.
// Currently this field only supports "DefaultDeny", but could
// be extended to support other policies in the future. When set to DefaultDeny,
// pods in this namespace are denied ingress traffic by default. When not defined,
// the cluster default ingress isolation policy is applied (currently allow all).
Isolation *IngressIsolationPolicy `json:"isolation,omitempty"`
}
```
```yaml
kind: Namespace
apiVersion: v1
spec:
networkPolicy:
ingress:
isolation: DefaultDeny
```
The above structures will be represented in v1beta1 as a json encoded annotation like so:
```yaml
kind: Namespace
apiVersion: v1
metadata:
annotations:
net.beta.kubernetes.io/network-policy: |
{
"ingress": {
"isolation": "DefaultDeny"
}
}
```
### NetworkPolicy Go Definition
For a namespace with ingress isolation, connections to pods in that namespace (from any source) are prevented.
The user needs a way to explicitly declare which connections are allowed into pods of that namespace.
This is accomplished through ingress rules on `NetworkPolicy`
objects (of which there can be multiple in a single namespace). Pods selected by
one or more NetworkPolicy objects should allow any incoming connections that match any
ingress rule on those NetworkPolicy objects, per the network plugin's capabilities.
NetworkPolicy objects and the above namespace isolation both act on _connections_ rather than individual packets. That is to say that if traffic from pod A to pod B is allowed by the configured
policy, then the return packets for that connection from B -> A are also allowed, even if the policy in place would not allow B to initiate a connection to A. NetworkPolicy objects act on a broad definition of _connection_ which includes both TCP and UDP streams. If a new network policy is applied that would block an existing connection between two endpoints, the enforcer of policy
should terminate and block the existing connection as soon as can be expected by the implementation.
We propose adding the new NetworkPolicy object to the `extensions/v1beta1` API group for now.
The SIG also considered the following while developing the proposed NetworkPolicy object:
- A per-pod policy field. We discounted this in favor of the loose coupling that labels provide, similar to Services.
- Per-Service policy. We chose not to attach network policy to services to avoid semantic overloading of a single object, and conflating the existing semantics of load-balancing and service discovery with those of network policy.
```go
type NetworkPolicy struct {
TypeMeta
ObjectMeta
// Specification of the desired behavior for this NetworkPolicy.
Spec NetworkPolicySpec
}
type NetworkPolicySpec struct {
// Selects the pods to which this NetworkPolicy object applies. The array of ingress rules
// is applied to any pods selected by this field. Multiple network policies can select the
// same set of pods. In this case, the ingress rules for each are combined additively.
// This field is NOT optional and follows standard unversioned.LabelSelector semantics.
// An empty podSelector matches all pods in this namespace.
PodSelector unversioned.LabelSelector `json:"podSelector"`
// List of ingress rules to be applied to the selected pods.
// Traffic is allowed to a pod if namespace.networkPolicy.ingress.isolation is undefined and cluster policy allows it,
// OR if the traffic source is the pod's local node,
// OR if the traffic matches at least one ingress rule across all of the NetworkPolicy
// objects whose podSelector matches the pod.
// If this field is empty then this NetworkPolicy does not affect ingress isolation.
// If this field is present and contains at least one rule, this policy allows any traffic
// which matches at least one of the ingress rules in this list.
Ingress []NetworkPolicyIngressRule `json:"ingress,omitempty"`
}
// This NetworkPolicyIngressRule matches traffic if and only if the traffic matches both ports AND from.
type NetworkPolicyIngressRule struct {
// List of ports which should be made accessible on the pods selected for this rule.
// Each item in this list is combined using a logical OR.
// If this field is not provided, this rule matches all ports (traffic not restricted by port).
// If this field is empty, this rule matches no ports (no traffic matches).
// If this field is present and contains at least one item, then this rule allows traffic
// only if the traffic matches at least one port in the ports list.
Ports *[]NetworkPolicyPort `json:"ports,omitempty"`
// List of sources which should be able to access the pods selected for this rule.
// Items in this list are combined using a logical OR operation.
// If this field is not provided, this rule matches all sources (traffic not restricted by source).
// If this field is empty, this rule matches no sources (no traffic matches).
// If this field is present and contains at least one item, this rule allows traffic only if the
// traffic matches at least one item in the from list.
From *[]NetworkPolicyPeer `json:"from,omitempty"`
}
type NetworkPolicyPort struct {
// Optional. The protocol (TCP or UDP) which traffic must match.
// If not specified, this field defaults to TCP.
Protocol *api.Protocol `json:"protocol,omitempty"`
// If specified, the port on the given protocol. This can
// either be a numerical or named port. If this field is not provided,
// this matches all port names and numbers.
// If present, only traffic on the specified protocol AND port
// will be matched.
Port *intstr.IntOrString `json:"port,omitempty"`
}
type NetworkPolicyPeer struct {
// Exactly one of the following must be specified.
// This is a label selector which selects Pods in this namespace.
// This field follows standard unversioned.LabelSelector semantics.
// If present but empty, this selector selects all pods in this namespace.
PodSelector *unversioned.LabelSelector `json:"podSelector,omitempty"`
// Selects Namespaces using cluster-scoped labels. This
// matches all pods in all namespaces selected by this label selector.
// This field follows standard unversioned.LabelSelector semantics.
// If present but empty, this selector selects all namespaces.
NamespaceSelector *unversioned.LabelSelector `json:"namespaceSelector,omitempty"`
}
```
### Behavior
The following pseudo-code attempts to define when traffic is allowed to a given pod when using this API.
```python
def is_traffic_allowed(traffic, pod):
"""
Returns True if traffic is allowed to this pod, False otherwise.
"""
if not pod.Namespace.Spec.NetworkPolicy.Ingress.Isolation:
# If ingress isolation is disabled on the Namespace, use cluster default.
return clusterDefault(traffic, pod)
elif traffic.source == pod.node.kubelet:
# Traffic is from kubelet health checks.
return True
else:
# If namespace ingress isolation is enabled, only allow traffic
# that matches a network policy which selects this pod.
for network_policy in network_policies(pod.Namespace):
if not network_policy.Spec.PodSelector.selects(pod):
# This policy doesn't select this pod. Try the next one.
continue
# This policy selects this pod. Check each ingress rule
# defined on this policy to see if it allows the traffic.
# If at least one does, then the traffic is allowed.
for ingress_rule in network_policy.Ingress or []:
if ingress_rule.matches(traffic):
return True
# Ingress isolation is DefaultDeny and no policies match the given pod and traffic.
return False
```
### Potential Future Work / Questions
- A single podSelector per NetworkPolicy may lead to managing a large number of NetworkPolicy objects, each of which is small and easy to understand on its own. However, this may mean that a single policy change requires touching several policy objects. Allowing an optional podSelector per ingress rule, in addition to the podSelector per NetworkPolicy object, would allow the user to group rules into logical segments and define the size/complexity ratio where it makes sense. This may lead to a smaller number of objects with more complexity if the user opts in to the additional podSelector. This increases the complexity of the NetworkPolicy object itself. This proposal has opted to favor a larger number of smaller objects that are easier to understand, with the understanding that additional podSelectors could be added to this design in the future should the requirement become apparent.
- Is the `Namespaces` selector in the `NetworkPolicyPeer` struct too coarse? Do we need to support the AND combination of `Namespaces` and `Pods`?
### Examples
1) Only allow traffic from frontend pods on TCP port 6379 to backend pods in the same namespace.
```yaml
kind: Namespace
apiVersion: v1
metadata:
name: myns
annotations:
net.beta.kubernetes.io/network-policy: |
{
"ingress": {
"isolation": "DefaultDeny"
}
}
---
kind: NetworkPolicy
apiVersion: extensions/v1beta1
metadata:
name: allow-frontend
namespace: myns
spec:
podSelector:
matchLabels:
role: backend
ingress:
- from:
- podSelector:
matchLabels:
role: frontend
ports:
- protocol: TCP
port: 6379
```
2) Allow TCP 443 from any source in Bob's namespaces.
```yaml
kind: NetworkPolicy
apiVersion: extensions/v1beta1
metadata:
name: allow-tcp-443
spec:
podSelector:
matchLabels:
role: frontend
ingress:
- ports:
- protocol: TCP
port: 443
from:
- namespaceSelector:
matchLabels:
user: bob
```
3) Allow all traffic to all pods in this namespace.
```yaml
kind: NetworkPolicy
apiVersion: extensions/v1beta1
metadata:
name: allow-all
spec:
podSelector:
ingress:
- {}
```
## References
- https://github.com/kubernetes/kubernetes/issues/22469 tracks network policy in kubernetes.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/network-policy.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,151 +1 @@
# Node Allocatable Resources This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md)
**Issue:** https://github.com/kubernetes/kubernetes/issues/13984
## Overview
Currently Node.Status has Capacity, but no concept of node Allocatable. We need additional
parameters to serve several purposes:
1. Kubernetes metrics provides "/docker-daemon", "/kubelet",
"/kube-proxy", "/system" etc. raw containers for monitoring system component resource usage
patterns and detecting regressions. Eventually we want to cap system component usage to a certain
limit / request. However this is not currently feasible due to a variety of reasons including:
1. Docker still uses tons of computing resources (See
[#16943](https://github.com/kubernetes/kubernetes/issues/16943))
2. We have not yet defined the minimal system requirements, so we cannot control Kubernetes
nodes or know about arbitrary daemons, which can make the system resources
unmanageable. Even with a resource cap we cannot do full resource management on the
node, but with the proposed parameters we can mitigate severe resource overcommits
3. Usage scales with the number of pods running on the node
2. External schedulers (such as Mesos, Hadoop, etc.) might want to partition
compute resources on a given node, limiting how much the Kubelet can use. We should provide a
mechanism by which they can query the kubelet, and reserve some resources for their own purpose.
### Scope of proposal
This proposal deals with resource reporting through the [`Allocatable` field](#allocatable) for more
reliable scheduling, and minimizing resource over commitment. This proposal *does not* cover
resource usage enforcement (e.g. limiting kubernetes component usage), pod eviction (e.g. when
reservation grows), or running multiple Kubelets on a single node.
## Design
### Definitions
![image](node-allocatable.png)
1. **Node Capacity** - Already provided as
[`NodeStatus.Capacity`](https://htmlpreview.github.io/?https://github.com/kubernetes/kubernetes/blob/HEAD/docs/api-reference/v1/definitions.html#_v1_nodestatus),
this is total capacity read from the node instance, and assumed to be constant.
2. **System-Reserved** (proposed) - Compute resources reserved for processes which are not managed by
Kubernetes. Currently this covers all the processes lumped together in the `/system` raw
container.
3. **Kubelet Allocatable** - Compute resources available for scheduling (including scheduled &
unscheduled resources). This value is the focus of this proposal. See [below](#api-changes) for
more details.
4. **Kube-Reserved** (proposed) - Compute resources reserved for Kubernetes components such as the
docker daemon, kubelet, kube proxy, etc.
### API changes
#### Allocatable
Add `Allocatable` (4) to
[`NodeStatus`](https://htmlpreview.github.io/?https://github.com/kubernetes/kubernetes/blob/HEAD/docs/api-reference/v1/definitions.html#_v1_nodestatus):
```
type NodeStatus struct {
...
// Allocatable represents schedulable resources of a node.
Allocatable ResourceList `json:"allocatable,omitempty"`
...
}
```
Allocatable will be computed by the Kubelet and reported to the API server. It is defined to be:
```
[Allocatable] = [Node Capacity] - [Kube-Reserved] - [System-Reserved]
```
The scheduler will use `Allocatable` in place of `Capacity` when scheduling pods, and the Kubelet
will use it when performing admission checks.
*Note: Since kernel usage can fluctuate and is out of Kubernetes' control, it will be reported as a
separate value (probably via the metrics API). Reporting kernel usage is out-of-scope for this
proposal.*
#### Kube-Reserved
`KubeReserved` is the parameter specifying resources reserved for kubernetes components (4). It is
provided as a command-line flag to the Kubelet at startup, and therefore cannot be changed during
normal Kubelet operation (this may change in the [future](#future-work)).
The flag will be specified as a serialized `ResourceList`, with resources defined by the API
`ResourceName` and values specified in `resource.Quantity` format, e.g.:
```
--kube-reserved=cpu=500m,memory=5Mi
```
Initially we will only support CPU and memory, but will eventually support more resources. See
[#16889](https://github.com/kubernetes/kubernetes/pull/16889) for disk accounting.
If KubeReserved is not set it defaults to a sane value (TBD) calculated from machine capacity. If it
is explicitly set to 0 (along with `SystemReserved`), then `Allocatable == Capacity`, and the system
behavior is equivalent to the 1.1 behavior with scheduling based on Capacity.
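A rough sketch of how the flag format above could be parsed and how `Allocatable` would then be derived; this simplified version only understands the `m` (millicores) and `Mi` suffixes, whereas the real Kubelet parses full `resource.Quantity` values:
```go
package main
import (
	"fmt"
	"strconv"
	"strings"
)
// parseReserved parses a value like "cpu=500m,memory=5Mi" into millicores and bytes.
func parseReserved(flag string) (cpuMilli, memBytes int64) {
	for _, kv := range strings.Split(flag, ",") {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) != 2 {
			continue // ignore malformed entries in this sketch
		}
		switch parts[0] {
		case "cpu":
			v, _ := strconv.ParseInt(strings.TrimSuffix(parts[1], "m"), 10, 64)
			cpuMilli = v
		case "memory":
			v, _ := strconv.ParseInt(strings.TrimSuffix(parts[1], "Mi"), 10, 64)
			memBytes = v * 1024 * 1024
		}
	}
	return cpuMilli, memBytes
}
func main() {
	capCPU, capMem := int64(4000), int64(8)*1024*1024*1024 // node capacity: 4 CPUs, 8Gi
	kubeCPU, kubeMem := parseReserved("cpu=500m,memory=5Mi")
	sysCPU, sysMem := parseReserved("cpu=200m,memory=100Mi")
	// [Allocatable] = [Node Capacity] - [Kube-Reserved] - [System-Reserved]
	fmt.Printf("allocatable: cpu=%dm memory=%d bytes\n", capCPU-kubeCPU-sysCPU, capMem-kubeMem-sysMem)
}
```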
#### System-Reserved
In the initial implementation, `SystemReserved` will be functionally equivalent to
[`KubeReserved`](#kube-reserved), but with a different semantic meaning. While KubeReserved
designates resources set aside for kubernetes components, SystemReserved designates resources set
aside for non-kubernetes components (currently this is reported as all the processes lumped
together in the `/system` raw container).
## Issues
### Kubernetes reservation is smaller than kubernetes component usage
**Solution**: Initially, do nothing (best effort). Let the kubernetes daemons overflow the reserved
resources and hope for the best. If the node usage is less than Allocatable, there will be some room
for overflow and the node should continue to function. If the node has been scheduled to capacity
(worst-case scenario) it may enter an unstable state, which is the current behavior in this
situation.
In the [future](#future-work) we may set a parent cgroup for kubernetes components, with limits set
according to `KubeReserved`.
### Version discrepancy
**API server / scheduler is not allocatable-resources aware:** If the Kubelet rejects a Pod but the
scheduler expects the Kubelet to accept it, the system could get stuck in an infinite loop
scheduling a Pod onto the node only to have Kubelet repeatedly reject it. To avoid this situation,
we will do a 2-stage rollout of `Allocatable`. In stage 1 (targeted for 1.2), `Allocatable` will
be reported by the Kubelet and the scheduler will be updated to use it, but Kubelet will continue
to do admission checks based on `Capacity` (same as today). In stage 2 of the rollout (targeted
for 1.3 or later), the Kubelet will start doing admission checks based on `Allocatable`.
**API server expects `Allocatable` but does not receive it:** If the kubelet is older and does not
provide `Allocatable` in the `NodeStatus`, then `Allocatable` will be
[defaulted](../../pkg/api/v1/defaults.go) to
`Capacity` (which will yield today's behavior of scheduling based on capacity).
### 3rd party schedulers
The community should be notified that an update to schedulers is recommended, but if a scheduler is
not updated it falls under the above case of "scheduler is not allocatable-resources aware".
## Future work
1. Convert kubelet flags to Config API - Prerequisite to (2). See
[#12245](https://github.com/kubernetes/kubernetes/issues/12245).
2. Set cgroup limits according to KubeReserved - as described in the [overview](#overview)
3. Report kernel usage to be considered with scheduling decisions.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/node-allocatable.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,116 +1 @@
# Performance Monitoring This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/performance-related-monitoring.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/performance-related-monitoring.md)
## Reason for this document
This document serves as a place to gather information about past performance regressions, their reason and impact and discuss ideas to avoid similar regressions in the future.
Main reason behind doing this is to understand what kind of monitoring needs to be in place to keep Kubernetes fast.
## Known past and present performance issues
### Higher logging level causing scheduler stair stepping
Issue https://github.com/kubernetes/kubernetes/issues/14216 was opened because @spiffxp observed a regression in scheduler performance in the 1.1 branch in comparison to the `old` 1.0
cut. In the end it turned out to be caused by the `--v=4` flag (instead of the default `--v=2`) in the scheduler, together with the `--logtostderr` flag, which disables batching of
log lines, and a number of logging calls without an explicit V level. This caused weird behavior of the whole component.
Because we now know that logging may have a big performance impact, we should consider instrumenting the logging mechanism and computing statistics such as the number of logged messages and
their total and average size. Each binary should be responsible for exposing its metrics. An unaccounted-for but far too large number of days, if not weeks, of engineering time was
lost because of this issue.
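As a toy sketch of the kind of instrumentation meant here (the type and counter names are made up for illustration), each component could wrap its logger and export how many lines and bytes it has written:
```go
package main
import (
	"fmt"
	"sync/atomic"
)
// countingLogger wraps a log sink and tracks how many messages and bytes have
// been written, so the totals can be exposed as metrics.
type countingLogger struct {
	messages uint64 // kept first for 64-bit alignment on 32-bit platforms
	bytes    uint64
}
func (l *countingLogger) Infof(format string, args ...interface{}) {
	line := fmt.Sprintf(format, args...)
	atomic.AddUint64(&l.messages, 1)
	atomic.AddUint64(&l.bytes, uint64(len(line)))
	fmt.Println(line)
}
func main() {
	log := &countingLogger{}
	log.Infof("synced pod %s", "pod-abc")
	fmt.Printf("logged %d messages, %d bytes\n", atomic.LoadUint64(&log.messages), atomic.LoadUint64(&log.bytes))
}
```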
### Adding per-pod probe-time, which increased the number of PodStatus updates, causing major slowdown
In September 2015 we tried to add per-pod probe times to the PodStatus. It caused (https://github.com/kubernetes/kubernetes/issues/14273) a massive increase in both the number and the
total volume of object (PodStatus) changes. It drastically increased the load on the API server, which wasn't able to handle the new number of requests quickly enough, violating our
response time SLO. We had to revert this change.
### Late Ready->Running PodPhase transition caused test failures as it seemed like slowdown
In late September we encountered a strange problem (https://github.com/kubernetes/kubernetes/issues/14554): we observed increased latencies in small clusters (a few
Nodes). It turned out to be caused by an added latency between the PodRunning and PodReady phases. This was not a real regression, but our tests thought it was, which shows
how careful we need to be.
### Huge number of handshakes slows down API server
It was a long-standing issue for performance and is/was an important bottleneck for scalability (https://github.com/kubernetes/kubernetes/issues/13671). The bug directly
causing this problem was incorrect (from the Go standpoint) handling of TCP connections. A secondary issue was that elliptic curve encryption (the only one available in Go 1.4)
is unbelievably slow.
## Proposed metrics/statistics to gather/compute to avoid problems
### Cluster-level metrics
Basic ideas:
- number of Pods/ReplicationControllers/Services in the cluster
- number of running replicas of master components (if they are replicated)
- currently elected master of the etcd cluster (if running a distributed version)
- number of master component restarts
- number of lost Nodes
### Logging monitoring
Log spam is a serious problem and we need to keep it under control. The simplest way to check for regressions, suggested by @brendandburns, is to compute the rate at which log files
grow in e2e tests.
Basic ideas:
- log generation rate (B/s)
### REST call monitoring
We do measure REST call duration in the Density test, but we need API server monitoring as well, to avoid false failures caused e.g. by network traffic. We already have
some metrics in place (https://github.com/kubernetes/kubernetes/blob/master/pkg/apiserver/metrics/metrics.go), but we need to revisit the list and add some more.
Basic ideas:
- number of calls per verb, client, resource type
- latency distribution per verb, client, resource type
- number of calls that were rejected per client, resource type and reason (invalid version number, already at maximum number of requests in flight)
- number of relists in various watchers
### Rate limit monitoring
The reverse of REST call monitoring done in the API server. We need to know when a given component increases the pressure it puts on the API server. As a proxy for the number of
requests sent, we can track how saturated the rate limiters are. This has the additional advantage of giving us the data needed to fine-tune rate limiter constants.
Because we have rate limiting on both ends (client and API server), we should monitor the number of in-flight requests in the API server and how it relates to `max-requests-inflight`.
Basic ideas:
- percentage of used non-burst limit,
- amount of time in last hour with depleted burst tokens,
- number of inflight requests in API server.
### Network connection monitoring
During development we observed incorrect use/reuse of HTTP connections multiple times already. We should at least monitor the number of created connections.
### ETCD monitoring
@xiang-90 and @hongchaodeng - you probably have way more experience on what'd be good to look at from the ETCD perspective.
Basic ideas:
- ETCD memory footprint
- number of objects per kind
- read/write latencies per kind
- number of requests from the API server
- read/write counts per key (it may be too heavy though)
### Resource consumption
On top of all the things mentioned above we need to monitor changes in resource usage in both cluster components (API server, Kubelet, Scheduler, etc.) and system add-ons
(Heapster, L7 load balancer, etc.). Monitoring memory usage is tricky, because if no limits are set, the system won't apply memory pressure to processes, which makes their memory
footprint grow constantly. We argue that monitoring usage in tests still makes sense, as tests should be repeatable, and if memory usage grows drastically between two runs
it most likely can be attributed to some kind of regression (assuming that nothing else has changed in the environment).
Basic ideas:
- CPU usage
- memory usage
### Other saturation metrics
We should monitor other aspects of the system, which may indicate saturation of some component.
Basic ideas:
- queue length for queues in the system,
- wait time for WaitGroups.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/performance-related-monitoring.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,201 +1 @@
# Kubelet: Pod Lifecycle Event Generator (PLEG) This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/pod-lifecycle-event-generator.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/pod-lifecycle-event-generator.md)
In Kubernetes, Kubelet is a per-node daemon that manages the pods on the node,
driving the pod states to match their pod specifications (specs). To achieve
this, Kubelet needs to react to changes in both (1) pod specs and (2) the
container states. For the former, Kubelet watches the pod specs changes from
multiple sources; for the latter, Kubelet polls the container runtime
periodically (e.g., 10s) for the latest states for all containers.
Polling incurs non-negligible overhead as the number of pods/containers increases,
and is exacerbated by Kubelet's parallelism -- one worker (goroutine) per pod, which
queries the container runtime individually. Periodic, concurrent, large numbers
of requests cause high CPU usage spikes (even when there is no spec/state
change), poor performance, and reliability problems due to an overwhelmed container
runtime. Ultimately, it limits Kubelet's scalability.
(Related issues reported by users: [#10451](https://issues.k8s.io/10451),
[#12099](https://issues.k8s.io/12099), [#12082](https://issues.k8s.io/12082))
## Goals and Requirements
The goal of this proposal is to improve Kubelet's scalability and performance
by lowering the pod management overhead.
- Reduce unnecessary work during inactivity (no spec/state changes)
- Lower the concurrent requests to the container runtime.
The design should be generic so that it can support different container runtimes
(e.g., Docker and rkt).
## Overview
This proposal aims to replace the periodic polling with a pod lifecycle event
watcher.
![pleg](pleg.png)
## Pod Lifecycle Event
A pod lifecycle event interprets the underlying container state change at the
pod-level abstraction, making it container-runtime-agnostic. The abstraction
shields Kubelet from the runtime specifics.
```go
type PodLifeCycleEventType string
const (
ContainerStarted PodLifeCycleEventType = "ContainerStarted"
ContainerStopped PodLifeCycleEventType = "ContainerStopped"
NetworkSetupCompleted PodLifeCycleEventType = "NetworkSetupCompleted"
NetworkFailed PodLifeCycleEventType = "NetworkFailed"
)
// PodLifecycleEvent is an event that reflects the change of the pod state.
type PodLifecycleEvent struct {
// The pod ID.
ID types.UID
// The type of the event.
Type PodLifeCycleEventType
// The accompanied data which varies based on the event type.
Data interface{}
}
```
Using Docker as an example, the start of a pod infra container would be
translated into a `NetworkSetupCompleted` pod lifecycle event.
## Detect Changes in Container States Via Relisting
In order to generate pod lifecycle events, PLEG needs to detect changes in
container states. We can achieve this by periodically relisting all containers
(e.g., `docker ps`). Although this is similar to Kubelet's polling today, it will
only be performed by a single thread (PLEG). This means that we still
benefit from not having all pod workers hitting the container runtime
concurrently. Moreover, only the relevant pod worker would be woken up
to perform a sync.
The upside of relying on relisting is that it is container runtime-agnostic,
and requires no external dependency.
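To make this concrete, a minimal relisting sketch follows; the `GenericPLEG` fields, the `lister` abstraction, and the state-to-event mapping are simplifying assumptions for illustration, not the actual Kubelet code:

```go
package pleg

import "time"

type ContainerState string

type PodLifecycleEventType string

const (
	ContainerStarted PodLifecycleEventType = "ContainerStarted"
	ContainerStopped PodLifecycleEventType = "ContainerStopped"
)

// PodLifecycleEvent is a simplified version of the event type defined above.
type PodLifecycleEvent struct {
	ID   string // pod UID
	Type PodLifecycleEventType
	Data interface{}
}

// container is a simplified view of what a runtime listing would return.
type container struct {
	ID    string
	PodID string
	State ContainerState // e.g. "running", "exited"
}

// lister abstracts the container runtime listing (docker ps, rkt list, ...).
type lister func() ([]container, error)

// GenericPLEG relists periodically from a single goroutine and emits a
// pod-level event for every container whose state changed since last relist.
type GenericPLEG struct {
	list      lister
	lastState map[string]ContainerState // container ID -> last observed state
	Events    chan *PodLifecycleEvent
}

func (g *GenericPLEG) Run(period time.Duration) {
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for range ticker.C {
		containers, err := g.list()
		if err != nil {
			continue // assumed: simply retried on the next relist
		}
		for _, c := range containers {
			if old, seen := g.lastState[c.ID]; !seen || old != c.State {
				g.lastState[c.ID] = c.State
				eventType := ContainerStopped
				if c.State == "running" {
					eventType = ContainerStarted
				}
				g.Events <- &PodLifecycleEvent{ID: c.PodID, Type: eventType, Data: c.ID}
			}
		}
	}
}
```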
### Relist period
The shorter the relist period is, the sooner Kubelet can detect the
change. A shorter relist period also implies higher CPU usage. Moreover, the
relist latency depends on the underlying container runtime, and usually
increases as the number of containers/pods grows. We should set a default
relist period based on measurements. Regardless of what period we set, it will
likely be significantly shorter than the current pod sync period (10s), i.e.,
Kubelet will detect container changes sooner.
## Impact on the Pod Worker Control Flow
Kubelet is responsible for dispatching an event to the appropriate pod
worker based on the pod ID. Only one pod worker would be woken up for
each event.
Today, the pod syncing routine in Kubelet is idempotent as it always
examines the pod state and the spec, and tries to drive the state to
match the spec by performing a series of operations. It should be
noted that this proposal does not intend to change this property --
the sync pod routine would still perform all necessary checks,
regardless of the event type. This trades some efficiency for
reliability and eliminates the need to build a state machine that is
compatible with different runtimes.
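A minimal sketch of this dispatch step follows; the `dispatcher` type, the simplified `event`, and the channel wiring are assumptions, and a real implementation also has to decide what to do with events for unknown pods and busy workers:

```go
package kubelet

// event is a simplified stand-in for the PodLifecycleEvent described above.
type event struct {
	PodID string
}

// dispatcher routes each incoming event to the channel of the worker that
// owns the pod, so only that worker wakes up to perform a sync.
type dispatcher struct {
	workers map[string]chan event // pod UID -> per-pod worker channel
}

func (d *dispatcher) dispatch(events <-chan event) {
	for ev := range events {
		ch, ok := d.workers[ev.PodID]
		if !ok {
			continue // assumed: events for unknown pods are handled elsewhere
		}
		select {
		case ch <- ev:
		default:
			// The worker is already busy; because the sync routine is
			// idempotent and re-examines the full pod state, dropping the
			// extra wake-up here is assumed to be safe.
		}
	}
}
```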
## Leverage Upstream Container Events
Instead of relying on relisting, PLEG can leverage other components which
provide container events, and translate these events into pod lifecycle
events. This will further improve Kubelet's responsiveness and reduce the
resource usage caused by frequent relisting.
The upstream container events can come from:
(1). *Event stream provided by each container runtime*
Docker's API exposes an [event
stream](https://docs.docker.com/reference/api/docker_remote_api_v1.17/#monitor-docker-s-events).
Nonetheless, rkt does not support this yet, but they will eventually support it
(see [coreos/rkt#1193](https://github.com/coreos/rkt/issues/1193)).
(2). *cgroups event stream by cAdvisor*
cAdvisor is integrated in Kubelet to provide container stats. It watches cgroups
containers using inotify and exposes an event stream. Even though it does not
support rkt yet, it should be straightforward to add such a support.
Option (1) may provide richer sets of events, but option (2) has the advantage
to be more universal across runtimes, as long as the container runtime uses
cgroups. Regardless of what one chooses to implement now, the container event
stream should be easily swappable with a clearly defined interface.
Note that we cannot solely rely on the upstream container events due to the
possibility of missing events. PLEG should relist infrequently to ensure no
events are missed.
## Generate Expected Events
*This is optional for PLEGs which perform only relisting, but required for
PLEGs that watch upstream events.*
A pod worker's actions could lead to pod lifecycle events (e.g.,
create/kill a container), which the worker would not observe until
later. The pod worker should ignore such events to avoid unnecessary
work.
For example, assume a pod has two containers, A and B. The worker
- Creates container A
- Receives an event `(ContainerStopped, B)`
- Receives an event `(ContainerStarted, A)`
The worker should ignore the `(ContainerStarted, A)` event since it is
expected. Arguably, the worker could process `(ContainerStopped, B)`
as soon as it receives the event, before observing the creation of
A. However, it is desirable to wait until the expected event
`(ContainerStarted, A)` is observed to keep a consistent per-pod view
at the worker. Therefore, the control flow of a single pod worker
should adhere to the following rules:
1. Pod worker should process the events sequentially.
2. Pod worker should not start syncing until it observes the outcome of its own
actions in the last sync to maintain a consistent view.
In other words, a pod worker should record the expected events, and
only wake up to perform the next sync until all expectations are met.
- Creates container A, records an expected event `(ContainerStarted, A)`
- Receives `(ContainerStopped, B)`; stores the event and goes back to sleep.
- Receives `(ContainerStarted, A)`; clears the expectation. Proceeds to handle
`(ContainerStopped, B)`.
We should set an expiration time for each expected event to prevent the worker
from being stalled indefinitely by missing events.
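A minimal sketch of this expectation bookkeeping follows; the types, event keys, and expiry handling are assumptions for illustration, not the actual Kubelet implementation:

```go
package worker

import "time"

// expectation records an event the worker caused itself and has not yet observed.
type expectation struct {
	key     string    // e.g. "ContainerStarted/A"
	expires time.Time // give up waiting after this time to avoid stalling forever
}

// podWorker buffers incoming events and only syncs once all of its own
// expected events have been observed (or have expired).
type podWorker struct {
	expected []expectation
	pending  []string // events received while waiting, handled at the next sync
}

// observe records an incoming event; it returns true when the worker should
// wake up and perform a sync.
func (w *podWorker) observe(eventKey string, now time.Time) bool {
	matched := false
	remaining := w.expected[:0]
	for _, e := range w.expected {
		switch {
		case e.key == eventKey && !matched:
			matched = true // expectation met; do not queue the event as new work
		case now.After(e.expires):
			// Expired expectation: stop waiting for it.
		default:
			remaining = append(remaining, e)
		}
	}
	w.expected = remaining
	if !matched {
		w.pending = append(w.pending, eventKey)
	}
	// Sync only when no expectations are outstanding and there is work queued.
	return len(w.expected) == 0 && len(w.pending) > 0
}
```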
## TODOs for v1.2
For v1.2, we will add a generic PLEG which relists periodically, and leave
adopting container events for future work. We will also *not* implement the
optimization that generates and filters out expected events to minimize
redundant syncs.
- Add a generic PLEG using relisting. Modify the container runtime interface
to provide all necessary information to detect container state changes
in `GetPods()` (#13571).
- Benchmark docker to adjust relisting frequency.
- Fix/adapt features that rely on frequent, periodic pod syncing.
* Liveness/Readiness probing: Create a separate probing manager using
an explicit container probing period [#10878](https://issues.k8s.io/10878).
* Instruct pod workers to set up a wake-up call if syncing failed, so that
it can retry.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/pod-lifecycle-event-generator.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->


@ -1,416 +1 @@
# Pod level resource management in Kubelet This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/pod-resource-management.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/pod-resource-management.md)
**Author**: Buddha Prakash (@dubstack), Vishnu Kannan (@vishh)
**Last Updated**: 06/23/2016
**Status**: Draft Proposal (WIP)
This document proposes a design for introducing pod level resource accounting to Kubernetes, and outlines the implementation and rollout plan.
<!-- BEGIN MUNGE: GENERATED_TOC -->
- [Pod level resource management in Kubelet](#pod-level-resource-management-in-kubelet)
- [Introduction](#introduction)
- [Non Goals](#non-goals)
- [Motivations](#motivations)
- [Design](#design)
- [Proposed cgroup hierarchy:](#proposed-cgroup-hierarchy)
- [QoS classes](#qos-classes)
- [Guaranteed](#guaranteed)
- [Burstable](#burstable)
- [Best Effort](#best-effort)
- [With Systemd](#with-systemd)
- [Hierarchy Outline](#hierarchy-outline)
- [QoS Policy Design Decisions](#qos-policy-design-decisions)
- [Implementation Plan](#implementation-plan)
- [Top level Cgroups for QoS tiers](#top-level-cgroups-for-qos-tiers)
- [Pod level Cgroup creation and deletion (Docker runtime)](#pod-level-cgroup-creation-and-deletion-docker-runtime)
- [Container level cgroups](#container-level-cgroups)
- [Rkt runtime](#rkt-runtime)
- [Add Pod level metrics to Kubelet's metrics provider](#add-pod-level-metrics-to-kubelets-metrics-provider)
- [Rollout Plan](#rollout-plan)
- [Implementation Status](#implementation-status)
<!-- END MUNGE: GENERATED_TOC -->
## Introduction
As of now, [Quality of Service (QoS)](../../docs/design/resource-qos.md) is not enforced at the pod level. Except for pod evictions, QoS features are not applicable at the pod level.
To better support QoS, there is a need to add support for pod level resource accounting in Kubernetes.
We propose to have a unified cgroup hierarchy with pod level cgroups for better resource management. We will have a cgroup hierarchy with top level cgroups for the three QoS classes Guaranteed, Burstable and BestEffort. Pods (and their containers) belonging to a QoS class will be grouped under these top level QoS cgroups. And all containers in a pod are nested under the pod cgroup.
The proposed cgroup hierarchy would allow for more efficient resource management and lead to improvements in node reliability.
This would also allow for significant latency optimizations in terms of pod eviction on nodes with the use of pod level resource usage metrics.
This document provides a basic outline of how we plan to implement and rollout this feature.
## Non Goals
- Pod level disk accounting will not be tackled in this proposal.
- Pod level resource specification in the Kubernetes API will not be tackled in this proposal.
## Motivations
Kubernetes currently supports container level isolation only and lets users specify resource requests/limits on the containers [Compute Resources](../../docs/design/resources.md). The `kubelet` creates a cgroup sandbox (via its container runtime) for each container.
There are a few shortcomings to the current model.
- Existing QoS support does not apply to pods as a whole. On-going work to support pod level eviction using QoS requires all containers in a pod to belong to the same class. By having pod level cgroups, it is easy to track pod level usage and make eviction decisions.
- Infrastructure overhead per pod is currently charged to the node. The overhead of setting up and managing the pod sandbox is currently accounted to the node. If the pod sandbox is a bit expensive, like in the case of hyper, having pod level accounting becomes critical.
- For the docker runtime we have a containerd-shim which is a small library that sits in front of a runtime implementation allowing it to be reparented to init, handle reattach from the caller etc. With pod level cgroups containerd-shim can be charged to the pod instead of the machine.
- If a container exits, all its anonymous pages (tmpfs) gets accounted to the machine (root). With pod level cgroups, that usage can also be attributed to the pod.
- Let containers share resources - with pod level limits, a pod with a Burstable container and a BestEffort container is classified as Burstable pod. The BestEffort container is able to consume slack resources not used by the Burstable container, and still be capped by the overall pod level limits.
## Design
High level requirements for the design are as follows:
- Do not break existing users. Ideally, there should be no changes to the Kubernetes API semantics.
- Support multiple cgroup managers - systemd, cgroupfs, etc.
How we intend to achieve these high level goals is covered in greater detail in the Implementation Plan.
We use the following denotations in the sections below:
For the three QoS classes
`G⇒ Guaranteed QoS, Bu⇒ Burstable QoS, BE⇒ BestEffort QoS`
For the value specified for the `--qos-memory-overcommitment` flag
`qom ⇒ qos-memory-overcommitment`
Currently the Kubelet highly prioritizes resource utilization and thus allows `BE` pods to use as much of a resource as they want, and in case of OOM the `BE` pods are the first to be killed. We follow this policy because `G` pods often don't use the full amount of resources they request, and by overcommitting the node the `BE` pods are able to utilize these leftover resources. In case of OOM the `BE` pods are evicted by the eviction manager, but there is some latency involved in the pod eviction process, which can be a cause of concern for latency-sensitive servers. On such servers we would want to avoid OOM conditions on the node. Pod level cgroups allow us to restrict the amount of resources available to the `BE` pods, so reserving the requested resources for the `G` and `Bu` pods lets us avoid invoking the OOM killer.
We add a flag `qos-memory-overcommitment` to the kubelet which allows users to configure the percentage of memory overcommitment on the node. The default is 100, so by default we allow complete overcommitment on the node, let the `BE` pods use as much memory as they want, and do not reserve any resources for the `G` and `Bu` pods. As expected, if there is an OOM in such a case we first kill the `BE` pods before the `G` and `Bu` pods.
On the other hand, if users want to ensure very predictable tail latency for latency-sensitive servers, they would need to set qos-memory-overcommitment to a very low value (preferably 0). In this case memory resources would be reserved for the `G` and `Bu` pods, and `BE` pods would only be able to use the leftover memory.
Examples in the next section.
### Proposed cgroup hierarchy:
For the initial implementation we will only support limits for cpu and memory resources.
#### QoS classes
A pod can belong to one of the following 3 QoS classes: Guaranteed, Burstable, and BestEffort, in decreasing order of priority.
#### Guaranteed
`G` pods are placed at the `$Root` cgroup by default. `$Root` is the system root, i.e. "/" by default, and if the `--cgroup-root` flag is used then the specified cgroup-root is used as `$Root`. To ensure Kubelet's idempotent behaviour we follow a pod cgroup naming format which is opaque and deterministic. For example, for a pod with UID `5f9b19c9-3a30-11e6-8eea-28d2444e470d`, the pod cgroup would be named `pod-5f9b19c93a3011e6-8eea28d2444e470d`.
__Note__: The cgroup-root flag would allow the user to configure the root of the QoS cgroup hierarchy. Hence cgroup-root would be redefined as the root of the QoS cgroup hierarchy rather than of containers.
```
/PodUID/cpu.quota = cpu limit of Pod
/PodUID/cpu.shares = cpu request of Pod
/PodUID/memory.limit_in_bytes = memory limit of Pod
```
Example:
We have two pods Pod1 and Pod2 having Pod Spec given below
```yaml
kind: Pod
metadata:
  name: Pod1
spec:
  containers:
  - name: foo
    resources:
      limits:
        cpu: 10m
        memory: 1Gi
  - name: bar
    resources:
      limits:
        cpu: 100m
        memory: 2Gi
```
```yaml
kind: Pod
metadata:
  name: Pod2
spec:
  containers:
  - name: foo
    resources:
      limits:
        cpu: 20m
        memory: 2Gi
```
Pod1 and Pod2 are both classified as `G` and are nested under the `Root` cgroup.
```
/Pod1/cpu.quota = 110m
/Pod1/cpu.shares = 110m
/Pod2/cpu.quota = 20m
/Pod2/cpu.shares = 20m
/Pod1/memory.limit_in_bytes = 3Gi
/Pod2/memory.limit_in_bytes = 2Gi
```
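A minimal sketch of how these pod-level values could be derived from the container limits (hypothetical helper; millicores and bytes chosen as units for illustration):

```go
package qos

// containerLimits holds the CPU (in millicores) and memory (in bytes)
// limits of a single container in a Guaranteed pod.
type containerLimits struct {
	cpuMilli    int64
	memoryBytes int64
}

// guaranteedPodCgroupValues sums the container limits of a Guaranteed pod.
// For G pods requests equal limits, so cpu.shares follows the same sum as
// cpu.quota, and memory.limit_in_bytes is the sum of the memory limits.
func guaranteedPodCgroupValues(containers []containerLimits) (cpuQuotaMilli, cpuSharesMilli, memoryLimitBytes int64) {
	for _, c := range containers {
		cpuQuotaMilli += c.cpuMilli
		cpuSharesMilli += c.cpuMilli
		memoryLimitBytes += c.memoryBytes
	}
	return
}
```

For Pod1 above this yields cpu.quota = cpu.shares = 110m and memory.limit_in_bytes = 3Gi, matching the values shown above.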
#### Burstable
We have the following resource parameters for the `Bu` cgroup.
```
/Bu/cpu.shares = summation of cpu requests of all Bu pods
/Bu/PodUID/cpu.quota = Pod Cpu Limit
/Bu/PodUID/cpu.shares = Pod Cpu Request
/Bu/memory.limit_in_bytes = Allocatable - {(summation of memory requests/limits of `G` pods)*(1-qom/100)}
/Bu/PodUID/memory.limit_in_bytes = Pod memory limit
```
__Note__: For the `Bu` QoS class, when limits are not specified for any one of the containers, the pod limit defaults to the node resource allocatable quantity.
Example:
We have two pods Pod3 and Pod4 having Pod Spec given below:
```yaml
kind: Pod
metadata:
  name: Pod3
spec:
  containers:
  - name: foo
    resources:
      limits:
        cpu: 50m
        memory: 2Gi
      requests:
        cpu: 20m
        memory: 1Gi
  - name: bar
    resources:
      limits:
        cpu: 100m
        memory: 1Gi
```
```yaml
kind: Pod
metadata:
  name: Pod4
spec:
  containers:
  - name: foo
    resources:
      limits:
        cpu: 20m
        memory: 2Gi
      requests:
        cpu: 10m
        memory: 1Gi
```
Pod3 and Pod4 are both classified as `Bu` and are hence nested under the Bu cgroup
And for `qom` = 0
```
/Bu/cpu.shares = 30m
/Bu/Pod3/cpu.quota = 150m
/Bu/Pod3/cpu.shares = 20m
/Bu/Pod4/cpu.quota = 20m
/Bu/Pod4/cpu.shares = 10m
/Bu/memory.limit_in_bytes = Allocatable - 5Gi
/Bu/Pod3/memory.limit_in_bytes = 3Gi
/Bu/Pod4/memory.limit_in_bytes = 2Gi
```
#### Best Effort
For pods belonging to the `BE` QoS we don't set any quota.
```
/BE/cpu.shares = 2
/BE/cpu.quota= not set
/BE/memory.limit_in_bytes = Allocatable - {(summation of memory requests of all `G` and `Bu` pods)*(1-qom/100)}
/BE/PodUID/memory.limit_in_bytes = no limit
```
Example:
We have a pod 'Pod5' having Pod Spec given below:
```yaml
kind: Pod
metadata:
  name: Pod5
spec:
  containers:
  - name: foo
    resources:
  - name: bar
    resources:
```
Pod5 is classified as `BE` and is hence nested under the BE cgroup
And for `qom` = 0
```
/BE/cpu.shares = 2
/BE/cpu.quota= not set
/BE/memory.limit_in_bytes = Allocatable - 7Gi
/BE/Pod5/memory.limit_in_bytes = no limit
```
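A minimal sketch of the QoS-level memory limit formulas used above (hypothetical helper; all quantities in bytes, `qom` as a percentage):

```go
package qos

// qosMemoryLimits computes memory.limit_in_bytes for the Bu and BE cgroups
// given the node's allocatable memory, the summed memory requests/limits of
// G pods, the summed memory requests of Bu pods, and the
// --qos-memory-overcommitment percentage (qom):
//
//	Bu limit = Allocatable - guaranteedRequests*(1 - qom/100)
//	BE limit = Allocatable - (guaranteedRequests+burstableRequests)*(1 - qom/100)
func qosMemoryLimits(allocatable, guaranteedRequests, burstableRequests, qom int64) (buLimit, beLimit int64) {
	factor := float64(100-qom) / 100.0
	buLimit = allocatable - int64(float64(guaranteedRequests)*factor)
	beLimit = allocatable - int64(float64(guaranteedRequests+burstableRequests)*factor)
	return
}
```

With `qom` = 0 and the example pods above this gives `Bu` = Allocatable - 5Gi and `BE` = Allocatable - 7Gi; with `qom` = 100 both limits equal Allocatable, i.e. full overcommitment.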
### With Systemd
In systemd we have slices for the three top level QoS classes. Further, each pod is a subslice of exactly one of the three QoS slices. Each container in a pod belongs to a scope nested under the qosclass-pod slice.
Example: We plan to have the following cgroup hierarchy on systemd systems
```
/memory/G-PodUID.slice/containerUID.scope
/cpu,cpuacct/G-PodUID.slice/containerUID.scope
/memory/Bu.slice/Bu-PodUID.slice/containerUID.scope
/cpu,cpuacct/Bu.slice/Bu-PodUID.slice/containerUID.scope
/memory/BE.slice/BE-PodUID.slice/containerUID.scope
/cpu,cpuacct/BE.slice/BE-PodUID.slice/containerUID.scope
```
### Hierarchy Outline
- "$Root" is the system root of the node i.e. "/" by default and if `--cgroup-root` is specified then the specified cgroup-root is used as "$Root".
- We have a top level QoS cgroup for the `Bu` and `BE` QoS classes.
- But we __don't__ have a separate cgroup for the `G` QoS class. `G` pod cgroups are brought up directly under the `Root` cgroup.
- Each pod has its own cgroup which is nested under the cgroup matching the pod's QoS class.
- All containers brought up by the pod are nested under the pod's cgroup.
- system-reserved cgroup contains the system specific processes.
- kube-reserved cgroup contains the kubelet specific daemons.
```
$ROOT
|
+- Pod1
| |
| +- Container1
| +- Container2
| ...
+- Pod2
| +- Container3
| ...
+- ...
|
+- Bu
| |
| +- Pod3
| | |
| | +- Container4
| | ...
| +- Pod4
| | +- Container5
| | ...
| +- ...
|
+- BE
| |
| +- Pod5
| | |
| | +- Container6
| | +- Container7
| | ...
| +- ...
|
+- System-reserved
| |
| +- system
| +- docker (optional)
| +- ...
|
+- Kube-reserved
| |
| +- kubelet
| +- docker (optional)
| +- ...
|
```
#### QoS Policy Design Decisions
- This hierarchy highly prioritizes resource guarantees to the `G` over `Bu` and `BE` pods.
- By not having a separate cgroup for the `G` class, the hierarchy allows the `G` pods to burst and utilize all of Node's Allocatable capacity.
- The `BE` and `Bu` pods are strictly restricted from bursting and hogging resources and thus `G` Pods are guaranteed resource isolation.
- `BE` pods are treated as lowest priority. So for the `BE` QoS cgroup we set cpu shares to the lowest possible value, i.e. 2. This ensures that the `BE` containers get a relatively small share of cpu time.
- Also we don't set any quota on the cpu resources as the containers on the `BE` pods can use any amount of free resources on the node.
- Having the memory limit of the `BE` cgroup as (Allocatable - summation of memory requests of `G` and `Bu` pods) would result in `BE` pods becoming more susceptible to being OOM killed. As more `G` and `Bu` pods are scheduled, the kubelet will more likely kill `BE` pods, even if the `G` and `Bu` pods are using less than their request, since we will be dynamically reducing the size of the `BE` memory.limit_in_bytes. But this allows for better memory guarantees to the `G` and `Bu` pods.
## Implementation Plan
The implementation plan is outlined in the next sections.
We will have an `experimental-cgroups-per-qos` flag to specify whether the user wants to use the QoS based cgroup hierarchy. The flag will be set to false by default, at least in v1.5.
#### Top level Cgroups for QoS tiers
Two top level cgroups for the `Bu` and `BE` QoS classes are created when Kubelet starts to run on a node. All `G` pod cgroups are by default nested under the `Root`, so we don't create a top level cgroup for the `G` class. For raw cgroup systems we would use libcontainer's cgroup manager for general cgroup management (cgroup creation/destruction). For systemd we don't have equivalent support for slice management in libcontainer yet, so we will be adding it in the Kubelet. These cgroups are created only once, on Kubelet initialization, as part of node setup. Also, on systemd these cgroups are transient units and will not survive a reboot.
#### Pod level Cgroup creation and deletion (Docker runtime)
- When a new pod is brought up, its QoS class is first determined.
- We add an interface to Kubelet's ContainerManager to create and delete pod level cgroups under the cgroup that matches the pod's QoS class.
- This interface will be pluggable. Kubelet will support both systemd and raw cgroups based __cgroup__ drivers. We will be using the --cgroup-driver flag proposed in the [Systemd Node Spec](kubelet-systemd.md) to specify the cgroup driver.
- We inject creation and deletion of pod level cgroups into the pod workers.
- As new pods are added, the QoS class cgroup parameters are updated to match the pods' resource requests.
#### Container level cgroups
Have the docker manager create container cgroups under pod level cgroups. With the docker runtime, we will pass `--cgroup-parent` using the syntax expected for the corresponding cgroup driver the runtime was configured to use.
#### Rkt runtime
We want to have rkt create pods under a root QoS class that kubelet specifies, and set pod level cgroup parameters mentioned in this proposal by itself.
#### Add Pod level metrics to Kubelet's metrics provider
Update Kubelet's metrics provider to include pod level metrics. Use cAdvisor's cgroup subsystem information to determine various pod level usage metrics.
`Note: Changes to cAdvisor might be necessary.`
## Rollout Plan
This feature will be opt-in in v1.4 and opt-out in v1.5. We recommend that users drain their nodes and opt in before switching to v1.5, so that rolling out the v1.5 kubelet becomes a no-op.
## Implementation Status
The implementation goals of the first milestone are outlined below.
- [x] Finalize and submit Pod Resource Management proposal for the project #26751
- [x] Refactor qos package to be used globally throughout the codebase #27749 #28093
- [x] Add interfaces for CgroupManager and CgroupManagerImpl which implements the CgroupManager interface and creates, destroys/updates cgroups using the libcontainer cgroupfs driver. #27755 #28566
- [x] Inject top level QoS Cgroup creation in the Kubelet and add e2e tests to test that behaviour. #27853
- [x] Add PodContainerManagerImpl Create and Destroy methods which implements the respective PodContainerManager methods using a cgroupfs driver. #28017
- [x] Have docker manager create container cgroups under pod level cgroups. Inject creation and deletion of pod cgroups into the pod workers. Add e2e tests to test this behaviour. #29049
- [x] Add support for updating policy for the pod cgroups. Add e2e tests to test this behaviour. #29087
- [ ] Enabling 'cgroup-per-qos' flag in Kubelet: The user is expected to drain the node and restart it before enabling this feature, but as a fallback we also want to allow the user to just restart the kubelet with the cgroup-per-qos flag enabled to use this feature. As a part of this we need to figure out a policy for pods having Restart Policy: Never. More details in this [issue](https://github.com/kubernetes/kubernetes/issues/29946).
- [ ] Removing terminated pod's Cgroup : We need to cleanup the pod's cgroup once the pod is terminated. More details in this [issue](https://github.com/kubernetes/kubernetes/issues/29927).
- [ ] Kubelet needs to ensure that the cgroup settings are what the kubelet expects them to be. If security is not of concern, one can assume that once kubelet applies cgroups setting successfully, the values will never change unless kubelet changes it. If security is of concern, then kubelet will have to ensure that the cgroup values meet its requirements and then continue to watch for updates to cgroups via inotify and re-apply cgroup values if necessary.
Updating QoS limits needs to happen before pod cgroups values are updated. When pod cgroups are being deleted, QoS limits have to be updated after pod cgroup values have been updated for deletion or pod cgroups have been removed. Given that kubelet doesn't have any checkpoints and updates to QoS and pod cgroups are not atomic, kubelet needs to reconcile cgroups status whenever it restarts to ensure that the cgroups values match kubelet's expectation.
- [ ] [TEST] Opting in to this feature and rolling back should be accompanied by detailed error messages when pods are killed intermittently.
- [ ] Add a systemd implementation for Cgroup Manager interface
Other smaller work items that would be good to have before the release of this feature.
- [ ] Add Pod UID to the downward api which will help simplify the e2e testing logic.
- [ ] Check if the parent cgroups exist and error out if they don't.
- [ ] Set top level cgroup limit to resource allocatable until we support QoS level cgroup updates. If cgroup root is not `/` then set node resource allocatable as the cgroup resource limits on cgroup root.
- [ ] Add a NodeResourceAllocatableProvider which returns the amount of allocatable resources on the nodes. This interface would be used both by the Kubelet and ContainerManager.
- [ ] Add top level feasibility check to ensure that pod can be admitted on the node by estimating left over resources on the node.
- [ ] Log basic cgroup management metrics, i.e. creation/deletion.
To better support our requirements we needed to make some changes/add features to Libcontainer as well:
- [x] Allowing or denying all devices by writing 'a' to devices.allow or devices.deny is
not possible once the device cgroup has children. Libcontainer doesn't have the option of skipping updates on the parent devices cgroup. opencontainers/runc/pull/958
- [x] To use libcontainer for creating and managing cgroups in the Kubelet, I would like to just create a cgroup with no pid attached and if need be apply a pid to the cgroup later on. But libcontainer did not support cgroup creation without attaching a pid. opencontainers/runc/pull/956
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/pod-resource-management.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->


@ -1,374 +1 @@
## Abstract This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/pod-security-context.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/pod-security-context.md)
A proposal for refactoring `SecurityContext` to have pod-level and container-level attributes in
order to correctly model pod- and container-level security concerns.
## Motivation
Currently, containers have a `SecurityContext` attribute which contains information about the
security settings the container uses. In practice, many of these attributes are uniform across all
containers in a pod. Simultaneously, there is also a need to apply the security context pattern
at the pod level to correctly model security attributes that apply only at a pod level.
Users should be able to:
1. Express security settings that are applicable to the entire pod
2. Express base security settings that apply to all containers
3. Override only the settings that need to be differentiated from the base in individual
containers
This proposal is a dependency for other changes related to security context:
1. [Volume ownership management in the Kubelet](https://github.com/kubernetes/kubernetes/pull/12944)
2. [Generic SELinux label management in the Kubelet](https://github.com/kubernetes/kubernetes/pull/14192)
Goals of this design:
1. Describe the use cases for which a pod-level security context is necessary
2. Thoroughly describe the API backward compatibility issues that arise from the introduction of
a pod-level security context
3. Describe all implementation changes necessary for the feature
## Constraints and assumptions
1. We will not design for intra-pod security; we are not currently concerned about isolating
containers in the same pod from one another
1. We will design for backward compatibility with the current V1 API
## Use Cases
1. As a developer, I want to correctly model security attributes which belong to an entire pod
2. As a user, I want to be able to specify container attributes that apply to all containers
without repeating myself
3. As an existing user, I want to be able to use the existing container-level security API
### Use Case: Pod level security attributes
Some security attributes make sense only to model at the pod level. For example, it is a
fundamental property of pods that all containers in a pod share the same network namespace.
Therefore, using the host namespace makes sense to model at the pod level only, and indeed, today
it is part of the `PodSpec`. Other host namespace support is currently being added and these will
also be pod-level settings; it makes sense to model them as a pod-level collection of security
attributes.
### Use Case: Override pod security context for container
Some use cases require the containers in a pod to run with different security settings. As an
example, a user may want to have a pod with two containers, one of which runs as root with the
privileged setting, and one that runs as a non-root UID. To support use cases like this, it should
be possible to override appropriate (i.e., not intrinsically pod-level) security settings for
individual containers.
## Proposed Design
### SecurityContext
For posterity and ease of reading, note the current state of `SecurityContext`:
```go
package api
type Container struct {
// Other fields omitted
// Optional: SecurityContext defines the security options the pod should be run with
SecurityContext *SecurityContext `json:"securityContext,omitempty"`
}
type SecurityContext struct {
// Capabilities are the capabilities to add/drop when running the container
Capabilities *Capabilities `json:"capabilities,omitempty"`
// Run the container in privileged mode
Privileged *bool `json:"privileged,omitempty"`
// SELinuxOptions are the labels to be applied to the container
// and volumes
SELinuxOptions *SELinuxOptions `json:"seLinuxOptions,omitempty"`
// RunAsUser is the UID to run the entrypoint of the container process.
RunAsUser *int64 `json:"runAsUser,omitempty"`
// RunAsNonRoot indicates that the container should be run as a non-root user. If the RunAsUser
// field is not explicitly set then the kubelet may check the image for a specified user or
// perform defaulting to specify a user.
RunAsNonRoot bool `json:"runAsNonRoot,omitempty"`
}
// SELinuxOptions contains the fields that make up the SELinux context of a container.
type SELinuxOptions struct {
// SELinux user label
User string `json:"user,omitempty"`
// SELinux role label
Role string `json:"role,omitempty"`
// SELinux type label
Type string `json:"type,omitempty"`
// SELinux level label.
Level string `json:"level,omitempty"`
}
```
### PodSecurityContext
`PodSecurityContext` specifies two types of security attributes:
1. Attributes that apply to the pod itself
2. Attributes that apply to the containers of the pod
In the internal API, fields of the `PodSpec` controlling the use of the host PID, IPC, and network
namespaces are relocated to this type:
```go
package api
type PodSpec struct {
// Other fields omitted
// Optional: SecurityContext specifies pod-level attributes and container security attributes
// that apply to all containers.
SecurityContext *PodSecurityContext `json:"securityContext,omitempty"`
}
// PodSecurityContext specifies security attributes of the pod and container attributes that apply
// to all containers of the pod.
type PodSecurityContext struct {
// Use the host's network namespace. If this option is set, the ports that will be
// used must be specified.
// Optional: Default to false.
HostNetwork bool
// Use the host's IPC namespace
HostIPC bool
// Use the host's PID namespace
HostPID bool
// Capabilities are the capabilities to add/drop when running containers
Capabilities *Capabilities `json:"capabilities,omitempty"`
// Run the container in privileged mode
Privileged *bool `json:"privileged,omitempty"`
// SELinuxOptions are the labels to be applied to the container
// and volumes
SELinuxOptions *SELinuxOptions `json:"seLinuxOptions,omitempty"`
// RunAsUser is the UID to run the entrypoint of the container process.
RunAsUser *int64 `json:"runAsUser,omitempty"`
// RunAsNonRoot indicates that the container should be run as a non-root user. If the RunAsUser
// field is not explicitly set then the kubelet may check the image for a specified user or
// perform defaulting to specify a user.
RunAsNonRoot bool
}
// Comments and generated docs will change for the container.SecurityContext field to indicate
// the precedence of these fields over the pod-level ones.
type Container struct {
// Other fields omitted
// Optional: SecurityContext defines the security options the pod should be run with.
// Settings specified in this field take precedence over the settings defined in
// pod.Spec.SecurityContext.
SecurityContext *SecurityContext `json:"securityContext,omitempty"`
}
```
In the V1 API, the pod-level security attributes which are currently fields of the `PodSpec` are
retained on the `PodSpec` for backward compatibility purposes:
```go
package v1
type PodSpec struct {
// Other fields omitted
// Use the host's network namespace. If this option is set, the ports that will be
// used must be specified.
// Optional: Default to false.
HostNetwork bool `json:"hostNetwork,omitempty"`
// Use the host's pid namespace.
// Optional: Default to false.
HostPID bool `json:"hostPID,omitempty"`
// Use the host's ipc namespace.
// Optional: Default to false.
HostIPC bool `json:"hostIPC,omitempty"`
// Optional: SecurityContext specifies pod-level attributes and container security attributes
// that apply to all containers.
SecurityContext *PodSecurityContext `json:"securityContext,omitempty"`
}
```
The `pod.Spec.SecurityContext` specifies the security context of all containers in the pod.
The containers' `securityContext` field is overlaid on the base security context to determine the
effective security context for the container.
The new V1 API should be backward compatible with the existing API. Backward compatibility is
defined as:
> 1. Any API call (e.g. a structure POSTed to a REST endpoint) that worked before your change must
> work the same after your change.
> 2. Any API call that uses your change must not cause problems (e.g. crash or degrade behavior) when
> issued against servers that do not include your change.
> 3. It must be possible to round-trip your change (convert to different API versions and back) with
> no loss of information.
Previous versions of this proposal attempted to deal with backward compatibility by defining
the effect of setting the pod-level fields on the container-level fields. While trying to find
consensus on this design, it became apparent that this approach was going to be extremely complex
to implement, explain, and support. Instead, we will approach backward compatibility as follows:
1. Pod-level and container-level settings will not affect one another
2. Old clients will be able to use container-level settings in the exact same way
3. Container level settings always override pod-level settings if they are set
#### Examples
1. Old client using `pod.Spec.Containers[x].SecurityContext`
An old client creates a pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: test-pod
spec:
containers:
- name: a
securityContext:
runAsUser: 1001
- name: b
securityContext:
runAsUser: 1002
```
looks to old clients like:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: test-pod
spec:
containers:
- name: a
securityContext:
runAsUser: 1001
- name: b
securityContext:
runAsUser: 1002
```
looks to new clients like:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: test-pod
spec:
containers:
- name: a
securityContext:
runAsUser: 1001
- name: b
securityContext:
runAsUser: 1002
```
2. New client using `pod.Spec.SecurityContext`
A new client creates a pod using a field of `pod.Spec.SecurityContext`:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: test-pod
spec:
securityContext:
runAsUser: 1001
containers:
- name: a
- name: b
```
appears to new clients as:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: test-pod
spec:
securityContext:
runAsUser: 1001
containers:
- name: a
- name: b
```
old clients will see:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: test-pod
spec:
containers:
- name: a
- name: b
```
3. Pods created using `pod.Spec.SecurityContext` and `pod.Spec.Containers[x].SecurityContext`
If a field is set in both `pod.Spec.SecurityContext` and
`pod.Spec.Containers[x].SecurityContext`, the value in `pod.Spec.Containers[x].SecurityContext`
wins. In the following pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: test-pod
spec:
securityContext:
runAsUser: 1001
containers:
- name: a
securityContext:
runAsUser: 1002
- name: b
```
The effective setting for `runAsUser` for container A is `1002`.
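A minimal sketch of this "container overrides pod" rule (simplified types, only `runAsUser` shown; not the actual Kubelet logic):

```go
package securitycontext

// Simplified views of the pod- and container-level security contexts.
type PodSecurityContext struct {
	RunAsUser *int64
}

type SecurityContext struct {
	RunAsUser *int64
}

// effectiveRunAsUser overlays the container-level setting on the pod-level
// one: a field set on the container always wins; otherwise the pod-level
// value (if any) applies.
func effectiveRunAsUser(pod *PodSecurityContext, container *SecurityContext) *int64 {
	if container != nil && container.RunAsUser != nil {
		return container.RunAsUser
	}
	if pod != nil {
		return pod.RunAsUser
	}
	return nil
}
```

Applied to the pod above, the pod-level value 1001 is overridden by container A's 1002, while container B inherits 1001.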
#### Testing
A backward compatibility test suite will be established for the v1 API. The test suite will
verify compatibility by converting objects into the internal API and back to the version API and
examining the results.
All of the examples here will be used as test-cases. As more test cases are added, the proposal will
be updated.
An example of a test like this can be found in the
[OpenShift API package](https://github.com/openshift/origin/blob/master/pkg/api/compatibility_test.go)
E2E test cases will be added to test the correct determination of the security context for containers.
### Kubelet changes
1. The Kubelet will use the new fields on the `PodSecurityContext` for host namespace control
2. The Kubelet will be modified to correctly implement the backward compatibility and effective
security context determination defined here
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/pod-security-context.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->


@ -1,480 +1 @@
# Protobuf serialization and internal storage This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/protobuf.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/protobuf.md)
@smarterclayton
March 2016
## Proposal and Motivation
The Kubernetes API server is a "dumb server" which offers storage, versioning,
validation, update, and watch semantics on API resources. In a large cluster
the API server must efficiently retrieve, store, and deliver large numbers
of coarse-grained objects to many clients. In addition, Kubernetes traffic is
heavily biased towards intra-cluster traffic - as much as 90% of the requests
served by the APIs are for internal cluster components like nodes, controllers,
and proxies. The primary format for intercluster API communication is JSON
today for ease of client construction.
At the current time, the latency of reaction to change in the cluster is
dominated by the time required to load objects from persistent store (etcd),
convert them to an output version, serialize them to JSON over the network, and
then perform the reverse operation in clients. The cost of
serialization/deserialization and the size of the bytes on the wire, as well
as the memory garbage created during those operations, dominate the CPU and
network usage of the API servers.
In order to reach clusters of 10k nodes, we need roughly an order of magnitude
efficiency improvement in a number of areas of the cluster, starting with the
masters but also including API clients like controllers, kubelets, and node
proxies.
We propose to introduce a Protobuf serialization for all common API objects
that can optionally be used by intra-cluster components. Experiments have
demonstrated a 10x reduction in CPU use during serialization and deserialization,
a 2x reduction in size in bytes on the wire, and a 6-9x reduction in the amount
of objects created on the heap during serialization. The Protobuf schema
for each object will be automatically generated from the external API Go structs
we use to serialize to JSON.
Benchmarking showed that the time spent on the server in a typical GET
resembles:
    etcd -> decode     -> defaulting -> convert to internal ->
      JSON    50us        5us           15us
      Proto   5us
      JSON    150allocs                 80allocs
      Proto   100allocs

    process -> convert to external -> encode -> client
      JSON                 15us          40us
      Proto                              5us
      JSON                 80allocs      100allocs
      Proto                              4allocs
Protobuf has a huge benefit on encoding because it does not need to allocate
temporary objects, just one large buffer. Changing to protobuf moves our
hotspot back to conversion, not serialization.
## Design Points
* Generate Protobuf schema from Go structs (like we do for JSON) to avoid
manual schema update and drift
* Generate Protobuf schema that is field equivalent to the JSON fields (no
special types or enumerations), reducing drift for clients across formats.
* Follow our existing API versioning rules (backwards compatible in major
API versions, breaking changes across major versions) by creating one
Protobuf schema per API type.
* Continue to use the existing REST API patterns but offer an alternative
serialization, which means existing client and server tooling can remain
the same while benefiting from faster decoding.
* Protobuf objects on disk or in etcd will need to be self identifying at
rest, like JSON, in order for backwards compatibility in storage to work,
so we must add an envelope with apiVersion and kind to wrap the nested
object, and make the data format recognizable to clients.
* Use the [gogo-protobuf](https://github.com/gogo/protobuf) Golang library to generate marshal/unmarshal
operations, allowing us to bypass the expensive reflection used by the
golang JSON operation
## Alternatives
* We considered JSON compression to reduce size on wire, but that does not
reduce the amount of memory garbage created during serialization and
deserialization.
* More efficient formats like Msgpack were considered, but they only offer
2x speed up vs. the 10x observed for Protobuf
* gRPC was considered, but is a larger change that requires more core
refactoring. This approach does not eliminate the possibility of switching
to gRPC in the future.
* We considered attempting to improve JSON serialization, but the cost of
implementing a more efficient serializer library than ugorji is
significantly higher than creating a protobuf schema from our Go structs.
## Schema
The Protobuf schema for each API group and version will be generated from
the objects in that API group and version. The schema will be named using
the package identifier of the Go package, i.e.
k8s.io/kubernetes/pkg/api/v1
Each top level object will be generated as a Protobuf message, i.e.:
type Pod struct { ... }
message Pod {}
Since the Go structs are designed to be serialized to JSON (with only the
int, string, bool, map, and array primitive types), we will use the
canonical JSON serialization as the protobuf field type wherever possible,
i.e.:
    JSON        Protobuf
    string   -> string
    int      -> varint
    bool     -> bool
    array    -> repeating message|primitive
We disallow the use of the Go `int` type in external fields because it is
ambiguous depending on compiler platform, and instead always use `int32` or
`int64`.
We will use maps (a protobuf 3 extension that can serialize to protobuf 2)
to represent JSON maps:
    JSON        Protobuf             Wire (proto2)
    map      -> map<string, ...>  -> repeated Message { key string; value bytes }
We will not convert known string constants to enumerations, since that
would require extra logic we do not already have in JSON.
To begin with, we will use Protobuf 3 to generate a Protobuf 2 schema, and
in the future investigate a Protobuf 3 serialization. We will introduce
abstractions that let us have more than a single protobuf serialization if
necessary. Protobuf 3 would require us to support message types for
pointer primitive (nullable) fields, which is more complex than Protobuf 2's
support for pointers.
### Example of generated proto IDL
Without gogo extensions:
```
syntax = 'proto2';
package k8s.io.kubernetes.pkg.api.v1;
import "k8s.io/kubernetes/pkg/api/resource/generated.proto";
import "k8s.io/kubernetes/pkg/api/unversioned/generated.proto";
import "k8s.io/kubernetes/pkg/runtime/generated.proto";
import "k8s.io/kubernetes/pkg/util/intstr/generated.proto";
// Package-wide variables from generator "generated".
option go_package = "v1";
// Represents a Persistent Disk resource in AWS.
//
// An AWS EBS disk must exist before mounting to a container. The disk
// must also be in the same AWS zone as the kubelet. An AWS EBS disk
// can only be mounted as read/write once. AWS EBS volumes support
// ownership management and SELinux relabeling.
message AWSElasticBlockStoreVolumeSource {
// Unique ID of the persistent disk resource in AWS (Amazon EBS volume).
// More info: http://kubernetes.io/docs/user-guide/volumes#awselasticblockstore
optional string volumeID = 1;
// Filesystem type of the volume that you want to mount.
// Tip: Ensure that the filesystem type is supported by the host operating system.
// Examples: "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified.
// More info: http://kubernetes.io/docs/user-guide/volumes#awselasticblockstore
// TODO: how do we prevent errors in the filesystem from compromising the machine
optional string fsType = 2;
// The partition in the volume that you want to mount.
// If omitted, the default is to mount by volume name.
// Examples: For volume /dev/sda1, you specify the partition as "1".
// Similarly, the volume partition for /dev/sda is "0" (or you can leave the property empty).
optional int32 partition = 3;
// Specify "true" to force and set the ReadOnly property in VolumeMounts to "true".
// If omitted, the default is "false".
// More info: http://kubernetes.io/docs/user-guide/volumes#awselasticblockstore
optional bool readOnly = 4;
}
// Affinity is a group of affinity scheduling rules, currently
// only node affinity, but in the future also inter-pod affinity.
message Affinity {
// Describes node affinity scheduling rules for the pod.
optional NodeAffinity nodeAffinity = 1;
}
```
With extensions:
```
syntax = 'proto2';
package k8s.io.kubernetes.pkg.api.v1;
import "github.com/gogo/protobuf/gogoproto/gogo.proto";
import "k8s.io/kubernetes/pkg/api/resource/generated.proto";
import "k8s.io/kubernetes/pkg/api/unversioned/generated.proto";
import "k8s.io/kubernetes/pkg/runtime/generated.proto";
import "k8s.io/kubernetes/pkg/util/intstr/generated.proto";
// Package-wide variables from generator "generated".
option (gogoproto.marshaler_all) = true;
option (gogoproto.sizer_all) = true;
option (gogoproto.unmarshaler_all) = true;
option (gogoproto.goproto_unrecognized_all) = false;
option (gogoproto.goproto_enum_prefix_all) = false;
option (gogoproto.goproto_getters_all) = false;
option go_package = "v1";
// Represents a Persistent Disk resource in AWS.
//
// An AWS EBS disk must exist before mounting to a container. The disk
// must also be in the same AWS zone as the kubelet. An AWS EBS disk
// can only be mounted as read/write once. AWS EBS volumes support
// ownership management and SELinux relabeling.
message AWSElasticBlockStoreVolumeSource {
// Unique ID of the persistent disk resource in AWS (Amazon EBS volume).
// More info: http://kubernetes.io/docs/user-guide/volumes#awselasticblockstore
optional string volumeID = 1 [(gogoproto.customname) = "VolumeID", (gogoproto.nullable) = false];
// Filesystem type of the volume that you want to mount.
// Tip: Ensure that the filesystem type is supported by the host operating system.
// Examples: "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified.
// More info: http://kubernetes.io/docs/user-guide/volumes#awselasticblockstore
// TODO: how do we prevent errors in the filesystem from compromising the machine
optional string fsType = 2 [(gogoproto.customname) = "FSType", (gogoproto.nullable) = false];
// The partition in the volume that you want to mount.
// If omitted, the default is to mount by volume name.
// Examples: For volume /dev/sda1, you specify the partition as "1".
// Similarly, the volume partition for /dev/sda is "0" (or you can leave the property empty).
optional int32 partition = 3 [(gogoproto.customname) = "Partition", (gogoproto.nullable) = false];
// Specify "true" to force and set the ReadOnly property in VolumeMounts to "true".
// If omitted, the default is "false".
// More info: http://kubernetes.io/docs/user-guide/volumes#awselasticblockstore
optional bool readOnly = 4 [(gogoproto.customname) = "ReadOnly", (gogoproto.nullable) = false];
}
// Affinity is a group of affinity scheduling rules, currently
// only node affinity, but in the future also inter-pod affinity.
message Affinity {
// Describes node affinity scheduling rules for the pod.
optional NodeAffinity nodeAffinity = 1 [(gogoproto.customname) = "NodeAffinity"];
}
```
## Wire format
In order to make Protobuf serialized objects recognizable in a binary form,
the encoded object must be prefixed by a magic number, and then wrap the
non-self-describing Protobuf object in a Protobuf object that contains
schema information. The protobuf object is referred to as the `raw` object
and the encapsulation is referred to as `wrapper` object.
The simplest serialization is the raw Protobuf object with no identifying
information. In some use cases, we may wish to have the server identify the
raw object type on the wire using a protocol dependent format (gRPC uses
a type HTTP header). This works when all objects are of the same type, but
we occasionally have reasons to encode different object types in the same
context (watches, lists of objects on disk, and API calls that may return
errors).
To identify the type of a wrapped Protobuf object, we wrap it in a message
in package `k8s.io/kubernetes/pkg/runtime` with message name `Unknown`
having the following schema:
    message Unknown {
      optional TypeMeta typeMeta = 1;
      optional bytes value = 2;
      optional string contentEncoding = 3;
      optional string contentType = 4;
    }

    message TypeMeta {
      optional string apiVersion = 1;
      optional string kind = 2;
    }
The `value` field is an encoded protobuf object that matches the schema
defined in `typeMeta` and has optional `contentType` and `contentEncoding`
fields. `contentType` and `contentEncoding` have the same meaning as in
HTTP, if unspecified `contentType` means "raw protobuf object", and
`contentEncoding` defaults to no encoding. If `contentEncoding` is
specified, the defined transformation should be applied to `value` before
attempting to decode the value.
The `contentType` field is required to support objects without a defined
protobuf schema, like the ThirdPartyResource or templates. Those objects
would have to be encoded as JSON or another structure compatible form
when used with Protobuf. Generic clients must deal with the possibility
that the returned value is not in the known type.
We add the `contentEncoding` field here to preserve room for future
optimizations like encryption-at-rest or compression of the nested content.
Clients should error when receiving an encoding they do not support.
Negotiating encoding is not defined here, but introducing new encodings
is similar to introducing a schema change or new API version.
A client should use the `kind` and `apiVersion` fields to identify the
correct protobuf IDL for that message and version, and then decode the
`bytes` field into that Protobuf message.
Any Unknown value written to stable storage will be given a 4 byte prefix
`0x6b, 0x38, 0x73, 0x00`, which corresponds to `k8s` followed by a zero byte.
The content-type `application/vnd.kubernetes.protobuf` is defined as
representing the following schema:
    MESSAGE = '0x6b 0x38 0x73 0x00' UNKNOWN
    UNKNOWN = <protobuf serialization of k8s.io/kubernetes/pkg/runtime#Unknown>
A client should check for the first four bytes, then perform a protobuf
deserialization of the remaining bytes into the `runtime.Unknown` type.
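A minimal sketch of wrapping and unwrapping this envelope (the serialization of `runtime.Unknown` itself is assumed to come from the gogo-protobuf generated marshalers; this is an illustration, not the actual Kubernetes serializer):

```go
package protobuf

import (
	"bytes"
	"fmt"
)

// protoEncodingPrefix is the 4-byte magic that marks protobuf-encoded
// Kubernetes payloads: "k8s" followed by a zero byte.
var protoEncodingPrefix = []byte{0x6b, 0x38, 0x73, 0x00}

// wrap prefixes an already-serialized runtime.Unknown message with the magic bytes.
func wrap(serializedUnknown []byte) []byte {
	return append(append([]byte{}, protoEncodingPrefix...), serializedUnknown...)
}

// unwrap checks for the magic prefix and returns the remaining bytes, which
// the caller then deserializes into runtime.Unknown to learn the apiVersion
// and kind before decoding the nested value.
func unwrap(data []byte) ([]byte, error) {
	if !bytes.HasPrefix(data, protoEncodingPrefix) {
		return nil, fmt.Errorf("data is not in application/vnd.kubernetes.protobuf format")
	}
	return data[len(protoEncodingPrefix):], nil
}
```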
## Streaming wire format
While the majority of Kubernetes APIs return single objects that can vary
in type (Pod vs. Status, PodList vs. Status), the watch APIs return a stream
of identical objects (Events). At the time of this writing, this is the only
current or anticipated streaming RESTful protocol (logging, port-forwarding,
and exec protocols use a binary protocol over Websockets or SPDY).
In JSON, this API is implemented as a stream of JSON objects that are
separated by their syntax (the closing `}` brace is followed by whitespace
and the opening `{` brace starts the next object). There is no formal
specification covering this pattern, nor a unique content-type. Each object
is expected to be of type `watch.Event`, and is currently not self describing.
For expediency and consistency, we define a format for Protobuf watch Events
that is similar. Since protobuf messages are not self describing, we must
identify the boundaries between Events (a `frame`). We do that by prefixing
each frame of N bytes with a 4-byte, big-endian, unsigned integer with the
value N.
    frame  = length body
    length = 32-bit unsigned integer in big-endian order, denoting length of
             bytes of body
    body   = <bytes>

    # frame containing a single byte 0a
    frame = 00 00 00 01 0a
    # equivalent JSON
    frame = {"type": "added", ...}
The body of each frame is a serialized Protobuf message `Event` in package
`k8s.io/kubernetes/pkg/watch/versioned`. The content type used for this
format is `application/vnd.kubernetes.protobuf;type=watch`.
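A minimal sketch of this framing using only the standard library (the caller supplies the serialized `Event` bytes):

```go
package framing

import (
	"encoding/binary"
	"io"
)

// writeFrame writes one frame: a 4-byte big-endian length followed by the body.
func writeFrame(w io.Writer, body []byte) error {
	var length [4]byte
	binary.BigEndian.PutUint32(length[:], uint32(len(body)))
	if _, err := w.Write(length[:]); err != nil {
		return err
	}
	_, err := w.Write(body)
	return err
}

// readFrame reads one frame and returns its body, e.g. a serialized watch Event.
func readFrame(r io.Reader) ([]byte, error) {
	var length [4]byte
	if _, err := io.ReadFull(r, length[:]); err != nil {
		return nil, err
	}
	body := make([]byte, binary.BigEndian.Uint32(length[:]))
	if _, err := io.ReadFull(r, body); err != nil {
		return nil, err
	}
	return body, nil
}
```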
## Negotiation
To allow clients to request protobuf serialization optionally, the `Accept`
HTTP header is used by callers to indicate which serialization they wish
returned in the response, and the `Content-Type` header is used to tell the
server how to decode the bytes sent in the request (for DELETE/POST/PUT/PATCH
requests). The server will return 406 if the `Accept` header is not
recognized or 415 if the `Content-Type` is not recognized (as defined in
RFC2616).
To be backwards compatible, clients must consider that the server does not
support protobuf serialization. A number of options are possible:
### Preconfigured
Clients can have a configuration setting that instructs them which version
to use. This is the simplest option, but requires intervention when the
component upgrades to protobuf.
### Include serialization information in api-discovery
Servers can define the list of content types they accept and return in
their API discovery docs, and clients can use protobuf if they support it.
Allows dynamic configuration during upgrade if the client is already using
API-discovery.
### Optimistically attempt to send and receive requests using protobuf
Using multiple `Accept` values:
    Accept: application/vnd.kubernetes.protobuf, application/json
clients can indicate their preferences and handle the returned
`Content-Type` using whatever the server responds. On update operations,
clients can try protobuf and if they receive a 415 error, record that and
fall back to JSON. Allows the client to be backwards compatible with
any server, but comes at the cost of some implementation complexity.
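A minimal sketch of the optimistic approach on top of `net/http` (hypothetical helper; not the actual client library negotiation code):

```go
package negotiation

import (
	"bytes"
	"net/http"
)

const (
	protobufType = "application/vnd.kubernetes.protobuf"
	jsonType     = "application/json"
)

// putOptimistic tries a protobuf-encoded update first and falls back to JSON
// when the server answers 415 Unsupported Media Type. The second return value
// reports whether the fallback happened, so the caller can remember it and
// avoid retrying protobuf on every request.
func putOptimistic(client *http.Client, url string, protoBody, jsonBody []byte) (*http.Response, bool, error) {
	req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(protoBody))
	if err != nil {
		return nil, false, err
	}
	req.Header.Set("Content-Type", protobufType)
	req.Header.Set("Accept", protobufType+", "+jsonType)

	resp, err := client.Do(req)
	if err != nil {
		return nil, false, err
	}
	if resp.StatusCode != http.StatusUnsupportedMediaType {
		return resp, false, nil // server accepted protobuf (or failed for another reason)
	}
	resp.Body.Close()

	req, err = http.NewRequest(http.MethodPut, url, bytes.NewReader(jsonBody))
	if err != nil {
		return nil, true, err
	}
	req.Header.Set("Content-Type", jsonType)
	req.Header.Set("Accept", jsonType)
	resp, err = client.Do(req)
	return resp, true, err
}
```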
## Generation process
Generation proceeds in five phases:
1. Generate a gogo-protobuf annotated IDL from the source Go struct.
2. Generate temporary Go structs from the IDL using gogo-protobuf.
3. Generate marshaller/unmarshallers based on the IDL using gogo-protobuf.
4. Take all tag numbers generated for the IDL and apply them as struct tags
to the original Go types.
5. Generate a final IDL without gogo-protobuf annotations as the canonical IDL.
The output is a `generated.proto` file in each package containing a standard
proto2 IDL, and a `generated.pb.go` file in each package that contains the
generated marshal/unmarshallers.
The Go struct generated by gogo-protobuf from the first IDL must be identical
to the origin struct - a number of changes have been made to gogo-protobuf
to ensure exact 1-1 conversion. A small number of additions may be necessary
in the future if we introduce more exotic field types (Go type aliases, maps
with aliased Go types, and embedded fields were fixed). If they are identical,
the output marshallers/unmarshallers can then work on the origin struct.
Whenever a new field is added, generation will assign that field a unique tag
and the 4th phase will write that tag back to the origin Go struct as a `protobuf`
struct tag. This ensures subsequent generation passes are stable, even in the
face of internal refactors. The first time a field is added, the author will
need to check in both the new IDL AND the protobuf struct tag changes.
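For illustration, a field on a hypothetical versioned type might look roughly like this after the fourth phase has written the assigned tags back as `protobuf` struct tags; the type, field names, and tag numbers here are invented for the example.

```go
package example

// ExampleSpec is a hypothetical versioned API type.
type ExampleSpec struct {
	// An existing field keeps the tag number it was assigned when it was first added.
	Replicas int32 `json:"replicas" protobuf:"varint,1,opt,name=replicas"`

	// A newly added field receives the next unused tag number; the regenerated
	// IDL and this struct tag change must be checked in together.
	MinReadySeconds int32 `json:"minReadySeconds,omitempty" protobuf:"varint,2,opt,name=minReadySeconds"`
}
```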
The second IDL is generated without gogo-protobuf annotations so that clients
in other languages can easily generate code from it.
Any errors in the generation process are considered fatal and must be resolved
early (being unable to identify a field type for conversion, duplicate fields,
duplicate tags, protoc errors, etc). The conversion fuzzer is used to ensure
that a Go struct can be round-tripped to protobuf and back, as we do for JSON
and conversion testing.
## Changes to development process
All existing API change rules would still apply. New fields added would be
automatically assigned a tag by the generation process. New API versions will
have a new proto IDL, and field name and type changes across API versions would be
handled using our existing API change rules. Tags cannot change within an
API version.
Generation would be done by developers and then checked into source control,
like conversions and ugorji JSON codecs.
Because protoc is not packaged well across all platforms, we will add it to
the `kube-cross` Docker image and developers can use that to generate
updated protobufs. Protobuf 3 beta is required.
The generated protobuf will be checked with a verify script before merging.
## Implications
* The generated marshal code is large and will increase build times and binary
size. We may be able to remove ugorji after protobuf is added, since the
bulk of our decoding would switch to protobuf.
* The protobuf schema is naive, which means it may not be as minimal as possible.
* Debugging of protobuf related errors is harder due to the binary nature of
the format.
* Migrating API object storage from JSON to protobuf will require that all
API servers are upgraded before beginning to write protobuf to disk, since
old servers won't recognize protobuf.
* Transport of protobuf between etcd and the api server will be less efficient
in etcd2 than etcd3 (since etcd2 must encode binary values returned as JSON).
It should still be smaller than the current JSON requests.
* Third-party API objects must be stored as JSON inside of a protobuf wrapper
in etcd, and the API endpoints will not benefit from clients that speak
protobuf. Clients will have to deal with some API objects not supporting
protobuf.
## Open Questions
* Is supporting stored protobuf files on disk in the kubectl client worth it?

View File

@ -1,194 +1 @@
This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/release-notes.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/release-notes.md)
# Kubernetes Release Notes
[djmm@google.com](mailto:djmm@google.com)<BR>
Last Updated: 2016-04-06
<!-- BEGIN MUNGE: GENERATED_TOC -->
- [Kubernetes Release Notes](#kubernetes-release-notes)
- [Objective](#objective)
- [Background](#background)
- [The Problem](#the-problem)
- [The (general) Solution](#the-general-solution)
- [Then why not just list *every* change that was submitted, CHANGELOG-style?](#then-why-not-just-list-every-change-that-was-submitted-changelog-style)
- [Options](#options)
- [Collection Design](#collection-design)
- [Publishing Design](#publishing-design)
- [Location](#location)
- [Layout](#layout)
- [Alpha/Beta/Patch Releases](#alphabetapatch-releases)
- [Major/Minor Releases](#majorminor-releases)
- [Work estimates](#work-estimates)
- [Caveats / Considerations](#caveats--considerations)
<!-- END MUNGE: GENERATED_TOC -->
## Objective
Define a process and design tooling for collecting, arranging and publishing
release notes for Kubernetes releases, automating as much of the process as
possible.
The goal is to introduce minor changes to the development workflow
in a way that is mostly frictionless and allows for the capture of release notes
as PRs are submitted to the repository.
This direct association of release notes to PRs captures the intention of
release visibility of the PR at the point an idea is submitted upstream.
The release notes can then be more easily collected and published when the
release is ready.
## Background
### The Problem
Release notes are often an afterthought; clarifying and finalizing them
is frequently left until the very last minute, when the release is made.
This is usually long after the feature or bug fix was added and it is no longer on
the mind of the author. Worse, collecting and summarizing the
release notes is often left to those who may know little or nothing about the
individual changes!
Writing and editing release notes at the end of the cycle can be a rushed,
interrupt-driven and often stressful process resulting in incomplete,
inconsistent release notes often with errors and omissions.
### The (general) Solution
Like most things in the development/release pipeline, the earlier you do it,
the easier it is for everyone and the better the outcome. Gather your release
notes earlier in the development cycle, at the time the features and fixes are
added.
#### Then why not just list *every* change that was submitted, CHANGELOG-style?
On larger projects like Kubernetes, showing every single change (PR) would mean
hundreds of entries. The goal is to highlight the major changes for a release.
## Options
1. Use of pre-commit and other local git hooks
* Experiments here using `prepare-commit-msg` and `commit-msg` git hook files
were promising but less than optimal due to the fact that they would
require input/confirmation with each commit and there may be multiple
commits in a push and eventual PR.
1. Use of [github templates](https://github.com/blog/2111-issue-and-pull-request-templates)
* Templates provide a great way to pre-fill PR comments, but there are no
server-side hooks available to parse and/or easily check the contents of
those templates to ensure that checkboxes were checked or forms were filled
in.
1. Use of labels enforced by mungers/bots
* We already make great use of mungers/bots to manage labels on PRs and it
fits very nicely in the existing workflow
## Collection Design
The munger/bot option fits most cleanly into the existing workflow.
All `release-note-*` labeling is managed on the master branch PR only.
No `release-note-*` labels are needed on cherry-pick PRs and no information
will be collected from that cherry-pick PR.
The only exception to this rule is when a PR is not a cherry-pick and is
targeted directly to the non-master branch. In this case, a `release-note-*`
label is required for that non-master PR.
1. New labels added to github: `release-note-none`, maybe others for new release note categories - see Layout section below
1. A [new munger](https://github.com/kubernetes/kubernetes/issues/23409) that will:
* Add a `release-note-label-needed` label to all new master branch PRs
* Block merge by the submit queue on all PRs labeled as `release-note-label-needed`
* Auto-remove `release-note-label-needed` when one of the `release-note-*` labels is added
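For illustration only, a hedged Go sketch of that labeling rule follows; the types and helper are hypothetical, since the real munger works against the GitHub API rather than these structs.

```go
package munger

import "strings"

// pullRequest is a hypothetical, simplified view of a PR for this sketch.
type pullRequest struct {
	TargetBranch string
	Labels       map[string]bool
}

const needsLabel = "release-note-label-needed"

// reconcile applies the collection rules above: master-branch PRs must carry
// some release-note-* label before the submit queue may merge them.
func reconcile(pr *pullRequest) (mergeBlocked bool) {
	if pr.TargetBranch != "master" {
		// Cherry-pick PRs are exempt; direct-to-branch PRs would need the
		// same treatment, which is omitted here for brevity.
		return false
	}
	hasReleaseNoteLabel := false
	for l := range pr.Labels {
		if l != needsLabel && strings.HasPrefix(l, "release-note") {
			hasReleaseNoteLabel = true
		}
	}
	if hasReleaseNoteLabel {
		delete(pr.Labels, needsLabel) // auto-remove once a release-note-* label exists
		return false
	}
	pr.Labels[needsLabel] = true // added to all new master branch PRs
	return true                  // block merge by the submit queue
}
```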
## Publishing Design
### Location
With v1.2.0, the release notes were moved from their previous [github releases](https://github.com/kubernetes/kubernetes/releases)
location to [CHANGELOG.md](../../CHANGELOG.md). Going forward this seems like a good plan.
Other projects do similarly.
The kubernetes.tar.gz download link is also displayed along with the release notes
in [CHANGELOG.md](../../CHANGELOG.md).
Is there any reason to continue publishing anything to github releases if
the complete release story is published in [CHANGELOG.md](../../CHANGELOG.md)?
### Layout
Different types of releases will generally have different requirements in
terms of layout. As expected, major releases like v1.2.0 are going
to require much more detail than the automated release notes will provide.
The idea is that these mechanisms will provide 100% of the release note
content for alpha, beta and most minor releases and bootstrap the content
with a release note 'template' for the authors of major releases like v1.2.0.
The authors can then collaborate and edit the higher level sections of the
release notes in a PR, updating [CHANGELOG.md](../../CHANGELOG.md) as needed.
v1.2.0 demonstrated the need, at least for major releases like v1.2.0, for
several sections in the published release notes.
In order to provide a basic layout for release notes in the future,
new releases can bootstrap [CHANGELOG.md](../../CHANGELOG.md) with the following template types:
#### Alpha/Beta/Patch Releases
These are automatically generated from `release-note*` labels, but can be modified as needed.
```
Action Required
* PR titles from the release-note-action-required label
Other notable changes
* PR titles from the release-note label
```
#### Major/Minor Releases
```
Major Themes
* Add to or delete this section
Other notable improvements
* Add to or delete this section
Experimental Features
* Add to or delete this section
Action Required
* PR titles from the release-note-action-required label
Known Issues
* Add to or delete this section
Provider-specific Notes
* Add to or delete this section
Other notable changes
* PR titles from the release-note label
```
## Work estimates
* The [new munger](https://github.com/kubernetes/kubernetes/issues/23409)
* Owner: @eparis
* Time estimate: Mostly done
* Updates to the tool that collects, organizes, publishes and sends release
notifications.
* Owner: @david-mcmahon
* Time estimate: A few days
## Caveats / Considerations
* As part of the planning and development workflow, how can we capture
release notes for bigger features?
[#23070](https://github.com/kubernetes/kubernetes/issues/23070)
* For now contributors should simply use the first PR that enables a new
feature by default. We'll revisit if this does not work well.

View File

@ -1,123 +1 @@
# Rescheduler design space This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/rescheduler.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/rescheduler.md)
@davidopp, @erictune, @briangrant
July 2015
## Introduction and definition
A rescheduler is an agent that proactively causes currently-running
Pods to be moved, so as to optimize some objective function for
goodness of the layout of Pods in the cluster. (The objective function
doesn't have to be expressed mathematically; it may just be a
collection of ad-hoc rules, but in principle there is an objective
function. Implicitly an objective function is described by the
scheduler's predicate and priority functions.) It might be triggered
to run every N minutes, or whenever some event happens that is known
to make the objective function worse (for example, whenever any Pod goes
PENDING for a long time.)
## Motivation and use cases
A rescheduler is useful because without a rescheduler, scheduling
decisions are only made at the time Pods are created. But later on,
the state of the cluster may have changed in some way such that it would
be better to move the Pod to another node.
There are two categories of movements a rescheduler might trigger: coalescing
and spreading.
### Coalesce Pods
This is the most common use case. Cluster layout changes over time. For
example, run-to-completion Pods terminate, producing free space in their wake, but that space
is fragmented. This fragmentation might prevent a PENDING Pod from scheduling
(there are enough free resources for the Pod in aggregate across the cluster,
but not on any single node). A rescheduler can coalesce free space like a
disk defragmenter, thereby producing enough free space on a node for a PENDING
Pod to schedule. In some cases it can do this just by moving Pods into existing
holes, but often it will need to evict (and reschedule) running Pods in order to
create a large enough hole.
A second use case for a rescheduler to coalesce pods is when it becomes possible
to support the running Pods on a fewer number of nodes. The rescheduler can
gradually move Pods off of some set of nodes to make those nodes empty so
that they can then be shut down/removed. More specifically,
the system could do a simulation to see whether, after removing a node from the
cluster, the Pods that were on that node would be able to reschedule,
either directly or with the help of the rescheduler; if the answer is
yes, then you can safely auto-scale down (assuming services will still
meet their application-level SLOs).
### Spread Pods
The main use cases for spreading Pods revolve around relieving congestion on (a) highly
utilized node(s). For example, some process might suddenly start receiving a significantly
above-normal amount of external requests, leading to starvation of best-effort
Pods on the node. We can use the rescheduler to move the best-effort Pods off of the
node. (They are likely to have generous eviction SLOs, so are more likely to be movable
than the Pod that is experiencing the higher load, but in principle we might move either.)
Or even before any node becomes overloaded, we might proactively re-spread Pods from nodes
with high-utilization, to give them some buffer against future utilization spikes. In either
case, the nodes we move the Pods onto might have been in the system for a long time or might
have been added by the cluster auto-scaler specifically to allow the rescheduler to
rebalance utilization.
A second spreading use case is to separate antagonists.
Sometimes the processes running in two different Pods on the same node
may have unexpected antagonistic
behavior towards one another. A system component might monitor for such
antagonism and ask the rescheduler to move one of the antagonists to a new node.
### Ranking the use cases
The vast majority of users probably only care about rescheduling for three scenarios:
1. Move Pods around to get a PENDING Pod to schedule
1. Redistribute Pods onto new nodes added by a cluster auto-scaler when there are no PENDING Pods
1. Move Pods around when CPU starvation is detected on a node
## Design considerations and design space
Because rescheduling is disruptive--it causes one or more
already-running Pods to die when they otherwise wouldn't--a key
constraint on rescheduling is that it must be done subject to
disruption SLOs. There are a number of ways to specify these SLOs--a
global rate limit across all Pods, a rate limit across a set of Pods
defined by some particular label selector, a maximum number of Pods
that can be down at any one time among a set defined by some
particular label selector, etc. These policies are presumably part of
the Rescheduler's configuration.
There are a lot of design possibilities for a rescheduler. To explain
them, it's easiest to start with the description of a baseline
rescheduler, and then describe possible modifications. The Baseline
rescheduler
* only kicks in when there are one or more PENDING Pods for some period of time; its objective function is binary: completely happy if there are no PENDING Pods, and completely unhappy if there are PENDING Pods; it does not try to optimize for any other aspect of cluster layout
* is not a scheduler -- it simply identifies a node where a PENDING Pod could fit if one or more Pods on that node were moved out of the way, and then kills those Pods to make room for the PENDING Pod, which will then be scheduled there by the regular scheduler(s). [obviously this killing operation must be able to specify "don't allow the killed Pod to reschedule back to whence it was killed" otherwise the killing is pointless] Of course it should only do this if it is sure the killed Pods will be able to reschedule into already-free space in the cluster. Note that although it is not a scheduler, the Rescheduler needs to be linked with the predicate functions of the scheduling algorithm(s) so that it can know (1) that the PENDING Pod would actually schedule into the hole it has identified once the hole is created, and (2) that the evicted Pod(s) will be able to schedule somewhere else in the cluster.
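To make the Baseline concrete, a rough Go sketch of its core step follows; `Pod`, `Node`, and the `Predicates` hooks are placeholders invented for this sketch, standing in for the schedulers' predicate functions the Baseline must be linked with.

```go
package rescheduler

// Simplified placeholder types for this sketch.
type Pod struct{ Name string }
type Node struct {
	Name string
	Pods []Pod
}

// Predicates stands in for the schedulers' predicate functions.
type Predicates interface {
	Fits(pending Pod, node Node, evicted []Pod) bool
	CanRescheduleElsewhere(p Pod, nodes []Node) bool
}

// makeRoom finds a node where evicting some Pods would let the pending Pod
// fit, and returns the victims. It never binds the pending Pod itself; the
// regular scheduler does that once the hole exists.
func makeRoom(pending Pod, nodes []Node, pred Predicates) (*Node, []Pod) {
	for i := range nodes {
		node := &nodes[i]
		for k := 1; k <= len(node.Pods); k++ {
			victims := node.Pods[:k] // naive choice; a real implementation would search
			safe := true
			for _, v := range victims {
				if !pred.CanRescheduleElsewhere(v, nodes) {
					safe = false
					break
				}
			}
			if safe && pred.Fits(pending, *node, victims) {
				return node, victims
			}
		}
	}
	return nil, nil
}
```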
Possible variations on this Baseline rescheduler are
1. it can kill the Pod(s) whose space it wants **and also schedule the Pod that will take that space and reschedule the Pod(s) that were killed**, rather than just killing the Pod(s) whose space it wants and relying on the regular scheduler(s) to schedule the Pod that will take that space (and to reschedule the Pod(s) that were evicted)
1. it can run continuously in the background to optimize general cluster layout instead of just trying to get a PENDING Pod to schedule
1. it can try to move groups of Pods instead of using a one-at-a-time / greedy approach
1. it can formulate multi-hop plans instead of single-hop
A key design question for a Rescheduler is how much knowledge it needs about the scheduling policies used by the cluster's scheduler(s).
* For the Baseline rescheduler, it needs to know the predicate functions used by the cluster's scheduler(s) else it can't know how to create a hole that the PENDING Pod will fit into, nor be sure that the evicted Pod(s) will be able to reschedule elsewhere.
* If it is going to run continuously in the background to optimize cluster layout but is still only going to kill Pods, then it still needs to know the predicate functions for the reason mentioned above. In principle it doesn't need to know the priority functions; it could just randomly kill Pods and rely on the regular scheduler to put them back in better places. However, this is a rather inexact approach. Thus it is useful for the rescheduler to know the priority functions, or at least some subset of them, so it can be sure that an action it takes will actually improve the cluster layout.
* If it is going to run continuously in the background to optimize cluster layout and is going to act as a scheduler rather than just killing Pods, then it needs to know the predicate functions and some compatible (but not necessarily identical) priority functions. One example of a case where "compatible but not identical" might be useful is if the main scheduler(s) has a very simple scheduling policy optimized for low scheduling latency, and the Rescheduler has a more sophisticated/optimal scheduling policy that requires more computation time. The main thing to avoid is for the scheduler(s) and rescheduler to have incompatible priority functions, as this will cause them to "fight" (though it still can't lead to an infinite loop, since the scheduler(s) only ever touches a Pod once).
## Appendix: Integrating rescheduler with cluster auto-scaler (scale up)
For scaling up the cluster, a reasonable workflow might be:
1. pod horizontal auto-scaler decides to add one or more Pods to a service, based on the metrics it is observing
1. the Pod goes PENDING due to lack of a suitable node with sufficient resources
1. rescheduler notices the PENDING Pod and determines that the Pod cannot schedule just by rearranging existing Pods (while respecting SLOs)
1. rescheduler triggers cluster auto-scaler to add a node of the appropriate type for the PENDING Pod
1. the PENDING Pod schedules onto the new node (and possibly the rescheduler also moves other Pods onto that node)

View File

@ -1,88 +1 @@
# Rescheduler: guaranteed scheduling of critical addons This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/rescheduling-for-critical-pods.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/rescheduling-for-critical-pods.md)
## Motivation
In addition to Kubernetes core components like the api-server, scheduler, and controller-manager running on a master machine,
there are a number of addons which, for various reasons, have to run on a regular cluster node rather than the master.
Some of them are critical to having a fully functional cluster: Heapster, DNS, UI. Users can break their cluster
by evicting a critical addon (either manually or as a side effect of another operation like an upgrade),
which can then become pending (for example, when the cluster is highly utilized).
To avoid such a situation we want to have a mechanism which guarantees that
critical addons are scheduled, assuming the cluster is big enough.
This may affect other pods (including production users' applications).
## Design
Rescheduler will ensure that critical addons are always scheduled.
In the first version it will implement only this policy, but later we may want to introduce other policies.
It will be a standalone component running on the master machine, similar to the scheduler.
Those components will share common logic (initially the rescheduler will in fact import some of the scheduler's packages).
### Guaranteed scheduling of critical addons
Rescheduler will observe critical addons
(with annotation `scheduler.alpha.kubernetes.io/critical-pod`).
If one of them is marked by the scheduler as unschedulable (pod condition `PodScheduled` set to `false`, with reason `Unschedulable`),
the component will try to make room for the addon by evicting some pods; the scheduler will then schedule the addon.
#### Scoring nodes
Initially we want to choose a random node with enough capacity
(chosen as described in [Evicting pods](rescheduling-for-critical-pods.md#evicting-pods)) to schedule given addons.
Later we may want to introduce some heuristic:
* minimize number of evicted pods with violation of disruption budget or shortened termination grace period
* minimize number of affected pods by choosing a node on which we have to evict less pods
* increase probability of scheduling of evicted pods by preferring a set of pods with the smallest total sum of requests
* avoid nodes which are non-drainable (according to drain logic), for example on which there is a pod which doesn't belong to any RC/RS/Deployment
#### Evicting pods
There are 2 mechanisms which can delay a pod eviction: Disruption Budget and Termination Grace Period.
While removing a pod we will try to avoid violating its Disruption Budget, though we can't guarantee it,
since there is a chance that doing so would block the operation for a long period of time.
We will also try to respect the Termination Grace Period, though without any guarantee.
In case we have to remove a pod with a termination grace period longer than 10s, it will be shortened to 10s.
The proposed order while choosing a node to schedule a critical addon and pods to remove:
1. a node where the critical addon pod can fit after evicting only pods satisfying both
(1) their disruption budget will not be violated by such eviction and (2) they have grace period <= 10 seconds
1. a node where the critical addon pod can fit after evicting only pods whose disruption budget will not be violated by such eviction
1. any node where the critical addon pod can fit after evicting some pods
### Interaction with Scheduler
To avoid a situation where the Scheduler schedules another pod into the space prepared for the critical addon,
the chosen node has to be temporarily excluded from the list of nodes the Scheduler considers while making decisions.
For this purpose the node will get a temporary
[Taint](../../docs/design/taint-toleration-dedicated.md) “CriticalAddonsOnly”,
and each critical addon has to have a toleration defined for this taint.
Once the Rescheduler has no more work to do (all critical addons are scheduled, or the cluster is too small for them),
all such taints will be removed.
### Interaction with Cluster Autoscaler
The Rescheduler can partly duplicate the responsibility of the Cluster Autoscaler:
both components take action when there is an unschedulable pod.
This may lead to a situation where CA adds an extra node for a pending critical addon
while the Rescheduler evicts some running pods to make space for the addon.
This situation would be rare, and usually the extra node would be needed for the evicted pods anyway.
In the worst case CA will add and then remove the node.
To avoid complicating the architecture by introducing interaction between those 2 components, we accept this overlap.
We want to ensure that CA won't remove nodes with critical addons by adding appropriate logic there.
### Rescheduler control loop
The rescheduler control loop will be as follows:
* while there is an unschedulable critical addon, do the following:
* choose a node on which the addon should be scheduled (as described in Evicting pods)
* add a taint to the node to prevent the scheduler from using it
* delete the pods which block the addon from being scheduled
* wait until the scheduler schedules the critical addon
* if there are no more critical addons we can help, ensure there is no node with the taint
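A hedged Go sketch of that loop follows; all helper functions and types here are placeholders, since the real component talks to the API server and uses the taint/toleration machinery described above.

```go
package rescheduler

// Placeholder types and helpers; the real implementation uses the Kubernetes API.
type criticalAddon struct{ Name string }
type node struct{ Name string }

func unschedulableCriticalAddons() []criticalAddon { return nil }
func chooseNode(a criticalAddon) (node, bool)      { return node{}, false }
func addTaint(n node, taint string)                {}
func removeTaintFromAllNodes(taint string)         {}
func evictBlockingPods(n node, a criticalAddon)    {}
func waitUntilScheduled(a criticalAddon)           {}

const criticalAddonsOnlyTaint = "CriticalAddonsOnly"

// runOnce performs one pass of the control loop described above.
func runOnce() {
	for _, addon := range unschedulableCriticalAddons() {
		n, ok := chooseNode(addon)
		if !ok {
			continue // cluster is too small for this addon; nothing we can do
		}
		addTaint(n, criticalAddonsOnlyTaint) // keep the scheduler away from the freed space
		evictBlockingPods(n, addon)          // delete the pods blocking the addon
		waitUntilScheduled(addon)            // the regular scheduler places the addon
	}
	// No more critical addons we can help: make sure no node keeps the taint.
	removeTaintFromAllNodes(criticalAddonsOnlyTaint)
}
```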

View File

@ -1,493 +1 @@
# Controlled Rescheduling in Kubernetes This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/rescheduling.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/rescheduling.md)
## Overview
Although the Kubernetes scheduler(s) try to make good placement decisions for pods,
conditions in the cluster change over time (e.g. jobs finish and new pods arrive, nodes
are removed due to failures or planned maintenance or auto-scaling down, nodes appear due
to recovery after a failure or re-joining after maintenance or auto-scaling up or adding
new hardware to a bare-metal cluster), and schedulers are not omniscient (e.g. there are
some interactions between pods, or between pods and nodes, that they cannot predict). As
a result, the initial node selected for a pod may turn out to be a bad match, from the
perspective of the pod and/or the cluster as a whole, at some point after the pod has
started running.
Today (Kubernetes version 1.2) once a pod is scheduled to a node, it never moves unless
it terminates on its own, is deleted by the user, or experiences some unplanned event
(e.g. the node where it is running dies). Thus in a cluster with long-running pods, the
assignment of pods to nodes degrades over time, no matter how good an initial scheduling
decision the scheduler makes. This observation motivates "controlled rescheduling," a
mechanism by which Kubernetes will "move" already-running pods over time to improve their
placement. Controlled rescheduling is the subject of this proposal.
Note that the term "move" is not technically accurate -- the mechanism used is that
Kubernetes will terminate a pod that is managed by a controller, and the controller will
create a replacement pod that is then scheduled by the pod's scheduler. The terminated
pod and replacement pod are completely separate pods, and no pod migration is
implied. However, describing the process as "moving" the pod is approximately accurate
and easier to understand, so we will use this terminology in the document.
We use the term "rescheduling" to describe any action the system takes to move an
already-running pod. The decision may be made and executed by any component; we will
introduce the concept of a "rescheduler" component later, but it is not the only
component that can do rescheduling.
This proposal primarily focuses on the architecture and features/mechanisms used to
achieve rescheduling, and only briefly discusses example policies. We expect that community
experimentation will lead to a significantly better understanding of the range, potential,
and limitations of rescheduling policies.
## Example use cases
Example use cases for rescheduling are
* moving a running pod onto a node that better satisfies its scheduling criteria
* moving a pod onto an under-utilized node
* moving a pod onto a node that meets more of the pod's affinity/anti-affinity preferences
* moving a running pod off of a node in anticipation of a known or speculated future event
* draining a node in preparation for maintenance, decommissioning, auto-scale-down, etc.
* "preempting" a running pod to make room for a pending pod to schedule
* proactively/speculatively make room for large and/or exclusive pods to facilitate
fast scheduling in the future (often called "defragmentation")
* (note that these last two cases are the only use cases where the first-order intent
is to move a pod specifically for the benefit of another pod)
* moving a running pod off of a node from which it is receiving poor service
* anomalous crashlooping or other mysterious incompatibility between the pod and the node
* repeated out-of-resource killing (see #18724)
* repeated attempts by the scheduler to schedule the pod onto some node, but it is
rejected by Kubelet admission control due to incomplete scheduler knowledge
* poor performance due to interference from other containers on the node (CPU hogs,
cache thrashers, etc.) (note that in this case there is a choice of moving the victim
or the aggressor)
## Some axes of the design space
Among the key design decisions are
* how does a pod specify its tolerance for these system-generated disruptions, and how
does the system enforce such disruption limits
* for each use case, where is the decision made about when and which pods to reschedule
(controllers, schedulers, an entirely new component e.g. "rescheduler", etc.)
* rescheduler design issues: how much does a rescheduler need to know about pods'
schedulers' policies, how does the rescheduler specify its rescheduling
requests/decisions (e.g. just as an eviction, an eviction with a hint about where to
reschedule, or as an eviction paired with a specific binding), how does the system
implement these requests, does the rescheduler take into account the second-order
effects of decisions (e.g. whether an evicted pod will reschedule, will cause
a preemption when it reschedules, etc.), does the rescheduler execute multi-step plans
(e.g. evict two pods at the same time with the intent of moving one into the space
vacated by the other, or even more complex plans)
Additional musings on the rescheduling design space can be found [here](rescheduler.md).
## Design proposal
The key mechanisms and components of the proposed design are priority, preemption,
disruption budgets, the `/evict` subresource, and the rescheduler.
### Priority
#### Motivation
Just as it is useful to overcommit nodes to increase node-level utilization, it is useful
to overcommit clusters to increase cluster-level utilization. Scheduling priority (which
we abbreviate as *priority*, in combination with disruption budgets (described in the
next section), allows Kubernetes to safely overcommit clusters much as QoS levels allow
it to safely overcommit nodes.
Today, cluster sharing among users, workload types, etc. is regulated via the
[quota](../admin/resourcequota/README.md) mechanism. When allocating quota, a cluster
administrator has two choices: (1) the sum of the quotas is less than or equal to the
capacity of the cluster, or (2) the sum of the quotas is greater than the capacity of the
cluster (that is, the cluster is overcommitted). (1) is likely to lead to cluster
under-utilization, while (2) is unsafe in the sense that someone's pods may go pending
indefinitely even though they are still within their quota. Priority makes cluster
overcommitment (i.e. case (2)) safe by allowing users and/or administrators to identify
which pods should be allowed to run, and which should go pending, when demand for cluster
resources exceeds supply due to cluster overcommitment.
Priority is also useful in some special-case scenarios, such as ensuring that system
DaemonSets can always schedule and reschedule onto every node where they want to run
(assuming they are given the highest priority), e.g. see #21767.
#### Specifying priorities
We propose to add a required `Priority` field to `PodSpec`. Its value type is string, and
the cluster administrator defines a total ordering on these strings (for example
`Critical`, `Normal`, `Preemptible`). We choose string instead of integer so that it is
easy for an administrator to add new priority levels in between existing levels, to
encourage thinking about priority in terms of user intent and avoid magic numbers, and to
make the internal implementation more flexible.
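For illustration, a hedged sketch of how an administrator-defined total ordering over priority names could be represented follows; the names are just the examples from the text, and the map-based representation is an assumption of the sketch.

```go
package priority

// rank maps each administrator-defined priority name to its position in the
// total ordering; a higher rank means a more important priority.
var rank = map[string]int{
	"Preemptible": 0,
	"Normal":      1,
	"Critical":    2,
}

// atLeast reports whether priority a is the same as or higher than priority b.
func atLeast(a, b string) bool {
	return rank[a] >= rank[b]
}
```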
When a scheduler is scheduling a new pod P and cannot find any node that meets all of P's
scheduling predicates, it is allowed to evict ("preempt") one or more pods that are at
the same or lower priority than P (subject to disruption budgets, see next section) from
a node in order to make room for P, i.e. in order to make the scheduling predicates
satisfied for P on that node. (Note that when we add cluster-level resources (#19080),
it might be necessary to preempt from multiple nodes, but that scenario is outside the
scope of this document.) The preempted pod(s) may or may not be able to reschedule. The
net effect of this process is that when demand for cluster resources exceeds supply, the
higher-priority pods will be able to run while the lower-priority pods will be forced to
wait. The detailed mechanics of preemption are described in a later section.
In addition to taking disruption budget into account, for equal-priority preemptions the
scheduler will try to enforce fairness (across victim controllers, services, etc.)
Priorities could be specified directly by users in the podTemplate, or assigned by an
admission controller using
properties of the pod. Either way, all schedulers must be configured to understand the
same priorities (names and ordering). This could be done by making them constants in the
API, or using ConfigMap to configure the schedulers with the information. The advantage of
the former (at least making the names, if not the ordering, constants in the API) is that
it allows the API server to do validation (e.g. to catch mis-spelling).
In the future, which priorities are usable for a given namespace and pods with certain
attributes may be configurable, similar to ResourceQuota, LimitRange, or security policy.
Priority and resource QoS are independent.
The priority we have described here might be used to prioritize the scheduling queue
(i.e. the order in which a scheduler examines pods in its scheduling loop), but the two
priority concepts do not have to be connected. It is somewhat logical to tie them
together, since a higher priority generally indicates that a pod is more urgent to get
running. Also, scheduling low-priority pods before high-priority pods might lead to
avoidable preemptions if the high-priority pods end up preempting the low-priority pods
that were just scheduled.
TODO: Priority and preemption are global or namespace-relative? See
[this discussion thread](https://github.com/kubernetes/kubernetes/pull/22217#discussion_r55737389).
#### Relationship of priority to quota
Of course, if the decision of what priority to give a pod is solely up to the user, then
users have no incentive to ever request any priority less than the maximum. Thus
priority is intimately related to quota, in the sense that resource quotas must be
allocated on a per-priority-level basis (X amount of RAM at priority A, Y amount of RAM
at priority B, etc.). The "guarantee" that highest-priority pods will always be able to
schedule can only be achieved if the sum of the quotas at the top priority level is less
than or equal to the cluster capacity. This is analogous to QoS, where safety can only be
achieved if the sum of the limits of the top QoS level ("Guaranteed") is less than or
equal to the node capacity. In terms of incentives, an organization could "charge"
an amount proportional to the priority of the resources.
The topic of how to allocate quota at different priority levels to achieve a desired
balance between utilization and probability of schedulability is an extremely complex
topic that is outside the scope of this document. For example, resource fragmentation and
RequiredDuringScheduling node and pod affinity and anti-affinity means that even if the
sum of the quotas at the top priority level is less than or equal to the total aggregate
capacity of the cluster, some pods at the top priority level might still go pending. In
general, priority provides a *probabilistic* guarantee of pod schedulability in the face
of overcommitment, by allowing prioritization of which pods should be allowed to run
when demand for cluster resources exceeds supply.
### Disruption budget
While priority can protect pods from one source of disruption (preemption by a
lower-priority pod), *disruption budgets* limit disruptions from all Kubernetes-initiated
causes, including preemption by an equal or higher-priority pod, or being evicted to
achieve other rescheduling goals. In particular, each pod is optionally associated with a
"disruption budget," a new API resource that limits Kubernetes-initiated terminations
across a set of pods (e.g. the pods of a particular Service might all point to the same
disruption budget object), regardless of cause. Initially we expect disruption budget
(e.g. `DisruptionBudgetSpec`) to consist of
* a rate limit on disruptions (preemption and other evictions) across the corresponding
set of pods, e.g. no more than one disruption per hour across the pods of a particular Service
* a minimum number of pods that must be up simultaneously (sometimes called "shard
strength") (of course this can also be expressed as the inverse, i.e. the number of
pods of the collection that can be down simultaneously)
The second item merits a bit more explanation. One use case is to specify a quorum size,
e.g. to ensure that at least 3 replicas of a quorum-based service with 5 replicas are up
at the same time. In practice, a service should ideally create enough replicas to survive
at least one planned and one unplanned outage. So in our quorum example, we would specify
that at least 4 replicas must be up at the same time; this allows for one intentional
disruption (bringing the number of live replicas down from 5 to 4 and consuming one unit
of shard strength budget) and one unplanned disruption (bringing the number of live
replicas down from 4 to 3) while still maintaining a quorum. Shard strength is also
useful for simpler replicated services; for example, you might not want more than 10% of
your front-ends to be down at the same time, so as to avoid overloading the remaining
replicas.
Initially, disruption budgets will be specified by the user. Thus as with priority,
disruption budgets need to be tied into quota, to prevent users from saying none of their
pods can ever be disrupted. The exact way of expressing and enforcing this quota is TBD,
though a simple starting point would be to have an admission controller assign a default
disruption budget based on priority level (more liberal with decreasing priority).
We also likely need a quota that applies to Kubernetes *components*, to limit the rate
at which any one component is allowed to consume disruption budget.
Of course there should also be a `DisruptionBudgetStatus` that indicates the current
disruption rate that the collection of pods is experiencing, and the number of pods that
are up.
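A hedged Go sketch of the shape such an object might take under this proposal follows; the field names are illustrative only and are not the eventual API.

```go
package disruption

import "time"

// DisruptionBudgetSpec is an illustrative shape for the budget described above.
type DisruptionBudgetSpec struct {
	// MaxDisruptionsPerWindow and Window express a rate limit on
	// Kubernetes-initiated disruptions across the covered set of pods.
	MaxDisruptionsPerWindow int
	Window                  time.Duration

	// MinUp is the "shard strength": the minimum number of covered pods
	// that must be up simultaneously.
	MinUp int
}

// DisruptionBudgetStatus tracks observed state so that the API server or a
// controller can decide whether another disruption is currently allowed.
type DisruptionBudgetStatus struct {
	DisruptionsInWindow int
	CurrentlyUp         int
}

// allows reports whether one more disruption would stay within the budget.
func allows(spec DisruptionBudgetSpec, status DisruptionBudgetStatus) bool {
	return status.DisruptionsInWindow < spec.MaxDisruptionsPerWindow &&
		status.CurrentlyUp-1 >= spec.MinUp
}
```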
For the purposes of disruption budget, a pod is considered to be disrupted as soon as its
graceful termination period starts.
A pod that is not covered by a disruption budget but is managed by a controller
gets an implicitly infinite disruption budget (though the system should try not to
unduly victimize such pods). How a pod that is not managed by a controller is
handled is TBD.
TBD: In addition to `PodSpec`, where do we store pointer to disruption budget
(podTemplate in controller that managed the pod?)? Do we auto-generate a disruption
budget (e.g. when instantiating a Service), or require the user to create it manually
before they create a controller? Which objects should return the disruption budget object
as part of the output on `kubectl get` other than (obviously) `kubectl get` for the
disruption budget itself?
TODO: Clean up distinction between "down due to voluntary action taken by Kubernetes"
and "down due to unplanned outage" in spec and status.
For now, there is nothing to prevent clients from circumventing the disruption budget
protections. Of course, clients that do this are not being "good citizens." In the next
section we describe a mechanism that at least makes it easy for well-behaved clients to
obey the disruption budgets.
See #12611 for additional discussion of disruption budgets.
### /evict subresource and PreferAvoidPods
Although we could put the responsibility for checking and updating disruption budgets
solely on the client, it is safer and more convenient if we implement that functionality
in the API server. Thus we will introduce a new `/evict` subresource on pod. It is similar to
today's "delete" on pod except
* It will be rejected if the deletion would violate disruption budget. (See how
Deployment handles failure of /rollback for ideas on how clients could handle failure
of `/evict`.) There are two possible ways to implement this:
* For the initial implementation, this will be accomplished by the API server just
looking at the `DisruptionBudgetStatus` and seeing if the disruption would violate the
`DisruptionBudgetSpec`. In this approach, we assume a disruption budget controller
keeps the `DisruptionBudgetStatus` up-to-date by observing all pod deletions and
creations in the cluster, so that an approved disruption is quickly reflected in the
`DisruptionBudgetStatus`. Of course this approach does allow a race in which one or
more additional disruptions could be approved before the first one is reflected in the
`DisruptionBudgetStatus`.
* Thus a subsequent implementation will have the API server explicitly debit the
`DisruptionBudgetStatus` when it accepts an `/evict`. (There still needs to be a
controller, to keep the shard strength status up-to-date when replacement pods are
created after an eviction; the controller may also be necessary for the rate status
depending on how rate is represented, e.g. adding tokens to a bucket at a fixed rate.)
Once etcd supports multi-object transactions (etcd v3), the debit and pod deletion will
be placed in the same transaction.
* Note: For the purposes of disruption budget, a pod is considered to be disrupted as soon as its
graceful termination period starts (so when we say "delete" here we do not mean
"deleted from etcd" but rather "graceful termination period has started").
* It will allow clients to communicate additional parameters when they wish to delete a
pod. (In the absence of the `/evict` subresource, we would have to create a pod-specific
type analogous to `api.DeleteOptions`.)
We will make `kubectl delete pod` use `/evict` by default, and require a command-line
flag to delete the pod directly.
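An illustrative sketch of the server-side handling of the subresource follows; the option names and the plain-function shape are assumptions of the sketch, not the actual API server registry code.

```go
package evict

import "errors"

// EvictOptions carries the extra parameters the /evict subresource lets a
// client communicate; the field names here are hypothetical.
type EvictOptions struct {
	// AddToPreferAvoidPods asks that the evicted pod's signature be appended
	// to the node's PreferAvoidPods list (described below).
	AddToPreferAvoidPods bool
	GracePeriodSeconds   *int64
}

var errBudgetExhausted = errors.New("eviction would violate the pod's disruption budget")

// handleEvict sketches the initial implementation: consult the budget before
// starting graceful deletion of the pod.
func handleEvict(budgetAllowsOneMore bool, opts EvictOptions, startGracefulDelete func() error) error {
	if !budgetAllowsOneMore {
		return errBudgetExhausted // the client may retry later or handle the failure like a /rollback failure
	}
	if opts.AddToPreferAvoidPods {
		// Record the pod's signature on the node's PreferAvoidPods list
		// (omitted in this sketch).
	}
	return startGracefulDelete()
}
```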
We will add to `NodeStatus` a bounded-sized list of signatures of pods that should avoid
that node (provisionally called `PreferAvoidPods`). One of the pieces of information
specified in the `/evict` subresource is whether the eviction should add the evicted
pod's signature to the corresponding node's `PreferAvoidPods`. Initially the pod
signature will be a
[controllerRef](https://github.com/kubernetes/kubernetes/issues/14961#issuecomment-183431648),
i.e. a reference to the pod's controller. Controllers are responsible for garbage
collecting, after some period of time, `PreferAvoidPods` entries that point to them, but the API
server will also enforce a bounded size on the list. All schedulers will have a
highest-weighted priority function that gives a node the worst priority if the pod it is
scheduling appears in that node's `PreferAvoidPods` list. Thus appearing in
`PreferAvoidPods` is similar to
[RequiredDuringScheduling node anti-affinity](../../docs/user-guide/node-selection/README.md)
but it takes precedence over all other priority criteria and is not explicitly listed in
the `NodeAffinity` of the pod.
`PreferAvoidPods` is useful for the "moving a running pod off of a node from which it is
receiving poor service" use case, as it reduces the chance that the replacement pod will
end up on the same node (keep in mind that most of those cases are situations that the
scheduler does not have explicit priority functions for, for example it cannot know in
advance that a pod will be starved). Also, though we do not intend to implement any such
policies in the first version of the rescheduler, it is useful whenever the rescheduler evicts
two pods A and B with the intention of moving A into the space vacated by B (it prevents
B from rescheduling back into the space it vacated before A's scheduler has a chance to
reschedule A there). Note that these two uses are subtly different; in the first
case we want the avoidance to last a relatively long time, whereas in the second case we
may only need it to last until A schedules.
See #20699 for more discussion.
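A hedged sketch of the scheduler-side priority function follows, with simplified stand-in types; the real scheduler uses its own plugin framework and the controllerRef signature described above.

```go
package priorities

// podSignature is a simplified stand-in for the pod signature (initially a controllerRef).
type podSignature string

type nodeInfo struct {
	Name            string
	PreferAvoidPods []podSignature
}

// avoidScore is the highest-weighted priority function described above: it
// gives a node the worst possible score when the pod being scheduled appears
// in that node's PreferAvoidPods list, and a neutral score otherwise.
func avoidScore(pod podSignature, node nodeInfo, worst, neutral int) int {
	for _, sig := range node.PreferAvoidPods {
		if sig == pod {
			return worst
		}
	}
	return neutral
}
```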
### Preemption mechanics
**NOTE: We expect a fuller design doc to be written on preemption before it is implemented.
However, a sketch of some ideas are presented here, since preemption is closely related to the
concepts discussed in this doc.**
Pod schedulers will decide and enact preemptions, subject to the priority and disruption
budget rules described earlier. (Though note that we currently do not have any mechanism
to prevent schedulers from bypassing either the priority or disruption budget rules.)
The scheduler does not concern itself with whether the evicted pod(s) can reschedule. The
eviction(s) use(s) the `/evict` subresource so that it is subject to the disruption
budget(s) of the victim(s), but it does not request to add the victim pod(s) to the
nodes' `PreferAvoidPods`.
Evicting victim(s) and binding the pending pod that the evictions are intended to enable
to schedule, are not transactional. We expect the scheduler to issue the operations in
sequence, but it is still possible that another scheduler could schedule its pod in
between the eviction(s) and the binding, or that the set of pods running on the node in
question changed between the time the scheduler made its decision and the time it sent
the operations to the API server thereby causing the eviction(s) to be not sufficient to get the
pending pod to schedule. In general there are a number of race conditions that cannot be
avoided without (1) making the evictions and binding be part of a single transaction, and
(2) making the binding preconditioned on a version number that is associated with the
node and is incremented on every binding. We may or may not implement those mechanisms in
the future.
Given a choice between a node where scheduling a pod requires preemption and one where it
does not, all other things being equal, a scheduler should choose the one where
preemption is not required. (TBD: Also, if the selected node does require preemption, the
scheduler should preempt lower-priority pods before higher-priority pods (e.g. if the
scheduler needs to free up 4 GB of RAM, and the node has two 2 GB low-priority pods and
one 4 GB high-priority pod, all of which have sufficient disruption budget, it should
preempt the two low-priority pods). This is debatable, since all have sufficient
disruption budget. But still better to err on the side of giving better disruption SLO to
higher-priority pods when possible?)
Preemption victims must be given their termination grace period. One possible sequence
of events is
1. The API server binds the preemptor to the node (i.e. sets `nodeName` on the
preempting pod) and sets `deletionTimestamp` on the victims
2. Kubelet sees that `deletionTimestamp` has been set on the victims; they enter their
graceful termination period
3. Kubelet sees the preempting pod. It runs the admission checks on the new pod
assuming all pods that are in their graceful termination period are gone and that
all pods that are in the waiting state (see (4)) are running.
4. If (3) fails, then the new pod is rejected. If (3) passes, then Kubelet holds the
new pod in a waiting state, and does not run it until the pod passes the
admission checks using the set of actually running pods.
Note that there are a lot of details to be figured out here; above is just a very
hand-wavy sketch of one general approach that might work.
See #22212 for additional discussion.
### Node drain
Node drain will be handled by one or more components not described in this document. They
will respect disruption budgets. Initially, we will just make `kubectl drain`
respect disruption budgets. See #17393 for other discussion.
### Rescheduler
All rescheduling other than preemption and node drain will be decided and enacted by a
new component called the *rescheduler*. It runs continuously in the background, looking
for opportunities to move pods to better locations. It acts when the degree of
improvement meets some threshold and is allowed by the pod's disruption budget. The
action is eviction of a pod using the `/evict` subresource, with the pod's signature
enqueued in the node's `PreferAvoidPods`. It does not force the pod to reschedule to any
particular node. Thus it is really an "unscheduler"; only in combination with the evicted
pod's scheduler, which schedules the replacement pod, do we get true "rescheduling." See
the "Example use cases" section earlier for some example use cases.
The rescheduler is a best-effort service that makes no guarantees about how quickly (or
whether) it will resolve a suboptimal pod placement.
The first version of the rescheduler will not take into consideration where or whether an
evicted pod will reschedule. The evicted pod may go pending, consuming one unit of the
corresponding shard-strength disruption budget indefinitely. By using the `/evict`
subresource, the rescheduler ensures that there is sufficient budget for the
evicted pod to go and stay pending. We expect future versions of the rescheduler may be
linked with the "mandatory" predicate functions (currently, the ones that constitute the
Kubelet admission criteria), and will only evict if the rescheduler determines that the
pod can reschedule somewhere according to those criteria. (Note that this still does not
guarantee that the pod actually will be able to reschedule, for at least two reasons: (1)
the state of the cluster may change between the time the rescheduler evaluates it and
when the evicted pod's scheduler tries to schedule the replacement pod, and (2) the
evicted pod's scheduler may have additional predicate functions in addition to the
mandatory ones).
(Note: see [this comment](https://github.com/kubernetes/kubernetes/pull/22217#discussion_r54527968)).
The first version of the rescheduler will only implement two objectives: moving a pod
onto an under-utilized node, and moving a pod onto a node that meets more of the pod's
affinity/anti-affinity preferences than wherever it is currently running. (We assume that
nodes that are intentionally under-utilized, e.g. because they are being drained, are
marked unschedulable, thus the first objective will not cause the rescheduler to "fight"
a system that is draining nodes.) We assume that all schedulers sufficiently weight the
priority functions for affinity/anti-affinity and avoiding very packed nodes,
otherwise evicted pods may not actually move onto a node that is better according to
the criteria that caused it to be evicted. (But note that in all cases it will move to a
node that is better according to the totality of its scheduler's priority functions,
except in the case where the node where it was already running was the only node
where it can run.) As a general rule, the rescheduler should only act when it sees
particularly bad situations, since (1) an eviction for a marginal improvement is likely
not worth the disruption--just because there is sufficient budget for an eviction doesn't
mean an eviction is painless to the application, and (2) rescheduling the pod might not
actually mitigate the identified problem if it is minor enough that other scheduling
factors dominate the decision of where the replacement pod is scheduled.
We assume schedulers' priority functions are at least vaguely aligned with the
rescheduler's policies; otherwise the rescheduler will never accomplish anything useful,
given that it relies on the schedulers to actually reschedule the evicted pods. (Even if
the rescheduler acted as a scheduler, explicitly rebinding evicted pods, we'd still want
this to be true, to prevent the schedulers and rescheduler from "fighting" one another.)
The rescheduler will be configured using ConfigMap; the cluster administrator can enable
or disable policies and can tune the rescheduler's aggressiveness (aggressive means it
will use a relatively low threshold for triggering an eviction and may consume a lot of
disruption budget, while non-aggressive means it will use a relatively high threshold for
triggering an eviction and will try to leave plenty of buffer in disruption budgets). The
first version of the rescheduler will not be extensible or pluggable, since we want to
keep the code simple while we gain experience with the overall concept. In the future, we
anticipate a version that will be extensible and pluggable.
We might want some way to force the evicted pod to the front of the scheduler queue,
independently of its priority.
See #12140 for additional discussion.
### Final comments
In general, the design space for this topic is huge. This document describes some of the
design considerations and proposes one particular initial implementation. We expect
certain aspects of the design to be "permanent" (e.g. the notion and use of priorities,
preemption, disruption budgets, and the `/evict` subresource) while others may change over time
(e.g. the partitioning of functionality between schedulers, controllers, rescheduler,
horizontal pod autoscaler, and cluster autoscaler; the policies the rescheduler implements;
the factors the rescheduler takes into account when making decisions (e.g. knowledge of
schedulers' predicate and priority functions, second-order effects like whether and where
evicted pod will be able to reschedule, etc.); the way the rescheduler enacts its
decisions; and the complexity of the plans the rescheduler attempts to implement).
## Implementation plan
The highest-priority feature to implement is the rescheduler with the two use cases
highlighted earlier: moving a pod onto an under-utilized node, and moving a pod onto a
node that meets more of the pod's affinity/anti-affinity preferences. The former is
useful to rebalance pods after cluster auto-scale-up, and the latter is useful for
Ubernetes. This requires implementing disruption budgets and the `/evict` subresource,
but not priority or preemption.
Because the general topic of rescheduling is very speculative, we have intentionally
proposed that the first version of the rescheduler be very simple -- only uses eviction
(no attempt to guide replacement pod to any particular node), doesn't know schedulers'
predicate or priority functions, doesn't try to move two pods at the same time, and only
implements two use cases. As alluded to in the previous subsection, we expect the design
and implementation to evolve over time, and we encourage members of the community to
experiment with more sophisticated policies and to report their results from using them
on real workloads.
## Alternative implementations
TODO.
## Additional references
TODO.
TODO: Add reference to this doc from docs/proposals/rescheduler.md

View File

@ -1,151 +1 @@
# Resource Metrics API This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-metrics-api.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-metrics-api.md)
This document describes the API part of the MVP version of the Resource Metrics API effort in Kubernetes.
Once agreement is reached, the document will be extended to also cover implementation details.
The shape of the effort may also change once we have more well-defined use cases.
## Goal
The goal for the effort is to provide resource usage metrics for pods and nodes through the API server.
This will be a stable, versioned API which core Kubernetes components can rely on.
In the first version only the well-defined use cases will be handled,
although the API should be easily extensible for potential future use cases.
## Main use cases
This section describes well-defined use cases which should be handled in the first version.
Use cases which are not listed below are out of the scope of the MVP version of the Resource Metrics API.
#### Horizontal Pod Autoscaler
HPA uses the latest value of cpu usage as an average aggregated across 1 minute
(the window may change in the future). The data for a given set of pods
(defined either by a pod list or a label selector) should be accessible in one request
for performance reasons.
#### Scheduler
In order to schedule best-effort pods, the scheduler requires node-level resource usage metrics
as an average aggregated across 1 minute (the window may change in the future).
The metrics should be available for all resources supported in the scheduler.
Currently the scheduler does not need this information, because it schedules best-effort pods
without considering node usage. But having the metrics available in the API server is a blocker
for adding the ability to take node usage into account when scheduling best-effort pods.
## Other considered use cases
This section describes the other considered use cases and explains why they are out
of the scope of the MVP version.
#### Custom metrics in HPA
HPA requires the latest value of application-level metrics.
The design of the pipeline for collecting application-level metrics should
be revisited, and it's not clear whether application-level metrics should be
available in the API server, so this use case initially won't be supported.
#### Cluster Federation
The Cluster Federation control system might want to consider cluster-level usage (in addition to cluster-level request)
of running pods when choosing where to schedule new pods. Although
Cluster Federation is still in design,
we expect the metrics API described here to be sufficient. Cluster-level usage can be
obtained by summing over usage of all nodes in the cluster.
#### kubectl top
This feature is not yet specified/implemented, although it seems reasonable to provide users with information
about resource usage at the pod/node level.
Since this feature has not been fully specified yet, it will not be supported initially in the API, although
it will probably be possible to provide a reasonable implementation of the feature anyway.
#### Kubernetes dashboard
[Kubernetes dashboard](https://github.com/kubernetes/dashboard), in order to draw graphs, requires resource usage
in time-series format over a relatively long period of time. Aggregations should also be possible at various levels,
including replication controllers, deployments, services, etc.
Since this use case is more complicated, it will not be supported initially in the API; the dashboard will query Heapster
directly using a custom API there.
## Proposed API
Initially the metrics API will be in a separate [API group](api-group.md) called ```metrics```.
Later, if we decide to put Node and Pod in different API groups,
NodeMetrics and PodMetrics should also be placed in different API groups.
#### Schema
The proposed schema is as follows. Each top-level object has `TypeMeta` and `ObjectMeta` fields
to be compatible with Kubernetes API standards.
```go
type NodeMetrics struct {
	unversioned.TypeMeta
	ObjectMeta

	// The following fields define time interval from which metrics were
	// collected in the following format [Timestamp-Window, Timestamp].
	Timestamp unversioned.Time
	Window    unversioned.Duration

	// The memory usage is the memory working set.
	Usage v1.ResourceList
}

type PodMetrics struct {
	unversioned.TypeMeta
	ObjectMeta

	// The following fields define time interval from which metrics were
	// collected in the following format [Timestamp-Window, Timestamp].
	Timestamp unversioned.Time
	Window    unversioned.Duration

	// Metrics for all containers are collected within the same time window.
	Containers []ContainerMetrics
}

type ContainerMetrics struct {
	// Container name corresponding to the one from v1.Pod.Spec.Containers.
	Name string

	// The memory usage is the memory working set.
	Usage v1.ResourceList
}
```
By default `Usage` is the mean from samples collected within the returned time window.
The default time window is 1 minute.
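For illustration, a consumer could aggregate the per-container numbers above into pod-level usage as in the sketch below; it uses simplified stand-in types (string keys and int64 quantities) rather than the real `v1.ResourceList`, so it is only meant to show the shape of the computation.
```go
package main

import "fmt"

// Simplified stand-ins for v1.ResourceList and ContainerMetrics; real
// quantities are resource.Quantity values, not plain int64s.
type ResourceList map[string]int64

type ContainerMetrics struct {
	Name  string
	Usage ResourceList
}

// podUsage sums container usage into pod-level usage for each resource name.
func podUsage(containers []ContainerMetrics) ResourceList {
	total := ResourceList{}
	for _, c := range containers {
		for name, quantity := range c.Usage {
			total[name] += quantity
		}
	}
	return total
}

func main() {
	containers := []ContainerMetrics{
		{Name: "app", Usage: ResourceList{"cpu": 250, "memory": 128 << 20}},    // 250 millicores, 128Mi
		{Name: "sidecar", Usage: ResourceList{"cpu": 50, "memory": 32 << 20}},  // 50 millicores, 32Mi
	}
	fmt.Println(podUsage(containers)) // map[cpu:300 memory:167772160]
}
```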
#### Endpoints
All endpoints are GET endpoints, rooted at `/apis/metrics/v1alpha1/`.
There won't be support for the other REST methods.
The list of supported endpoints:
- `/nodes` - all node metrics; type `[]NodeMetrics`
- `/nodes/{node}` - metrics for a specified node; type `NodeMetrics`
- `/namespaces/{namespace}/pods` - all pod metrics within namespace with support for `all-namespaces`; type `[]PodMetrics`
- `/namespaces/{namespace}/pods/{pod}` - metrics for a specified pod; type `PodMetrics`
The following query parameters are supported:
- `labelSelector` - restrict the list of returned objects by labels (list endpoints only)
In the future we may want to introduce the following params:
`aggregator` (`max`, `min`, `95th`, etc.) and `window` (`1h`, `1d`, `1w`, etc.)
which will allow getting other aggregates over a custom time window.
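As a usage sketch, a client could reach these endpoints through `kubectl proxy` and decode the response as below; the proxy address and the trimmed-down struct are assumptions made for the example, not part of the API definition.
```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// Trimmed-down view of NodeMetrics, just enough to decode the fields shown above.
type nodeMetrics struct {
	Metadata  struct{ Name string } `json:"metadata"`
	Timestamp time.Time             `json:"timestamp"`
	Window    string                `json:"window"`
	Usage     map[string]string     `json:"usage"`
}

func main() {
	// Assumes `kubectl proxy` is running locally on its default port.
	resp, err := http.Get("http://127.0.0.1:8001/apis/metrics/v1alpha1/nodes")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var nodes []nodeMetrics
	if err := json.NewDecoder(resp.Body).Decode(&nodes); err != nil {
		panic(err)
	}
	for _, n := range nodes {
		fmt.Printf("%s: %v (window %s)\n", n.Metadata.Name, n.Usage, n.Window)
	}
}
```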
## Further improvements
Depending on the further requirements the following features may be added:
- support for more metrics
- support for application level metrics
- watch for metrics
- possibility to query for window sizes and aggregation functions (though single window size/aggregation function per request)
- cluster level metrics
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/resource-metrics-api.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,333 +1 @@
# Resource Quota - Scoping resources This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-quota-scoping.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-quota-scoping.md)
## Problem Description
### Ability to limit compute requests and limits
The existing `ResourceQuota` API object constrains the total amount of compute
resource requests. This is useful when a cluster-admin is interested in
controlling explicit resource guarantees so that pods created by users who stay
within their quota have a relatively strong guarantee of finding
enough free resources in the cluster to schedule. The end-user creating
the pod is expected to have intimate knowledge of their minimum required resources
as well as their potential limits.
There are many environments where a cluster-admin does not extend this level
of trust to their end-users because users often request too much resource, and
they have trouble reasoning about what they hope to have available for their
application versus what their application actually needs. In these environments,
the cluster-admin will often just expose a single value (the limit) to the end-user.
Internally, they may choose a variety of other strategies for setting the request.
For example, some cluster operators are focused on satisfying a particular over-commit
ratio and may choose to set the request as a factor of the limit to control for
over-commit. Other cluster operators may defer to a resource estimation tool that
sets the request based on known historical trends. In this environment, the
cluster-admin is interested in exposing a quota to their end-users that maps
to their desired limit instead of their request since that is the value the user
manages.
### Ability to limit impact to node and promote fair-use
The current `ResourceQuota` API object does not provide the ability
to quota best-effort pods separately from pods with resource guarantees.
For example, if a cluster-admin applies a quota that caps requested
cpu at 10 cores and memory at 10Gi, all pods in the namespace must
make an explicit resource request for cpu and memory to satisfy
quota. This prevents a namespace with a quota from supporting best-effort
pods.
In practice, the cluster-admin wants to control the impact of best-effort
pods to the cluster, but not restrict the ability to run best-effort pods
altogether.
As a result, the cluster-admin requires the ability to control the
max number of active best-effort pods. In addition, the cluster-admin
requires the ability to scope a quota that limits compute resources to
exclude best-effort pods.
### Ability to quota long-running vs. bounded-duration compute resources
The cluster-admin may want to quota end-users separately
based on long-running vs. bounded-duration compute resources.
For example, a cluster-admin may offer more compute resources
for long running pods that are expected to have a more permanent residence
on the node than bounded-duration pods. Many batch style workloads
tend to consume as much resource as they can until something else applies
the brakes. As a result, these workloads tend to operate at their limit,
while many traditional web applications may often consume closer to their
request if there is no active traffic. An operator that wants to control
density will offer lower quota limits for batch workloads than web applications.
A classic example is a PaaS deployment where the cluster-admin may
allow a separate budget for pods that run their web application vs. pods that
build web applications.
Another example is providing more quota to a database pod than a
pod that performs a database migration.
## Use Cases
* As a cluster-admin, I want the ability to quota
* compute resource requests
* compute resource limits
* compute resources for terminating vs. non-terminating workloads
* compute resources for best-effort vs. non-best-effort pods
## Proposed Change
### New quota tracked resources
Support the following resources that can be tracked by quota.
| Resource Name | Description |
| ------------- | ----------- |
| cpu | total cpu requests (backwards compatibility) |
| memory | total memory requests (backwards compatibility) |
| requests.cpu | total cpu requests |
| requests.memory | total memory requests |
| limits.cpu | total cpu limits |
| limits.memory | total memory limits |
### Resource Quota Scopes
Add the ability to associate a set of `scopes` to a quota.
A quota will only measure usage for a `resource` if it matches
the intersection of enumerated `scopes`.
Adding a `scope` to a quota limits the number of resources
it supports to those that pertain to the `scope`. Specifying
a resource on the quota object outside of the allowed set
would result in a validation error.
| Scope | Description |
| ----- | ----------- |
| Terminating | Match `kind=Pod` where `spec.activeDeadlineSeconds >= 0` |
| NotTerminating | Match `kind=Pod` where `spec.activeDeadlineSeconds = nil` |
| BestEffort | Match `kind=Pod` where `status.qualityOfService in (BestEffort)` |
| NotBestEffort | Match `kind=Pod` where `status.qualityOfService not in (BestEffort)` |
A `BestEffort` scope restricts a quota to tracking the following resources:
* pod
A `Terminating`, `NotTerminating`, `NotBestEffort` scope restricts a quota to
tracking the following resources:
* pod
* memory, requests.memory, limits.memory
* cpu, requests.cpu, limits.cpu
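The matching semantics in the scope table above can be illustrated with the following sketch, which uses a simplified pod representation rather than the real `api.Pod` type.
```go
package main

import "fmt"

// Simplified stand-in for the pod fields the scopes inspect.
type pod struct {
	activeDeadlineSeconds *int64 // spec.activeDeadlineSeconds
	qualityOfService      string // status.qualityOfService
}

// matchesScope mirrors the scope table: Terminating/NotTerminating look at
// activeDeadlineSeconds, BestEffort/NotBestEffort look at the QoS class.
func matchesScope(scope string, p pod) bool {
	switch scope {
	case "Terminating":
		return p.activeDeadlineSeconds != nil
	case "NotTerminating":
		return p.activeDeadlineSeconds == nil
	case "BestEffort":
		return p.qualityOfService == "BestEffort"
	case "NotBestEffort":
		return p.qualityOfService != "BestEffort"
	}
	return false
}

// matchesAllScopes implements the intersection rule: a quota tracks a pod only
// if the pod matches every scope enumerated on the quota.
func matchesAllScopes(scopes []string, p pod) bool {
	for _, s := range scopes {
		if !matchesScope(s, p) {
			return false
		}
	}
	return true
}

func main() {
	deadline := int64(600)
	batch := pod{activeDeadlineSeconds: &deadline, qualityOfService: "Burstable"}
	fmt.Println(matchesAllScopes([]string{"Terminating", "NotBestEffort"}, batch)) // true
}
```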
## Data Model Impact
```
// The following identify resource constants for Kubernetes object types
const (
	// CPU request, in cores. (500m = .5 cores)
	ResourceRequestsCPU ResourceName = "requests.cpu"
	// Memory request, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024)
	ResourceRequestsMemory ResourceName = "requests.memory"
	// CPU limit, in cores. (500m = .5 cores)
	ResourceLimitsCPU ResourceName = "limits.cpu"
	// Memory limit, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024)
	ResourceLimitsMemory ResourceName = "limits.memory"
)

// A scope is a filter that matches an object
type ResourceQuotaScope string

const (
	ResourceQuotaScopeTerminating    ResourceQuotaScope = "Terminating"
	ResourceQuotaScopeNotTerminating ResourceQuotaScope = "NotTerminating"
	ResourceQuotaScopeBestEffort     ResourceQuotaScope = "BestEffort"
	ResourceQuotaScopeNotBestEffort  ResourceQuotaScope = "NotBestEffort"
)

// ResourceQuotaSpec defines the desired hard limits to enforce for Quota
// The quota matches by default on all objects in its namespace.
// The quota can optionally match objects that satisfy a set of scopes.
type ResourceQuotaSpec struct {
	// Hard is the set of desired hard limits for each named resource
	Hard ResourceList `json:"hard,omitempty"`
	// A collection of filters that must match each object tracked by a quota.
	// If not specified, the quota matches all objects.
	Scopes []ResourceQuotaScope `json:"scopes,omitempty"`
}
```
## Rest API Impact
None.
## Security Impact
None.
## End User Impact
The `kubectl` commands that render quota should display its scopes.
## Performance Impact
This feature will make having more quota objects in a namespace
more common in certain clusters. This impacts the number of quota
objects that need to be incremented during creation of an object
in admission control. It impacts the number of quota objects
that need to be updated during controller loops.
## Developer Impact
None.
## Alternatives
This proposal initially enumerated a solution that leveraged a
`FieldSelector` on a `ResourceQuota` object. A `FieldSelector`
grouped an `APIVersion` and `Kind` with a selector over its
fields that supported set-based requirements. It would have allowed
a quota to track objects based on cluster defined attributes.
For example, a quota could do the following:
* match `Kind=Pod` where `spec.restartPolicy in (Always)`
* match `Kind=Pod` where `spec.restartPolicy in (Never, OnFailure)`
* match `Kind=Pod` where `status.qualityOfService in (BestEffort)`
* match `Kind=Service` where `spec.type in (LoadBalancer)`
* see [#17484](https://github.com/kubernetes/kubernetes/issues/17484)
Theoretically, it would enable support for fine-grained tracking
on a variety of resource types. While extremely flexible, there
are cons to this approach that make it premature to pursue
at this time.
* Generic field selectors are not yet settled art
* see [#1362](https://github.com/kubernetes/kubernetes/issues/1362)
* see [#19804](https://github.com/kubernetes/kubernetes/pull/19804)
* Discovery API Limitations
* Not possible to discover the set of field selectors supported by kind.
* Not possible to discover if a field is readonly, readwrite, or immutable
post-creation.
The quota system would want to validate that a field selector is valid,
and it would only want to select on those fields that are readonly/immutable
post creation to make resource tracking work during update operations.
The current proposal could grow to support a `FieldSelector` on a
`ResourceQuotaSpec` and support a simple migration path to convert
`scopes` to the matching `FieldSelector` once the project has identified
how it wants to handle `fieldSelector` requirements longer term.
This proposal previously discussed a solution that leveraged a
`LabelSelector` as a mechanism to partition quota. This is potentially
interesting to explore in the future to allow `namespace-admins` to
quota workloads based on local knowledge. For example, a quota
could match all kinds that match the selector
`tier=cache, environment in (dev, qa)` separately from quota that
matched `tier=cache, environment in (prod)`. This is interesting to
explore in the future, but labels are insufficient selection targets
for `cluster-administrators` to control footprint. In those instances,
you need fields that are cluster controlled and not user-defined.
## Example
### Scenario 1
The cluster-admin wants to restrict the following:
* limit 2 best-effort pods
* limit 2 terminating pods that cannot use more than 1Gi of memory, and 2 cpu cores
* limit 4 long-running pods that cannot use more than 4Gi of memory, and 4 cpu cores
* limit 6 pods in total, 10 replication controllers
This would require the following quotas to be added to the namespace:
```
$ cat quota-best-effort
apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota-best-effort
spec:
  hard:
    pods: "2"
  scopes:
  - BestEffort

$ cat quota-terminating
apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota-terminating
spec:
  hard:
    pods: "2"
    limits.memory: 1Gi
    limits.cpu: 2
  scopes:
  - Terminating
  - NotBestEffort

$ cat quota-longrunning
apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota-longrunning
spec:
  hard:
    pods: "2"
    limits.memory: 4Gi
    limits.cpu: 4
  scopes:
  - NotTerminating
  - NotBestEffort

$ cat quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota
spec:
  hard:
    pods: "6"
    replicationcontrollers: "10"
```
In the above scenario, every pod creation will result in its usage being
tracked by `quota` since it has no additional scoping. The pod will then
be tracked by 1 additional quota object based on the scopes it
matches. In order for the pod creation to succeed, it must not violate
the constraint of any matching quota. So for example, a best-effort pod
would only be created if there was available quota in `quota-best-effort`
and `quota`.
## Implementation
### Assignee
@derekwaynecarr
### Work Items
* Add support for requests and limits
* Add support for scopes in quota-related admission and controller code
## Dependencies
None.
Longer term, we should evaluate what we want to do with `fieldSelector` as
the requests around different quota semantics will continue to grow.
## Testing
Appropriate unit and e2e testing will be authored.
## Documentation Impact
Existing resource quota documentation and examples will be updated.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/resource-quota-scoping.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,206 +1 @@
# Client/Server container runtime This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/runtime-client-server.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/runtime-client-server.md)
## Abstract
A proposal of client/server implementation of kubelet container runtime interface.
## Motivation
Currently, any container runtime has to be linked into the kubelet. This makes
experimentation difficult, and prevents users from landing an alternate
container runtime without landing code in core kubernetes.
To facilitate experimentation and to enable user choice, this proposal adds a
client/server implementation of the [new container runtime interface](https://github.com/kubernetes/kubernetes/pull/25899). The main goal
of this proposal is:
- make it easy to integrate new container runtimes
- improve code maintainability
## Proposed design
**Design of client/server container runtime**
The main idea of the client/server container runtime is to keep the main control logic in the kubelet while letting the remote runtime perform only dedicated actions. An alpha [container runtime API](../../pkg/kubelet/api/v1alpha1/runtime/api.proto) is introduced for integrating new container runtimes. The API is based on [protobuf](https://developers.google.com/protocol-buffers/) and [gRPC](http://www.grpc.io) for a number of benefits:
- Performs faster than JSON
- Client bindings come for free: gRPC supports ten languages
- No encoding/decoding code needed
- API interfaces are easy to manage: server and client interfaces are generated automatically
A new container runtime manager `KubeletGenericRuntimeManager` will be introduced to the kubelet, which will
- conform to kubelet's [Runtime](../../pkg/kubelet/container/runtime.go#L58) interface
- manage Pod and Container lifecycle according to kubelet policies
- call the remote runtime's API to perform specific pod, container or image operations
A simple workflow of invoking the remote runtime API when starting a Pod with two containers is shown below:
```
Kubelet KubeletGenericRuntimeManager RemoteRuntime
+ + +
| | |
+---------SyncPod------------->+ |
| | |
| +---- Create PodSandbox ------->+
| +<------------------------------+
| | |
| XXXXXXXXXXXX |
| | X |
| | NetworkPlugin. |
| | SetupPod |
| | X |
| XXXXXXXXXXXX |
| | |
| +<------------------------------+
| +---- Pull image1 -------->+
| +<------------------------------+
| +---- Create container1 ------->+
| +<------------------------------+
| +---- Start container1 -------->+
| +<------------------------------+
| | |
| +<------------------------------+
| +---- Pull image2 -------->+
| +<------------------------------+
| +---- Create container2 ------->+
| +<------------------------------+
| +---- Start container2 -------->+
| +<------------------------------+
| | |
| <-------Success--------------+ |
| | |
+ + +
```
And deleting a pod can be shown:
```
Kubelet KubeletGenericRuntimeManager RemoteRuntime
+ + +
| | |
+---------SyncPod------------->+ |
| | |
| +---- Stop container1 ----->+
| +<------------------------------+
| +---- Delete container1 ----->+
| +<------------------------------+
| | |
| +---- Stop container2 ------>+
| +<------------------------------+
| +---- Delete container2 ------>+
| +<------------------------------+
| | |
| XXXXXXXXXXXX |
| | X |
| | NetworkPlugin. |
| | TeardownPod |
| | X |
| XXXXXXXXXXXX |
| | |
| | |
| +---- Delete PodSandbox ------>+
| +<------------------------------+
| | |
| <-------Success--------------+ |
| | |
+ + +
```
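In Go, the manager side of the pod-creation flow above might be sketched as follows; `RuntimeClient` here stands in for a client of the runtime API described below, the request and response messages are elided, and the fake implementation exists only to make the sketch self-contained.
```go
package main

import "fmt"

// RuntimeClient stands in for a gRPC client generated from the runtime API;
// the real request/response messages are elided to keep the sketch short.
type RuntimeClient interface {
	CreatePodSandbox(pod string) (string, error)
	PullImage(image string) error
	CreateContainer(sandboxID, name, image string) (string, error)
	StartContainer(containerID string) error
}

// syncPod mirrors the sequence diagram: create the sandbox, set up networking,
// then pull, create, and start each container in turn.
func syncPod(c RuntimeClient, pod string, containers map[string]string) error {
	sandboxID, err := c.CreatePodSandbox(pod)
	if err != nil {
		return err
	}
	// NetworkPlugin.SetupPod would be invoked here.
	for name, image := range containers {
		if err := c.PullImage(image); err != nil {
			return err
		}
		id, err := c.CreateContainer(sandboxID, name, image)
		if err != nil {
			return err
		}
		if err := c.StartContainer(id); err != nil {
			return err
		}
	}
	return nil
}

// fakeRuntime is a trivial in-memory implementation used only to make the
// sketch executable.
type fakeRuntime struct{ n int }

func (f *fakeRuntime) CreatePodSandbox(pod string) (string, error) { return pod + "-sandbox", nil }
func (f *fakeRuntime) PullImage(image string) error                { return nil }
func (f *fakeRuntime) CreateContainer(s, n, i string) (string, error) {
	f.n++
	return fmt.Sprintf("ctr-%d", f.n), nil
}
func (f *fakeRuntime) StartContainer(id string) error { fmt.Println("started", id); return nil }

func main() {
	_ = syncPod(&fakeRuntime{}, "nginx", map[string]string{"web": "nginx:1.11", "log": "busybox"})
}
```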
**API definition**
Since we are going to introduce more image formats and want to separate image management from containers and pods, this proposal introduces two services `RuntimeService` and `ImageService`. Both services are defined at [pkg/kubelet/api/v1alpha1/runtime/api.proto](../../pkg/kubelet/api/v1alpha1/runtime/api.proto):
```proto
// Runtime service defines the public APIs for remote container runtimes
service RuntimeService {
    // Version returns the runtime name, runtime version and runtime API version
    rpc Version(VersionRequest) returns (VersionResponse) {}

    // CreatePodSandbox creates a pod-level sandbox.
    // The definition of PodSandbox is at https://github.com/kubernetes/kubernetes/pull/25899
    rpc CreatePodSandbox(CreatePodSandboxRequest) returns (CreatePodSandboxResponse) {}
    // StopPodSandbox stops the sandbox. If there are any running containers in the
    // sandbox, they should be force terminated.
    rpc StopPodSandbox(StopPodSandboxRequest) returns (StopPodSandboxResponse) {}
    // DeletePodSandbox deletes the sandbox. If there are any running containers in the
    // sandbox, they should be force deleted.
    rpc DeletePodSandbox(DeletePodSandboxRequest) returns (DeletePodSandboxResponse) {}
    // PodSandboxStatus returns the Status of the PodSandbox.
    rpc PodSandboxStatus(PodSandboxStatusRequest) returns (PodSandboxStatusResponse) {}
    // ListPodSandbox returns a list of SandBox.
    rpc ListPodSandbox(ListPodSandboxRequest) returns (ListPodSandboxResponse) {}

    // CreateContainer creates a new container in specified PodSandbox
    rpc CreateContainer(CreateContainerRequest) returns (CreateContainerResponse) {}
    // StartContainer starts the container.
    rpc StartContainer(StartContainerRequest) returns (StartContainerResponse) {}
    // StopContainer stops a running container with a grace period (i.e., timeout).
    rpc StopContainer(StopContainerRequest) returns (StopContainerResponse) {}
    // RemoveContainer removes the container. If the container is running, the container
    // should be force removed.
    rpc RemoveContainer(RemoveContainerRequest) returns (RemoveContainerResponse) {}
    // ListContainers lists all containers by filters.
    rpc ListContainers(ListContainersRequest) returns (ListContainersResponse) {}
    // ContainerStatus returns status of the container.
    rpc ContainerStatus(ContainerStatusRequest) returns (ContainerStatusResponse) {}
    // Exec executes the command in the container.
    rpc Exec(stream ExecRequest) returns (stream ExecResponse) {}
}

// Image service defines the public APIs for managing images
service ImageService {
    // ListImages lists existing images.
    rpc ListImages(ListImagesRequest) returns (ListImagesResponse) {}
    // ImageStatus returns the status of the image.
    rpc ImageStatus(ImageStatusRequest) returns (ImageStatusResponse) {}
    // PullImage pulls an image with authentication config.
    rpc PullImage(PullImageRequest) returns (PullImageResponse) {}
    // RemoveImage removes the image.
    rpc RemoveImage(RemoveImageRequest) returns (RemoveImageResponse) {}
}
```
Note that some types in [pkg/kubelet/api/v1alpha1/runtime/api.proto](../../pkg/kubelet/api/v1alpha1/runtime/api.proto) are already defined at [Container runtime interface/integration](https://github.com/kubernetes/kubernetes/pull/25899).
We should decide how to integrate the types in [#25899](https://github.com/kubernetes/kubernetes/pull/25899) with gRPC services:
* Auto-generate those types into protobuf by [go2idl](../../cmd/libs/go2idl/)
- Pros:
- trace type changes automatically, all type changes in Go will be automatically generated into proto files
- Cons:
- type changes may break existing API implementations, e.g. new fields added automatically may not be noticed by the remote runtime
- needs to convert Go types to gRPC generated types, and vice versa
- needs to process attribute order carefully so as not to break generated protobufs (this could be done by using [protobuf tag](https://developers.google.com/protocol-buffers/docs/gotutorial))
- go2idl doesn't support gRPC, [protoc-gen-gogo](https://github.com/gogo/protobuf) is still required for generating gRPC client
* Embed those types as raw protobuf definitions and generate Go files by [protoc-gen-gogo](https://github.com/gogo/protobuf)
- Pros:
- decouple type definitions, all type changes in Go will be added to proto manually, so it's easier to track gRPC API version changes
- Kubelet could reuse Go types generated by `protoc-gen-gogo` to avoid type conversions
- Cons:
- duplicate definition of same types
- hard to track type changes automatically
- need to manage proto files manually
For better version control and faster iteration, this proposal embeds all those types in `api.proto` directly.
## Implementation
Each new runtime should implement the [gRPC](http://www.grpc.io) server based on [pkg/kubelet/api/v1alpha1/runtime/api.proto](../../pkg/kubelet/api/v1alpha1/runtime/api.proto). For version controlling, `KubeletGenericRuntimeManager` will request `RemoteRuntime`'s `Version()` interface with the runtime api version. To keep backward compatibility, the API follows standard [protobuf guide](https://developers.google.com/protocol-buffers/docs/proto) to deprecate or add new interfaces.
A new flag `--container-runtime-endpoint` (overrides `--container-runtime`) will be introduced to kubelet which identifies the unix socket file of the remote runtime service. And new flag `--image-service-endpoint` will be introduced to kubelet which identifies the unix socket file of the image service.
To facilitate switching the current container runtime (e.g. `docker` or `rkt`) to the new runtime API, `KubeletGenericRuntimeManager` will provide a plugin mechanism allowing either a local implementation or a gRPC implementation to be specified.
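A minimal sketch of how the kubelet side might dial such an endpoint and negotiate the version is shown below; the generated package name and import path (`runtimeapi`), the socket path, and the empty `VersionRequest` are assumptions for the example, not settled details.
```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"

	"google.golang.org/grpc"

	// Assumed to be the Go package generated from api.proto.
	runtimeapi "k8s.io/kubernetes/pkg/kubelet/api/v1alpha1/runtime"
)

func main() {
	// --container-runtime-endpoint: a unix socket exposed by the remote runtime.
	endpoint := "/var/run/remote-runtime.sock"

	conn, err := grpc.Dial(endpoint,
		grpc.WithInsecure(),
		grpc.WithDialer(func(addr string, timeout time.Duration) (net.Conn, error) {
			return net.DialTimeout("unix", addr, timeout)
		}))
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	client := runtimeapi.NewRuntimeServiceClient(conn)

	// Version() is the first call the kubelet makes, to negotiate the runtime API version.
	resp, err := client.Version(context.Background(), &runtimeapi.VersionRequest{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("remote runtime version info: %+v\n", resp)
}
```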
## Community Discussion
This proposal is first filed by [@brendandburns](https://github.com/brendandburns) at [kubernetes/13768](https://github.com/kubernetes/kubernetes/issues/13768):
* [kubernetes/13768](https://github.com/kubernetes/kubernetes/issues/13768)
* [kubernetes/13079](https://github.com/kubernetes/kubernetes/pull/13079)
* [New container runtime interface](https://github.com/kubernetes/kubernetes/pull/25899)
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/runtime-client-server.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,173 +1 @@
# Kubelet: Runtime Pod Cache This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/runtime-pod-cache.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/runtime-pod-cache.md)
This proposal builds on top of the Pod Lifecycle Event Generator (PLEG) proposed
in [#12802](https://issues.k8s.io/12802). It assumes that Kubelet subscribes to
the pod lifecycle event stream to eliminate periodic polling of pod
states. Please see [#12802](https://issues.k8s.io/12802) for the motivation and
design concept for PLEG.
Runtime pod cache is an in-memory cache which stores the *status* of
all pods, and is maintained by PLEG. It serves as a single source of
truth for internal pod status, freeing Kubelet from querying the
container runtime.
## Motivation
With PLEG, Kubelet no longer needs to perform comprehensive state
checking for all pods periodically. It only instructs a pod worker to
start syncing when there is a change of its pod status. Nevertheless,
during each sync, a pod worker still needs to construct the pod status
by examining all containers (whether dead or alive) in the pod, due to
the lack of the caching of previous states. With the integration of
pod cache, we can further improve Kubelet's CPU usage by
1. Lowering the number of concurrent requests to the container
runtime since pod workers no longer have to query the runtime
individually.
2. Lowering the total number of inspect requests because there is no
need to inspect containers with no state changes.
***Don't we already have a [container runtime cache](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/container/runtime_cache.go)?***
The runtime cache is an optimization that reduces the number of `GetPods()`
calls from the workers. However,
* The cache does not store all information necessary for a worker to
complete a sync (e.g., `docker inspect`); workers still need to inspect
containers individually to generate `api.PodStatus`.
* Workers sometimes need to bypass the cache in order to retrieve the
latest pod state.
This proposal generalizes the cache and instructs PLEG to populate the cache, so
that the content is always up-to-date.
**Why can't each worker cache its own pod status?**
The short answer is yes, they can. The longer answer is that localized
caching limits the use of the cache content -- other components cannot
access it. This often leads to caching at multiple places and/or passing
objects around, complicating the control flow.
## Runtime Pod Cache
![pod cache](pod-cache.png)
Pod cache stores the `PodStatus` for all pods on the node. `PodStatus` encompasses
all the information required from the container runtime to generate
`api.PodStatus` for a pod.
```go
// PodStatus represents the status of the pod and its containers.
// api.PodStatus can be derived from examining PodStatus and api.Pod.
type PodStatus struct {
	ID                types.UID
	Name              string
	Namespace         string
	IP                string
	ContainerStatuses []*ContainerStatus
}

// ContainerStatus represents the status of a container.
type ContainerStatus struct {
	ID           ContainerID
	Name         string
	State        ContainerState
	CreatedAt    time.Time
	StartedAt    time.Time
	FinishedAt   time.Time
	ExitCode     int
	Image        string
	ImageID      string
	Hash         uint64
	RestartCount int
	Reason       string
	Message      string
}
```
`PodStatus` is defined in the container runtime interface, hence is
runtime-agnostic.
PLEG is responsible for updating the entries in the pod cache, hence always keeping
the cache up-to-date. To do so, it will:
1. Detect change of container state
2. Inspect the pod for details
3. Update the pod cache with the new PodStatus
- If there is no real change of the pod entry, do nothing
- Otherwise, generate and send out the corresponding pod lifecycle event
Note that in (3), PLEG can check if there is any disparity between the old
and the new pod entry to filter out duplicated events if needed.
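A minimal sketch of such a PLEG-maintained cache is shown below; it reuses the `PodStatus` type defined above, uses `reflect.DeepEqual` as a stand-in for whatever comparison the real implementation would use, and omits package and import declarations (`sync`, `reflect`, `types`) in the style of the snippet above.
```go
// Sketch of a PLEG-maintained cache keyed by pod UID.
type podCache struct {
	lock    sync.RWMutex
	entries map[types.UID]*PodStatus
}

func newPodCache() *podCache {
	return &podCache{entries: map[types.UID]*PodStatus{}}
}

// Set stores the latest status and reports whether it actually changed, so
// that PLEG can decide whether to emit a pod lifecycle event.
func (c *podCache) Set(id types.UID, status *PodStatus) (changed bool) {
	c.lock.Lock()
	defer c.lock.Unlock()
	if old, ok := c.entries[id]; ok && reflect.DeepEqual(old, status) {
		return false
	}
	c.entries[id] = status
	return true
}

// Get returns the cached status; pod workers read from here instead of
// querying the container runtime directly.
func (c *podCache) Get(id types.UID) (*PodStatus, bool) {
	c.lock.RLock()
	defer c.lock.RUnlock()
	status, ok := c.entries[id]
	return status, ok
}

// Delete evicts a pod that is no longer visible to the container runtime.
func (c *podCache) Delete(id types.UID) {
	c.lock.Lock()
	defer c.lock.Unlock()
	delete(c.entries, id)
}
```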
### Evict cache entries
Note that the cache represents all the pods/containers known by the container
runtime. A cache entry should only be evicted if the pod is no longer visible
to the container runtime. PLEG is responsible for deleting entries in the
cache.
### Generate `api.PodStatus`
Because pod cache stores the up-to-date `PodStatus` of the pods, Kubelet can
generate the `api.PodStatus` by interpreting the cache entry at any
time. To avoid sending intermediate status (e.g., while a pod worker
is restarting a container), we will instruct the pod worker to generate a new
status at the beginning of each sync.
### Cache contention
Cache contention should not be a problem when the number of pods is
small. When Kubelet scales, we can always shard the pods by ID to
reduce contention.
### Disk management
The pod cache is not capable of fulfilling the needs of container/image garbage
collectors as they may demand more than pod-level information. These components
will still need to query the container runtime directly at times. We may
consider extending the cache for these use cases, but they are beyond the scope
of this proposal.
## Impact on Pod Worker Control Flow
A pod worker may perform various operations (e.g., start/kill a container)
during a sync. They will expect to see the results of such operations reflected
in the cache in the next sync. Alternately, they can bypass the cache and
query the container runtime directly to get the latest status. However, this
is not desirable since the cache is introduced exactly to eliminate unnecessary,
concurrent queries. Therefore, a pod worker should be blocked until all expected
results have been updated to the cache by PLEG.
Depending on the type of PLEG (see [#12802](https://issues.k8s.io/12802)) in
use, the methods to check whether a requirement is met can differ. For a
PLEG that solely relies on relisting, a pod worker can simply wait until the
relist timestamp is newer than the end of the worker's last sync. On the other
hand, if pod worker knows what events to expect, they can also block until the
events are observed.
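For a relist-based PLEG, the blocking behavior described above can be sketched with a condition variable: the worker waits until the global relist timestamp passes the end of its last sync. This is a standalone illustration, not the actual kubelet code.
```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cacheClock tracks the time of the most recent PLEG relist and lets pod
// workers block until the cache is at least as new as a given instant.
type cacheClock struct {
	lock      sync.Mutex
	cond      *sync.Cond
	timestamp time.Time
}

func newCacheClock() *cacheClock {
	c := &cacheClock{}
	c.cond = sync.NewCond(&c.lock)
	return c
}

// UpdateTime is called by PLEG after it finishes a relist and has populated the cache.
func (c *cacheClock) UpdateTime(t time.Time) {
	c.lock.Lock()
	defer c.lock.Unlock()
	c.timestamp = t
	c.cond.Broadcast()
}

// WaitNewerThan blocks the pod worker until the cache has been refreshed after minTime.
func (c *cacheClock) WaitNewerThan(minTime time.Time) {
	c.lock.Lock()
	defer c.lock.Unlock()
	for !c.timestamp.After(minTime) {
		c.cond.Wait()
	}
}

func main() {
	c := newCacheClock()
	lastSyncEnd := time.Now()
	go func() {
		time.Sleep(100 * time.Millisecond) // simulated relist period
		c.UpdateTime(time.Now())
	}()
	c.WaitNewerThan(lastSyncEnd) // worker blocks here until the relist completes
	fmt.Println("cache is fresh enough; worker can generate api.PodStatus")
}
```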
It should be noted that `api.PodStatus` will only be generated by the pod
worker *after* the cache has been updated. This means that the perceived
responsiveness of Kubelet (from querying the API server) will be affected by
how soon the cache can be populated. For the pure-relisting PLEG, the relist
period can become the bottleneck. On the other hand, a PLEG which watches the
upstream event stream (and knows what events to expect) is not restricted
by such periods and should improve Kubelet's perceived responsiveness.
## TODOs for v1.2
- Redefine container runtime types ([#12619](https://issues.k8s.io/12619)):
and introduce `PodStatus`. Refactor dockertools and rkt to use the new type.
- Add cache and instruct PLEG to populate it.
- Refactor Kubelet to use the cache.
- Deprecate the old runtime cache.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/runtime-pod-cache.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,69 +1 @@
# Overview This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/runtimeconfig.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/runtimeconfig.md)
Proposes adding a `--feature-config` to core kube system components:
apiserver, scheduler, controller-manager, kube-proxy, and selected addons.
This flag will be used to enable/disable alpha features on a per-component basis.
## Motivation
The motivation is enabling/disabling features that are not tied to
an API group. API groups can be selectively enabled/disabled in the
apiserver via existing `--runtime-config` flag on apiserver, but there is
currently no mechanism to toggle alpha features that are controlled by
e.g. annotations. This means the burden of controlling whether such
features are enabled in a particular cluster is on feature implementors;
they must either define some ad hoc mechanism for toggling (e.g. flag
on component binary) or else toggle the feature on/off at compile time.
By adding a `--feature-config` to all kube-system components, alpha features
can be toggled on a per-component basis by passing `enableAlphaFeature=true|false`
to `--feature-config` for each component that the feature touches.
## Design
The following components will all get a `--feature-config` flag,
which loads a `config.ConfigurationMap`:
- kube-apiserver
- kube-scheduler
- kube-controller-manager
- kube-proxy
- kube-dns
(Note kubelet is omitted; its dynamic config story is being addressed
by #29459). Alpha features that are not accessed via an alpha API
group should define an `enableFeatureName` flag and use it to toggle
activation of the feature in each system component that the feature
uses.
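For illustration, parsing such a flag value and checking a feature key could look like the sketch below; the key name `enableNewVolumePlugin` is made up, and the parsing is a simplified stand-in for `config.ConfigurationMap` rather than its actual implementation.
```go
package main

import (
	"fmt"
	"strings"
)

// parseFeatureConfig turns a flag value like
// "enableNewVolumePlugin=true,enableOtherThing=false" into a map,
// loosely mimicking how a ConfigurationMap-style flag is populated.
func parseFeatureConfig(value string) map[string]string {
	cfg := map[string]string{}
	for _, pair := range strings.Split(value, ",") {
		if pair == "" {
			continue
		}
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) == 2 {
			cfg[strings.TrimSpace(kv[0])] = strings.TrimSpace(kv[1])
		}
	}
	return cfg
}

// featureEnabled treats anything other than "true" as disabled, matching the
// convention that alpha features are off by default.
func featureEnabled(cfg map[string]string, key string) bool {
	return cfg[key] == "true"
}

func main() {
	cfg := parseFeatureConfig("enableNewVolumePlugin=true,enableOtherThing=false")
	fmt.Println(featureEnabled(cfg, "enableNewVolumePlugin")) // true
	fmt.Println(featureEnabled(cfg, "enableOtherThing"))      // false
}
```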
## Suggested conventions
This proposal only covers adding a mechanism to toggle features in
system components. Implementation details will still depend on the alpha
feature's owner(s). The following are suggested conventions:
- Naming for feature config entries should follow the pattern
"enable<FeatureName>=true".
- Features that touch multiple components should reserve the same key
in each component to toggle on/off.
- Alpha features should be disabled by default. Beta features may
be enabled by default. Refer to docs/devel/api_changes.md#alpha-beta-and-stable-versions
for more detailed guidance on alpha vs. beta.
## Upgrade support
As the primary motivation for cluster config is toggling alpha
features, upgrade support is not in scope. Enabling or disabling
a feature is necessarily a breaking change, so config should
not be altered in a running cluster.
## Future work
1. The eventual plan is for component config to be managed by versioned
APIs and not flags (#12245). When that is added, toggling of features
could be handled by versioned component config and the component flags
deprecated.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/runtimeconfig.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,72 +1 @@
This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scalability-testing.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scalability-testing.md)
## Background
We have a goal to be able to scale to 1000-node clusters by end of 2015.
As a result, we need to be able to run some kind of regression tests and deliver
a mechanism so that developers can test their changes with respect to performance.
Ideally, we would like to run performance tests also on PRs - although it might
be impossible to run them on every single PR, we may introduce a possibility for
a reviewer to trigger them if the change has non obvious impact on the performance
(something like "k8s-bot run scalability tests please" should be feasible).
However, running performance tests on 1000-node clusters (or even bigger ones in the
future) is a non-starter. Thus, we need some more sophisticated infrastructure
to simulate big clusters on a relatively small number of machines and/or cores.
This document describes two approaches to tackling this problem.
Once we have a better understanding of their consequences, we may want to
decide to drop one of them, but we are not yet in that position.
## Proposal 1 - Kubemark
In this proposal we are focusing on scalability testing of master components.
We do NOT focus on node-scalability - this issue should be handled separately.
Since we do not focus on node performance, we don't need a real Kubelet or
KubeProxy - in fact we don't even need to start real containers.
All we actually need is to have some Kubelet-like and KubeProxy-like components
that will simulate the load on the apiserver that their real equivalents are
generating (e.g. sending NodeStatus updates, watching for pods, watching for
endpoints (KubeProxy), etc.).
What needs to be done:
1. Determine what requests both KubeProxy and Kubelet are sending to apiserver.
2. Create a KubeletSim that generates the same load on the apiserver as the
real Kubelet, but does not start any containers. In the initial version we
can assume that pods never die, so it is enough to just react to the state
changes read from the apiserver.
TBD: Maybe we can reuse a real Kubelet for it by just injecting some "fake"
interfaces to it?
3. Similarly create a KubeProxySim that generates the same load on the apiserver
as a real KubeProxy. Again, since we are not planning to talk to those
containers, it basically doesn't need to do anything apart from that.
TBD: Maybe we can reuse a real KubeProxy for it by just injecting some "fake"
interfaces to it?
4. Refactor kube-up/kube-down scripts (or create new ones) to allow starting
a cluster with KubeletSim and KubeProxySim instead of real ones and put
a bunch of them on a single machine.
5. Create a load generator for it (probably initially it would be enough to
reuse tests that we use in gce-scalability suite).
## Proposal 2 - Oversubscribing
The other method we are proposing is to oversubscribe the resource,
or in essence enable a single node to look like many separate nodes even though
they reside on a single host. This is a well established pattern in many different
cluster managers (for more details see
http://www.uscms.org/SoftwareComputing/Grid/WMS/glideinWMS/doc.prd/index.html ).
There are a couple of different ways to accomplish this, but the most viable method
is to run privileged kubelet pods under a host's kubelet process. These pods then
register back with the master via the introspective service using modified names
so as not to collide.
Complications may currently exist around container tracking and ownership in docker.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/scalability-testing.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,335 +1 @@
# ScheduledJob Controller This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduledjob.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduledjob.md)
## Abstract
A proposal for implementing a new controller - ScheduledJob controller - which
will be responsible for managing time based jobs, namely:
* once at a specified point in time,
* repeatedly at a specified point in time.
There is already a discussion regarding this subject:
* Distributed CRON jobs [#2156](https://issues.k8s.io/2156)
There are also similar solutions available, already:
* [Mesos Chronos](https://github.com/mesos/chronos)
* [Quartz](http://quartz-scheduler.org/)
## Use Cases
1. Be able to schedule a job execution at a given point in time.
1. Be able to create a periodic job, e.g. database backup, sending emails.
## Motivation
ScheduledJobs are needed for performing all time-related actions, namely backups,
report generation and the like. Each of these tasks should be allowed to run
repeatedly (once a day/month, etc.) or once at a given point in time.
## Design Overview
Users create a ScheduledJob object. One ScheduledJob object
is like one line of a crontab file. It has a schedule of when to run,
in [Cron](https://en.wikipedia.org/wiki/Cron) format.
The ScheduledJob controller creates a [Job](job.md) object
about once per execution time of the schedule (e.g. once per
day for a daily schedule). We say "about" because there are certain
circumstances where two jobs might be created, or no job might be
created. We attempt to make these rare, but do not completely prevent
them. Therefore, Jobs should be idempotent.
The Job object is responsible for any retrying of Pods, and any parallelism
among pods it creates, and determining the success or failure of the set of
pods. The ScheduledJob does not examine pods at all.
### ScheduledJob resource
The new `ScheduledJob` object will have the following contents:
```go
// ScheduledJob represents the configuration of a single scheduled job.
type ScheduledJob struct {
	TypeMeta
	ObjectMeta

	// Spec is a structure defining the expected behavior of a job, including the schedule.
	Spec ScheduledJobSpec
	// Status is a structure describing current status of a job.
	Status ScheduledJobStatus
}

// ScheduledJobList is a collection of scheduled jobs.
type ScheduledJobList struct {
	TypeMeta
	ListMeta

	Items []ScheduledJob
}
```
The `ScheduledJobSpec` structure is defined to contain all the information about how the actual
job execution will look, including the `JobSpec` from the [Job API](job.md)
and the schedule in [Cron](https://en.wikipedia.org/wiki/Cron) format. This implies
that each ScheduledJob execution will be created from the JobSpec actual at a point
in time when the execution will be started. This also implies that any changes
to ScheduledJobSpec will be applied upon subsequent execution of a job.
```go
// ScheduledJobSpec describes how the job execution will look and when it will actually run.
type ScheduledJobSpec struct {
	// Schedule contains the schedule in Cron format, see https://en.wikipedia.org/wiki/Cron.
	Schedule string

	// Optional deadline in seconds for starting the job if it misses scheduled
	// time for any reason. Missed jobs executions will be counted as failed ones.
	StartingDeadlineSeconds *int64

	// ConcurrencyPolicy specifies how to treat concurrent executions of a Job.
	ConcurrencyPolicy ConcurrencyPolicy

	// Suspend flag tells the controller to suspend subsequent executions, it does
	// not apply to already started executions. Defaults to false.
	Suspend bool

	// JobTemplate is the object that describes the job that will be created when
	// executing a ScheduledJob.
	JobTemplate *JobTemplateSpec
}

// JobTemplateSpec describes the Job that will be created when executing
// a ScheduledJob, including its standard metadata.
type JobTemplateSpec struct {
	ObjectMeta

	// Specification of the desired behavior of the job.
	Spec JobSpec
}

// ConcurrencyPolicy describes how the job will be handled.
// Only one of the following concurrent policies may be specified.
// If none of the following policies is specified, the default one
// is AllowConcurrent.
type ConcurrencyPolicy string

const (
	// AllowConcurrent allows ScheduledJobs to run concurrently.
	AllowConcurrent ConcurrencyPolicy = "Allow"

	// ForbidConcurrent forbids concurrent runs, skipping next run if previous
	// hasn't finished yet.
	ForbidConcurrent ConcurrencyPolicy = "Forbid"

	// ReplaceConcurrent cancels currently running job and replaces it with a new one.
	ReplaceConcurrent ConcurrencyPolicy = "Replace"
)
```
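For illustration, computing when the next Job should be created from `Schedule` could look like the sketch below, assuming a cron-parsing library such as github.com/robfig/cron; the proposal does not mandate a particular library, and the deadline/conflict checks mentioned in the comment are left out.
```go
package main

import (
	"fmt"
	"time"

	"github.com/robfig/cron"
)

func main() {
	// Spec.Schedule in standard 5-field cron format: 02:00 every day.
	schedule := "0 2 * * *"

	sched, err := cron.ParseStandard(schedule)
	if err != nil {
		panic(err)
	}

	// The controller would compare the next run time against
	// LastScheduleTime and StartingDeadlineSeconds before creating a Job.
	now := time.Now()
	next := sched.Next(now)
	fmt.Printf("next execution after %s is %s\n", now.Format(time.RFC3339), next.Format(time.RFC3339))
}
```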
`ScheduledJobStatus` structure is defined to contain information about scheduled
job executions. The structure holds a list of currently running job instances
and additional information about overall successful and unsuccessful job executions.
```go
// ScheduledJobStatus represents the current state of a Job.
type ScheduledJobStatus struct {
	// Active holds pointers to currently running jobs.
	Active []ObjectReference

	// Successful tracks the overall number of successful completions of this job.
	Successful int64

	// Failed tracks the overall number of failures of this job.
	Failed int64

	// LastScheduleTime keeps information of when was the last time the job was successfully scheduled.
	LastScheduleTime Time
}
```
Users must use a generated selector for the job.
## Modifications to Job resource
TODO for beta: forbid manual selector since that could cause confusion between
subsequent jobs.
### Running ScheduledJobs using kubectl
A user should be able to easily start a Scheduled Job using `kubectl` (similarly
to running regular jobs). For example to run a job with a specified schedule,
a user should be able to type something simple like:
```
kubectl run pi --image=perl --restart=OnFailure --runAt="0 14 21 7 *" -- perl -Mbignum=bpi -wle 'print bpi(2000)'
```
In the above example:
* `--restart=OnFailure` implies creating a job instead of replicationController.
* `--runAt="0 14 21 7 *"` implies the schedule with which the job should be run, here
July 21, 2pm. This value will be validated according to the same rules which
apply to `.spec.schedule`.
## Fields Added to Job Template
When the controller creates a Job from the JobTemplateSpec in the ScheduledJob, it
adds the following fields to the Job:
- a name, based on the ScheduledJob's name, but with a suffix to distinguish
multiple executions, which may overlap.
- the standard created-by annotation on the Job, pointing to the SJ that created it
The standard key is `kubernetes.io/created-by`. The value is a serialized JSON object, like
`{ "kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ScheduledJob","namespace":"default",`
`"name":"nightly-earnings-report","uid":"5ef034e0-1890-11e6-8935-42010af0003e","apiVersion":...`
This serialization contains the UID of the parent. This is used to match the Job to the SJ that created
it.
## Updates to ScheduledJobs
If the schedule is updated on a ScheduledJob, it will:
- continue to use the Status.Active list of jobs to detect conflicts.
- try to fulfill all recently-passed times for the new schedule, by starting
new jobs. But it will not try to fulfill times prior to the
Status.LastScheduledTime.
- Example: If you have a schedule to run every 30 minutes, and change that to hourly, then the previously started
top-of-the-hour run, in Status.Active, will be seen and no new job started.
- Example: If you have a schedule to run every hour, change that to 30-minutely, at 31 minutes past the hour,
one run will be started immediately for the starting time that has just passed.
If the job template of a ScheduledJob is updated, then future executions use the new template
but old ones still satisfy the schedule and are not re-run just because the template changed.
If you delete and replace a ScheduledJob with one of the same name, it will:
- not use any old Status.Active, and not consider any existing running or terminated jobs from the previous
ScheduledJob (with a different UID) at all when determining conflicts, what needs to be started, etc.
- If there is an existing Job with the same time-based hash in its name (see below), then
new instances of that job will not be able to be created, since existing Jobs with the
same name are treated as conflicts. So, delete it if you want to re-run.
- not "re-run" jobs for "start times" before the creation time of the new ScheduledJob object.
- not consider executions from the previous UID when making decisions about what executions to
start, or status, etc.
- lose the history of the old SJ.
To preserve status, you can suspend the old one, and make one with a new name, or make a note of the old status.
## Fault-Tolerance
### Starting Jobs in the face of controller failures
If the process with the scheduledJob controller in it fails,
and takes a while to restart, the scheduledJob controller
may miss the time window and it is too late to start a job.
With a single scheduledJob controller process, we cannot give
very strong assurances about not missing starting jobs.
With a suggested HA configuration, there are multiple controller
processes, and they use master election to determine which one
is active at any time.
If the Job's StartingDeadlineSeconds is long enough, and the
lease for the master lock is short enough, and other controller
processes are running, then a Job will be started.
TODO: consider hard-coding the minimum StartingDeadlineSeconds
at say 1 minute. Then we can offer a clearer guarantee,
assuming we know what the setting of the lock lease duration is.
### Ensuring jobs are run at most once
There are three problems here:
- ensure at most one Job created per "start time" of a schedule.
- ensure that at most one Pod is created per Job
- ensure at most one container start occurs per Pod
#### Ensuring one Job
Multiple jobs might be created in the following sequence:
1. scheduled job controller sends request to start Job J1 to fulfill start time T.
1. the create request is accepted by the apiserver and enqueued but not yet written to etcd.
1. scheduled job controller crashes
1. new scheduled job controller starts, and lists the existing jobs, and does not see one created.
1. it creates a new one.
1. the first one eventually gets written to etcd.
1. there are now two jobs for the same start time.
We can solve this in several ways:
1. with three-phase protocol, e.g.:
1. controller creates a "suspended" job.
1. controller writes an annotation in the SJ saying that it created a job for this time.
1. controller unsuspends that job.
1. by picking a deterministic name, so that at most one object create can succeed.
#### Ensuring one Pod
Job object does not currently have a way to ask for this.
Even if it did, controller is not written to support it.
Same problem as above.
#### Ensuring one container invocation per Pod
Kubelet is not written to ensure at-most-one-container-start per pod.
#### Decision
This is too hard to do for the alpha version. We will await user
feedback to see if the "at most once" property is needed in the beta version.
This is awkward but possible for a containerized application to ensure on its own, as it needs
to know what ScheduledJob name and Start Time it is from, and then record the attempt
in a shared storage system. We should ensure it could extract this data from its annotations
using the downward API.
## Name of Jobs
A ScheduledJob creates one Job at each time when a Job should run.
Since there may be concurrent jobs, and since we might want to keep failed
non-overlapping Jobs around as a debugging record, each Job created by the same ScheduledJob
needs a distinct name.
To make the Jobs from the same ScheduledJob distinct, we could use a random string,
in the way that pods have a `generateName`. For example, a scheduledJob named `nightly-earnings-report`
in namespace `ns1` might create a job `nightly-earnings-report-3m4d3`, and later create
a job called `nightly-earnings-report-6k7ts`. This is consistent with pods, but
does not give the user much information.
Alternatively, we can use time as a uniquifier. For example, the same scheduledJob could
create a job called `nightly-earnings-report-2016-May-19`.
However, for Jobs that run more than once per day, we would need to represent
time as well as date. Standard date formats (e.g. RFC 3339) use colons for time.
Kubernetes names cannot include colons. Using a non-standard date format without colons
will annoy some users.
Also, date strings are much longer than random suffixes, which means that
the pods will also have long names, and that we are more likely to exceed the
253 character name limit when combining the scheduled-job name,
the time suffix, and pod random suffix.
One option would be to compute a hash of the nominal start time of the job,
and use that as a suffix. This would not provide the user with an indication
of the start time, but it would prevent creation of the same execution
by two instances (replicated or restarting) of the controller process.
We chose to use the hashed-date suffix approach.
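A possible implementation of the hashed-date suffix, purely as an illustration of the idea rather than the exact encoding the controller will use, is sketched below.
```go
package main

import (
	"fmt"
	"hash/fnv"
	"strconv"
	"time"
)

// jobNameForRun derives a deterministic Job name from the ScheduledJob name and
// the nominal start time, so two controller instances computing a name for the
// same execution arrive at the same result.
func jobNameForRun(scheduledJobName string, nominalStart time.Time) string {
	h := fnv.New32a()
	h.Write([]byte(nominalStart.UTC().Format(time.RFC3339)))
	// base-36 keeps the suffix short and within the allowed name characters
	suffix := strconv.FormatUint(uint64(h.Sum32()), 36)
	return scheduledJobName + "-" + suffix
}

func main() {
	start := time.Date(2016, time.May, 19, 2, 0, 0, 0, time.UTC)
	fmt.Println(jobNameForRun("nightly-earnings-report", start))
}
```
Because the suffix is a pure function of the nominal start time, a restarted or replicated controller that tries to create the same execution will compute the same name and the duplicate create will fail, which is the property the section above relies on.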
## Future evolution
Below are the possible future extensions to the Job controller:
* Be able to specify workflow template in `.spec` field. This relates to the work
happening in [#18827](https://issues.k8s.io/18827).
* Be able to specify more general template in `.spec` field, to create arbitrary
types of resources. This relates to the work happening in [#18215](https://issues.k8s.io/18215).
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/scheduledjob.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,186 +1 @@
# Secrets, configmaps and downwardAPI file mode bits This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/secret-configmap-downwarapi-file-mode.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/secret-configmap-downwarapi-file-mode.md)
Author: Rodrigo Campos (@rata), Tim Hockin (@thockin)
Date: July 2016
Status: Design in progress
# Goal
Allow users to specify permission mode bits for a secret/configmap/downwardAPI
file mounted as a volume. For example, if a secret has several keys, a user
should be able to specify the permission mode bits for any file, and they may
all have different modes.
To be clear, with "permission" I refer only to the file mode here, and I may use
the two terms interchangeably. This is not about file owners, although let me
know if you prefer to discuss that here too.
# Motivation
There is currently no way to set permissions on secret files mounted as volumes.
This can be a problem for applications that enforce files to have permissions
only for the owner (like fetchmail, ssh, pgpass file in postgres[1], etc.) and
it's just not possible to run them without changing the file mode. Also,
in-house applications may have this restriction too.
It also doesn't seem unreasonable to want a secret, which is sensitive information,
not to be world-readable (or group-readable) as it is by default. Granted, the secret
already lives in a container that is (hopefully) running only one process, so the
default might not be so bad, but people running more than one process in a container
have asked for this too[2].
For example, my use case is that we are migrating to kubernetes; the migration
is in progress (and will take a while), and we have already migrated our deployment web
interface. This interface connects to the servers via ssh, so
it needs the ssh keys, and ssh will only work if the ssh key file mode is the
one it expects.
This was asked on the mailing list here[2] and here[3], too.
[1]: https://www.postgresql.org/docs/9.1/static/libpq-pgpass.html
[2]: https://groups.google.com/forum/#!topic/kubernetes-dev/eTnfMJSqmaM
[3]: https://groups.google.com/forum/#!topic/google-containers/EcaOPq4M758
# Alternatives considered
Several alternatives have been considered:
* Add a mode to the API definition when using secrets: this is backward
compatible as described in docs/devel/api_changes.md (IIUC) and seems like the
way to go. @thockin also said on the mailing list that he would consider such an
approach. It might be worth considering whether we want to do the same for
configmaps or for owners, but there is no need to do that now either.
* Change the default file mode for secrets: I think this is unacceptable per the
api_changes doc, and besides, it doesn't feel correct IMHO, even though it is
technically an option. The argument for it would be that world- and group-readable
is not a nice default for a secret: we already take care of not
writing it to disk, etc., yet the file is created world-readable anyway. Such a
default change was done recently: the default was 0444 in kubernetes <= 1.2
and is now 0644 in kubernetes >= 1.3 (and the file is no longer a regular file,
it's a symlink now). That change was made to minimize differences between
configmaps and secrets: https://github.com/kubernetes/kubernetes/pull/25285. But
doing it again, and changing to something more restrictive (it is now 0644 and
would need to be 0400 to work with ssh and most apps), seems too risky; it would be even more
restrictive than in k8s 1.2, especially if there is no way to revert to the old
permissions and some use case is broken by it. And if we are adding a way to
change the mode, as in the option above, there is no need to rush changing the
default. So I would discard this.
* Don't let people change this, at least for now, and suggest that those who
need it do so in a "postStart" command. This is acceptable
if we don't want to change kubernetes core for some reason, although there
seem to be valid use cases. But if the user wants to use "postStart" for
something else as well, it becomes more awkward to do both things (either ship a script
in the docker image that handles the permissions, which is probably not a concern of the
project and so not nice, or specify several commands by using "sh").
# Proposed implementation
The proposed implementation goes with the first alternative: adding a `mode`
to the API.
There will be a `defaultMode`, type `int`, in: `type SecretVolumeSource`, `type
ConfigMapVolumeSource` and `type DownwardAPIVolumeSource`. And a `mode`, type
`int` too, in `type KeyToPath` and `DownwardAPIVolumeFile`.
The mode provided in any of these fields will be ANDed with 0777 to disallow
setting the sticky and setuid bits, since it's not clear that that use case is needed
or well understood. Directories within the volume will be created as before
and are not affected by this setting.
In other words, the fields will look like this:
```go
type SecretVolumeSource struct {
// Name of the secret in the pod's namespace to use.
SecretName string `json:"secretName,omitempty"`
// If unspecified, each key-value pair in the Data field of the referenced
// Secret will be projected into the volume as a file whose name is the
// key and content is the value. If specified, the listed keys will be
// projected into the specified paths, and unlisted keys will not be
// present. If a key is specified which is not present in the Secret,
// the volume setup will error. Paths must be relative and may not contain
// the '..' path or start with '..'.
Items []KeyToPath `json:"items,omitempty"`
// Mode bits to use on created files by default. The used mode bits will
// be the provided AND 0777.
// Directories within the path are not affected by this setting
DefaultMode int32 `json:"defaultMode,omitempty"`
}
type ConfigMapVolumeSource struct {
LocalObjectReference `json:",inline"`
// If unspecified, each key-value pair in the Data field of the referenced
// ConfigMap will be projected into the volume as a file whose name is the
// key and content is the value. If specified, the listed keys will be
// projected into the specified paths, and unlisted keys will not be
// present. If a key is specified which is not present in the ConfigMap,
// the volume setup will error. Paths must be relative and may not contain
// the '..' path or start with '..'.
Items []KeyToPath `json:"items,omitempty"`
// Mode bits to use on created files by default. The used mode bits will
// be the provided AND 0777.
// Directories within the path are not affected by this setting
DefaultMode int32 `json:"defaultMode,omitempty"`
}
type KeyToPath struct {
// The key to project.
Key string `json:"key"`
// The relative path of the file to map the key to.
// May not be an absolute path.
// May not contain the path element '..'.
// May not start with the string '..'.
Path string `json:"path"`
// Mode bits to use on this file. The used mode bits will be the
// provided AND 0777.
Mode int32 `json:"mode,omitempty"`
}
type DownwardAPIVolumeSource struct {
// Items is a list of DownwardAPIVolume file
Items []DownwardAPIVolumeFile `json:"items,omitempty"`
// Mode bits to use on created files by default. The used mode bits will
// be the provided AND 0777.
// Directories within the path are not affected by this setting
DefaultMode int32 `json:"defaultMode,omitempty"`
}
type DownwardAPIVolumeFile struct {
// Required: Path is the relative path name of the file to be created. Must not be absolute or contain the '..' path. Must be utf-8 encoded. The first item of the relative path must not start with '..'
Path string `json:"path"`
// Required: Selects a field of the pod: only annotations, labels, name and namespace are supported.
FieldRef *ObjectFieldSelector `json:"fieldRef,omitempty"`
// Selects a resource of the container: only resources limits and requests
// (limits.cpu, limits.memory, requests.cpu and requests.memory) are currently supported.
ResourceFieldRef *ResourceFieldSelector `json:"resourceFieldRef,omitempty"`
// Mode bits to use on this file. The used mode bits will be the
// provided AND 0777.
Mode int32 `json:"mode,omitempty"`
}
```
Adding it there allows the user to change the mode bits of every file in the
object, so it achieves the goal, while having the option to have a default and
not specify all files in the object.
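For illustration, here is a minimal sketch (not the actual volume plugin code) of how the
effective mode for a single projected file could be resolved under this proposal: a per-file
`mode` overrides `defaultMode`, and the result is ANDed with 0777.

```go
// Sketch only: resolve the effective mode of one projected file.
package main

import (
	"fmt"
	"os"
)

func effectiveMode(defaultMode int32, fileMode *int32) os.FileMode {
	m := defaultMode
	if fileMode != nil {
		m = *fileMode // a per-file mode wins over the volume default
	}
	return os.FileMode(m & 0777) // setuid/setgid/sticky bits can never be set
}

func main() {
	sshKeyMode := int32(0400)
	fmt.Printf("%o\n", effectiveMode(0644, &sshKeyMode)) // 400
	fmt.Printf("%o\n", effectiveMode(0644, nil))         // 644
}
```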
There are two downsides:
* The files are symlinks pointing to the real file, and only the real file's
permissions are set; the symlink keeps the usual symlink permissions.
This is already the case in 1.3, and applications like ssh seem to
work just fine with it. It is worth mentioning, but doesn't seem to be
an issue.
* If the secret/configMap/downwardAPI volume is mounted in more than one container,
the file permissions will be the same in all of them. This is already the case for
key mappings and doesn't seem like a big issue either.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/secret-configmap-downwarapi-file-mode.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,348 +1 @@
## Abstract This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/security-context-constraints.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/security-context-constraints.md)
PodSecurityPolicy allows cluster administrators to control the creation and validation of a security
context for a pod and containers.
## Motivation
Administration of a multi-tenant cluster requires the ability to provide varying sets of permissions
among the tenants, the infrastructure components, and end users of the system who may themselves be
administrators within their own isolated namespace.
Actors in a cluster may include infrastructure that is managed by administrators, infrastructure
that is exposed to end users (builds, deployments), the isolated end user namespaces in the cluster, and
the individual users inside those namespaces. Infrastructure components that operate on behalf of a
user (builds, deployments) should be allowed to run at an elevated level of permissions without
granting the user themselves an elevated set of permissions.
## Goals
1. Associate [service accounts](../design/service_accounts.md), groups, and users with
a set of constraints that dictate how a security context is established for a pod and the pod's containers.
1. Provide the ability for users and infrastructure components to run pods with elevated privileges
on behalf of another user or within a namespace where privileges are more restrictive.
1. Secure the ability to reference elevated permissions or to change the constraints under which
a user runs.
## Use Cases
Use case 1:
As an administrator, I can create a namespace for a person that can't create privileged containers
AND enforce that the UID of the containers is set to a certain value
Use case 2:
As a cluster operator, an infrastructure component should be able to create a pod with elevated
privileges in a namespace where regular users cannot create pods with these privileges or execute
commands in that pod.
Use case 3:
As a cluster administrator, I can allow a given namespace (or service account) to create privileged
pods or to run root pods
Use case 4:
As a cluster administrator, I can allow a project administrator to control the security contexts of
pods and service accounts within a project
## Requirements
1. Provide a set of restrictions that controls how a security context is created for pods and containers
as a new cluster-scoped object called `PodSecurityPolicy`.
1. User information in `user.Info` must be available to admission controllers. (Completed in
https://github.com/GoogleCloudPlatform/kubernetes/pull/8203)
1. Some authorizers may restrict a user's ability to reference a service account. Systems requiring
the ability to secure service accounts on a user level must be able to add a policy that enables
referencing specific service accounts themselves.
1. Admission control must validate the creation of Pods against the allowed set of constraints.
## Design
### Model
PodSecurityPolicy objects exist in the root scope, outside of a namespace. The
PodSecurityPolicy will reference users and groups that are allowed
to operate under the constraints. In order to support this, `ServiceAccounts` must be mapped
to a user name or group list by the authentication/authorization layers. This allows the security
context to treat users, groups, and service accounts uniformly.
Below is a list of PodSecurityPolicies which will likely serve most use cases:
1. A default policy object. This object is permissioned to something which covers all actors, such
as a `system:authenticated` group, and will likely be the most restrictive set of constraints.
1. A default constraints object for service accounts. This object can be identified as serving
a group identified by `system:service-accounts`, which can be imposed by the service account authenticator / token generator.
1. Cluster admin constraints identified by `system:cluster-admins` group - a set of constraints with elevated privileges that can be used
by an administrative user or group.
1. Infrastructure components constraints which can be identified either by a specific service
account or by a group containing all service accounts.
```go
// PodSecurityPolicy governs the ability to make requests that affect the SecurityContext
// that will be applied to a pod and container.
type PodSecurityPolicy struct {
unversioned.TypeMeta `json:",inline"`
api.ObjectMeta `json:"metadata,omitempty"`
// Spec defines the policy enforced.
Spec PodSecurityPolicySpec `json:"spec,omitempty"`
}
// PodSecurityPolicySpec defines the policy enforced.
type PodSecurityPolicySpec struct {
// Privileged determines if a pod can request to be run as privileged.
Privileged bool `json:"privileged,omitempty"`
// Capabilities is a list of capabilities that can be added.
Capabilities []api.Capability `json:"capabilities,omitempty"`
// Volumes allows and disallows the use of different types of volume plugins.
Volumes VolumeSecurityPolicy `json:"volumes,omitempty"`
// HostNetwork determines if the policy allows the use of HostNetwork in the pod spec.
HostNetwork bool `json:"hostNetwork,omitempty"`
// HostPorts determines which host port ranges are allowed to be exposed.
HostPorts []HostPortRange `json:"hostPorts,omitempty"`
// HostPID determines if the policy allows the use of HostPID in the pod spec.
HostPID bool `json:"hostPID,omitempty"`
// HostIPC determines if the policy allows the use of HostIPC in the pod spec.
HostIPC bool `json:"hostIPC,omitempty"`
// SELinuxContext is the strategy that will dictate the allowable labels that may be set.
SELinuxContext SELinuxContextStrategyOptions `json:"seLinuxContext,omitempty"`
// RunAsUser is the strategy that will dictate the allowable RunAsUser values that may be set.
RunAsUser RunAsUserStrategyOptions `json:"runAsUser,omitempty"`
// The users who have permissions to use this policy
Users []string `json:"users,omitempty"`
// The groups that have permission to use this policy
Groups []string `json:"groups,omitempty"`
}
// HostPortRange defines a range of host ports that will be enabled by a policy
// for pods to use. It requires both the start and end to be defined.
type HostPortRange struct {
// Start is the beginning of the port range which will be allowed.
Start int `json:"start"`
// End is the end of the port range which will be allowed.
End int `json:"end"`
}
// VolumeSecurityPolicy allows and disallows the use of different types of volume plugins.
type VolumeSecurityPolicy struct {
// HostPath allows or disallows the use of the HostPath volume plugin.
// More info: http://kubernetes.io/docs/user-guide/volumes#hostpath
HostPath bool `json:"hostPath,omitempty"`
// EmptyDir allows or disallows the use of the EmptyDir volume plugin.
// More info: http://kubernetes.io/docs/user-guide/volumes#emptydir
EmptyDir bool `json:"emptyDir,omitempty"`
// GCEPersistentDisk allows or disallows the use of the GCEPersistentDisk volume plugin.
// More info: http://kubernetes.io/docs/user-guide/volumes#gcepersistentdisk
GCEPersistentDisk bool `json:"gcePersistentDisk,omitempty"`
// AWSElasticBlockStore allows or disallows the use of the AWSElasticBlockStore volume plugin.
// More info: http://kubernetes.io/docs/user-guide/volumes#awselasticblockstore
AWSElasticBlockStore bool `json:"awsElasticBlockStore,omitempty"`
// GitRepo allows or disallows the use of the GitRepo volume plugin.
GitRepo bool `json:"gitRepo,omitempty"`
// Secret allows or disallows the use of the Secret volume plugin.
// More info: http://kubernetes.io/docs/user-guide/volumes#secrets
Secret bool `json:"secret,omitempty"`
// NFS allows or disallows the use of the NFS volume plugin.
// More info: http://kubernetes.io/docs/user-guide/volumes#nfs
NFS bool `json:"nfs,omitempty"`
// ISCSI allows or disallows the use of the ISCSI volume plugin.
// More info: http://releases.k8s.io/HEAD/examples/volumes/iscsi/README.md
ISCSI bool `json:"iscsi,omitempty"`
// Glusterfs allows or disallows the use of the Glusterfs volume plugin.
// More info: http://releases.k8s.io/HEAD/examples/volumes/glusterfs/README.md
Glusterfs bool `json:"glusterfs,omitempty"`
// PersistentVolumeClaim allows or disallows the use of the PersistentVolumeClaim volume plugin.
// More info: http://kubernetes.io/docs/user-guide/persistent-volumes#persistentvolumeclaims
PersistentVolumeClaim bool `json:"persistentVolumeClaim,omitempty"`
// RBD allows or disallows the use of the RBD volume plugin.
// More info: http://releases.k8s.io/HEAD/examples/volumes/rbd/README.md
RBD bool `json:"rbd,omitempty"`
// Cinder allows or disallows the use of the Cinder volume plugin.
// More info: http://releases.k8s.io/HEAD/examples/mysql-cinder-pd/README.md
Cinder bool `json:"cinder,omitempty"`
// CephFS allows or disallows the use of the CephFS volume plugin.
CephFS bool `json:"cephfs,omitempty"`
// DownwardAPI allows or disallows the use of the DownwardAPI volume plugin.
DownwardAPI bool `json:"downwardAPI,omitempty"`
// FC allows or disallows the use of the FC volume plugin.
FC bool `json:"fc,omitempty"`
}
// SELinuxContextStrategyOptions defines the strategy type and any options used to create the strategy.
type SELinuxContextStrategyOptions struct {
// Type is the strategy that will dictate the allowable labels that may be set.
Type SELinuxContextStrategy `json:"type"`
// seLinuxOptions required to run as; required for MustRunAs
// More info: http://releases.k8s.io/HEAD/docs/design/security_context.md#security-context
SELinuxOptions *api.SELinuxOptions `json:"seLinuxOptions,omitempty"`
}
// SELinuxContextStrategyType denotes strategy types for generating SELinux options for a
// SecurityContext.
type SELinuxContextStrategy string
const (
// container must have SELinux labels of X applied.
SELinuxStrategyMustRunAs SELinuxContextStrategy = "MustRunAs"
// container may make requests for any SELinux context labels.
SELinuxStrategyRunAsAny SELinuxContextStrategy = "RunAsAny"
)
// RunAsUserStrategyOptions defines the strategy type and any options used to create the strategy.
type RunAsUserStrategyOptions struct {
// Type is the strategy that will dictate the allowable RunAsUser values that may be set.
Type RunAsUserStrategy `json:"type"`
// UID is the user id that containers must run as. Required for the MustRunAs strategy if not using
// a strategy that supports pre-allocated uids.
UID *int64 `json:"uid,omitempty"`
// UIDRangeMin defines the min value for a strategy that allocates by a range based strategy.
UIDRangeMin *int64 `json:"uidRangeMin,omitempty"`
// UIDRangeMax defines the max value for a strategy that allocates by a range based strategy.
UIDRangeMax *int64 `json:"uidRangeMax,omitempty"`
}
// RunAsUserStrategyType denotes strategy types for generating RunAsUser values for a
// SecurityContext.
type RunAsUserStrategy string
const (
// container must run as a particular uid.
RunAsUserStrategyMustRunAs RunAsUserStrategy = "MustRunAs"
// container must run as a particular uid.
RunAsUserStrategyMustRunAsRange RunAsUserStrategy = "MustRunAsRange"
// container must run as a non-root uid
RunAsUserStrategyMustRunAsNonRoot RunAsUserStrategy = "MustRunAsNonRoot"
// container may make requests for any uid.
RunAsUserStrategyRunAsAny RunAsUserStrategy = "RunAsAny"
)
```
### PodSecurityPolicy Lifecycle
As reusable objects in the root scope, PodSecurityPolicy follows the lifecycle of the
cluster itself. Maintenance of constraints such as adding, assigning, or changing them is the
responsibility of the cluster administrator.
Creating a new user within a namespace should not require the cluster administrator to
define the user's PodSecurityPolicy. They should receive the default set of policies
that the administrator has defined for the groups they are assigned.
## Default PodSecurityPolicy And Overrides
In order to establish policy for service accounts and users, there must be a way
to identify the default set of constraints that is to be used. This is best accomplished by using
groups. As mentioned above, groups may be used by the authentication/authorization layer to ensure
that every user maps to at least one group (with a default example of `system:authenticated`) and it
is up to the cluster administrator to ensure that a `PodSecurityPolicy` object exists that
references the group.
If an administrator would like to provide a user with a changed set of security context permissions,
they may do the following:
1. Create a new `PodSecurityPolicy` object and add a reference to the user or a group
that the user belongs to.
1. Add the user (or group) to an existing `PodSecurityPolicy` object with the proper
elevated privileges.
## Admission
Admission control using an authorizer provides the ability to control the creation of resources
based on capabilities granted to a user. In terms of the `PodSecurityPolicy`, it means
that an admission controller may inspect the user info made available in the context to retrieve
an appropriate set of policies for validation.
The appropriate set of PodSecurityPolicies is defined as all of the policies
available that have reference to the user or groups that the user belongs to.
Admission will use the PodSecurityPolicy to ensure that any requests for a
specific security context setting are valid and to generate settings using the following approach:
1. Determine all the available `PodSecurityPolicy` objects that are allowed to be used
1. Sort the `PodSecurityPolicy` objects in a most restrictive to least restrictive order.
1. For each `PodSecurityPolicy`, generate a `SecurityContext` for each container. The generation phase will not override
any user requested settings in the `SecurityContext`, and will rely on the validation phase to ensure that
the user requests are valid.
1. Validate the generated `SecurityContext` to ensure it falls within the boundaries of the `PodSecurityPolicy`
1. If all containers validate under a single `PodSecurityPolicy` then the pod will be admitted
1. If all containers DO NOT validate under the `PodSecurityPolicy` then try the next `PodSecurityPolicy`
1. If no `PodSecurityPolicy` validates for the pod then the pod will not be admitted
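The ordering logic above might look roughly like the following self-contained sketch; the `policy`
type and its validation hook are simplified stand-ins, not the real `PodSecurityPolicy` admission
plugin:

```go
// Sketch of most-restrictive-first policy selection during admission.
package main

import (
	"errors"
	"fmt"
	"sort"
)

type policy struct {
	name         string
	restrictions int // illustrative ranking: higher = more restrictive
	validates    func(container string) bool
}

// admit returns the name of the first (most restrictive) policy under which
// every container validates, or an error if no policy admits the pod.
func admit(containers []string, policies []policy) (string, error) {
	sort.Slice(policies, func(i, j int) bool {
		return policies[i].restrictions > policies[j].restrictions
	})
	for _, p := range policies {
		ok := true
		for _, c := range containers {
			if !p.validates(c) {
				ok = false
				break
			}
		}
		if ok {
			return p.name, nil
		}
	}
	return "", errors.New("no PodSecurityPolicy validates for the pod")
}

func main() {
	policies := []policy{
		{name: "restricted", restrictions: 10, validates: func(c string) bool { return c != "privileged-sidecar" }},
		{name: "privileged", restrictions: 1, validates: func(string) bool { return true }},
	}
	fmt.Println(admit([]string{"app", "privileged-sidecar"}, policies))
}
```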
## Creation of a SecurityContext Based on PodSecurityPolicy
The creation of a `SecurityContext` based on a `PodSecurityPolicy` is based upon the configured
settings of the `PodSecurityPolicy`.
There are three scenarios under which a `PodSecurityPolicy` field may fall:
1. Governed by a boolean: fields of this type will be defaulted to the most restrictive value.
For instance, `AllowPrivileged` will always be set to false if unspecified.
1. Governed by an allowable set: fields of this type will be checked against the set to ensure
their value is allowed. For example, `AllowCapabilities` will ensure that only capabilities
that are allowed to be requested are considered valid. `HostNetworkSources` will ensure that
only pods created from source X are allowed to request access to the host network.
1. Governed by a strategy: Items that have a strategy to generate a value will provide a
mechanism to generate the value as well as a mechanism to ensure that a specified value falls into
the set of allowable values. See the Types section for the description of the interfaces that
strategies must implement.
Strategies have the ability to become dynamic. In order to support a dynamic strategy it should be
possible to make a strategy that has the ability to either be pre-populated with dynamic data by
another component (such as an admission controller) or has the ability to retrieve the information
itself based on the data in the pod. An example of this would be a pre-allocated UID for the namespace.
A dynamic `RunAsUser` strategy could inspect the namespace of the pod in order to find the required pre-allocated
UID and generate or validate requests based on that information.
```go
// SELinuxStrategy defines the interface for all SELinux constraint strategies.
type SELinuxStrategy interface {
// Generate creates the SELinuxOptions based on constraint rules.
Generate(pod *api.Pod, container *api.Container) (*api.SELinuxOptions, error)
// Validate ensures that the specified values fall within the range of the strategy.
Validate(pod *api.Pod, container *api.Container) fielderrors.ValidationErrorList
}
// RunAsUserStrategy defines the interface for all uid constraint strategies.
type RunAsUserStrategy interface {
// Generate creates the uid based on policy rules.
Generate(pod *api.Pod, container *api.Container) (*int64, error)
// Validate ensures that the specified values fall within the range of the strategy.
Validate(pod *api.Pod, container *api.Container) fielderrors.ValidationErrorList
}
```
## Escalating Privileges by an Administrator
An administrator may wish to create a resource in a namespace that runs with
escalated privileges. By allowing security context
constraints to operate on both the requesting user and the pod's service account, administrators are able to
create pods in namespaces with elevated privileges based on the administrator's security context
constraints.
This also allows the system to guard commands being executed in the non-conforming container. For
instance, an `exec` command can first check the security context of the pod against the security
context constraints of the user or the user's ability to reference a service account.
If it does not validate then it can block users from executing the command. Since the validation
will be user aware, administrators would still be able to run the commands that are restricted to normal users.
## Interaction with the Kubelet
In certain cases, the Kubelet may need to provide information about
the image in order to validate the security context. An example of this is a cluster
that is configured to run with a UID strategy of `MustRunAsNonRoot`.
In this case the admission controller can set the existing `MustRunAsNonRoot` flag on the `SecurityContext`
based on the UID strategy of the `SecurityPolicy`. It should still validate any requests on the pod
for a specific UID and fail early if possible. However, if the `RunAsUser` is not set on the pod
it should still admit the pod and allow the Kubelet to ensure that the image does not run as
`root` with the existing non-root checks.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/security-context-constraints.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,135 +1 @@
# Proposal: Self-hosted kubelet This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/self-hosted-kubelet.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/self-hosted-kubelet.md)
## Abstract
In a self-hosted Kubernetes deployment (see [this
comment](https://github.com/kubernetes/kubernetes/issues/246#issuecomment-64533959)
for background on self hosted kubernetes), we have the initial bootstrap problem.
When running self-hosted components, there needs to be a mechanism for pivoting
from the initial bootstrap state to the kubernetes-managed (self-hosted) state.
In the case of a self-hosted kubelet, this means pivoting from the initial
kubelet defined and run on the host, to the kubelet pod which has been scheduled
to the node.
This proposal presents a solution to the kubelet bootstrap, and assumes a
functioning control plane (e.g. an apiserver, controller-manager, scheduler, and
etcd cluster), and a kubelet that can securely contact the API server. This
functioning control plane can be temporary, and not necessarily the "production"
control plane that will be used after the initial pivot / bootstrap.
## Background and Motivation
In order to understand the goals of this proposal, one must understand what
"self-hosted" means. This proposal defines "self-hosted" as a kubernetes cluster
that is installed and managed by the kubernetes installation itself. This means
that each kubernetes component is described by a kubernetes manifest (Daemonset,
Deployment, etc) and can be updated via kubernetes.
The overall goal of this proposal is to make kubernetes easier to install and
upgrade. We can then treat kubernetes itself just like any other application
hosted in a kubernetes cluster, and have access to easy upgrades, monitoring,
and durability for core kubernetes components themselves.
We intend to achieve this by using kubernetes to manage itself. However, in
order to do that we must first "bootstrap" the cluster, by using kubernetes to
install kubernetes components. This is where this proposal fits in, by
describing the necessary modifications, and required procedures, needed to run a
self-hosted kubelet.
The approach being proposed for a self-hosted kubelet is a "pivot" style
installation. This procedure assumes a short-lived “bootstrap” kubelet will run
and start a long-running “self-hosted” kubelet. Once the self-hosted kubelet is
running the bootstrap kubelet will exit. As part of this, we propose introducing
a new `--bootstrap` flag to the kubelet. The behaviour of that flag will be
explained in detail below.
## Proposal
We propose adding a new flag to the kubelet, the `--bootstrap` flag, which is
assumed to be used in conjunction with the `--lock-file` flag. The `--lock-file`
flag is used to ensure only a single kubelet is running at any given time during
this pivot process. When the `--bootstrap` flag is provided, after the kubelet
acquires the file lock, it will begin asynchronously waiting on
[inotify](http://man7.org/linux/man-pages/man7/inotify.7.html) events. Once an
"open" event is received, the kubelet will assume another kubelet is attempting
to take control and will exit by calling `exit(0)`.
Thus, the initial bootstrap becomes:
1. "bootstrap" kubelet is started by $init system.
1. "bootstrap" kubelet pulls down "self-hosted" kubelet as a pod from a
daemonset
1. "self-hosted" kubelet attempts to acquire the file lock, causing "bootstrap"
kubelet to exit
1. "self-hosted" kubelet acquires lock and takes over
1. "bootstrap" kubelet is restarted by $init system and blocks on acquiring the
file lock
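A minimal sketch of the lock-file/inotify handshake described above, assuming Linux and the
`golang.org/x/sys/unix` package (illustrative only, not the kubelet's actual `--bootstrap`
implementation):

```go
// Sketch: hold the --lock-file exclusively, and exit when another kubelet
// opens it (i.e. tries to take over).
package main

import (
	"log"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	lockFile := "/var/run/kubelet.lock" // would come from --lock-file

	f, err := os.OpenFile(lockFile, os.O_CREATE|os.O_RDWR, 0600)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Block until no other kubelet holds the lock.
	if err := unix.Flock(int(f.Fd()), unix.LOCK_EX); err != nil {
		log.Fatal(err)
	}

	// Asynchronously watch for "open" events on the lock file: another
	// kubelet attempting to acquire it means this one should step aside.
	ifd, err := unix.InotifyInit()
	if err != nil {
		log.Fatal(err)
	}
	if _, err := unix.InotifyAddWatch(ifd, lockFile, unix.IN_OPEN); err != nil {
		log.Fatal(err)
	}
	go func() {
		buf := make([]byte, unix.SizeofInotifyEvent+unix.NAME_MAX+1)
		if _, err := unix.Read(ifd, buf); err == nil {
			log.Print("another kubelet is taking over; exiting")
			os.Exit(0)
		}
	}()

	runKubelet() // placeholder for the normal kubelet main loop
}

func runKubelet() { select {} }
```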
During an upgrade of the kubelet, for simplicity we will consider 3 kubelets,
namely "bootstrap", "v1", and "v2". We imagine the following scenario for
upgrades:
1. Cluster administrator introduces "v2" kubelet daemonset
1. "v1" kubelet pulls down and starts "v2"
1. Cluster administrator removes "v1" kubelet daemonset
1. "v1" kubelet is killed
1. Both "bootstrap" and "v2" kubelets race for file lock
1. If "v2" kubelet acquires lock, process has completed
1. If "bootstrap" kubelet acquires lock, it is assumed that "v2" kubelet will
fail a health check and be killed. Once restarted, it will try to acquire the
lock, triggering the "bootstrap" kubelet to exit.
Alternatively, it would also be possible via this mechanism to delete the "v1"
daemonset first, allow the "bootstrap" kubelet to take over, and then introduce
the "v2" kubelet daemonset, effectively eliminating the race between "bootstrap"
and "v2" for lock acquisition, and the reliance on the failing health check
procedure.
Eventually this could be handled by a DaemonSet upgrade policy.
This will allow a "self-hosted" kubelet with minimal new concepts introduced
into the core Kubernetes code base, and remains flexible enough to work well
with future [bootstrapping
services](https://github.com/kubernetes/kubernetes/issues/5754).
## Production readiness considerations / Out of scope issues
* Deterministically pulling and running kubelet pod: we would prefer not to have
to loop until we finally get a kubelet pod.
* It is possible that the bootstrap kubelet version is incompatible with the
newer versions that were run in the node. For example, the cgroup
configurations might be incompatible. In the beginning, we will require
cluster admins to keep the configuration in sync. Since we want the bootstrap
kubelet to come up and run even if the API server is not available, we should
persist the configuration for bootstrap kubelet on the node. Once we have
checkpointing in kubelet, we will checkpoint the updated config and have the
bootstrap kubelet use the updated config, if it were to take over.
* Currently best practice when upgrading the kubelet on a node is to drain all
pods first. Automatically draining the node during kubelet upgrade is out
of scope for this proposal. It is assumed that either the cluster
administrator or the daemonset upgrade policy will handle this.
## Other discussion
Various similar approaches have been discussed
[here](https://github.com/kubernetes/kubernetes/issues/246#issuecomment-64533959)
and
[here](https://github.com/kubernetes/kubernetes/issues/23073#issuecomment-198478997).
Other discussion around the kubelet being able to be run inside a container is
[here](https://github.com/kubernetes/kubernetes/issues/4869). Note this isn't a
strict requirement as the kubelet could be run in a chroot jail via rkt fly or
other such similar approach.
Additionally, [Taints and
Tolerations](../../docs/design/taint-toleration-dedicated.md), whose design has
already been accepted, would make the overall kubelet bootstrap more
deterministic. With this, we would also need the ability for a kubelet to
register itself with a given taint when it first contacts the API server. Given
that, a kubelet could register itself with a given taint such as
“component=kubelet”, and a kubelet pod could exist that has a toleration to that
taint, ensuring it is the only pod the “bootstrap” kubelet runs.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/self-hosted-kubelet.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,209 +1 @@
## Abstract This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/selinux-enhancements.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/selinux-enhancements.md)
Presents a proposal for enhancing the security of Kubernetes clusters using
SELinux and simplifying the implementation of SELinux support within the
Kubelet by removing the need to label the Kubelet directory with an SELinux
context usable from a container.
## Motivation
The current Kubernetes codebase relies upon the Kubelet directory being
labeled with an SELinux context usable from a container. This means that a
container escaping namespace isolation will be able to use any file within the
Kubelet directory without defeating kernel
[MAC (mandatory access control)](https://en.wikipedia.org/wiki/Mandatory_access_control).
In order to limit the attack surface, we should enhance the Kubelet to relabel
any bind-mounts into containers into a usable SELinux context without depending
on the Kubelet directory's SELinux context.
## Constraints and Assumptions
1. No API changes allowed
2. Behavior must be fully backward compatible
3. No new admission controllers - make incremental improvements without huge
refactorings
## Use Cases
1. As a cluster operator, I want to avoid having to label the Kubelet
directory with a label usable from a container, so that I can limit the
attack surface available to a container escaping its namespace isolation
2. As a user, I want to run a pod without an SELinux context explicitly
specified and be isolated using MCS (multi-category security) on systems
where SELinux is enabled, so that the pods on each host are isolated from
one another
3. As a user, I want to run a pod that uses the host IPC or PID namespace and
want the system to do the right thing with regard to SELinux, so that no
unnecessary relabel actions are performed
### Labeling the Kubelet directory
As previously stated, the current codebase relies on the Kubelet directory
being labeled with an SELinux context usable from a container. The Kubelet
uses the SELinux context of this directory to determine what SELinux context
`tmpfs` mounts (provided by the EmptyDir memory-medium option) should receive.
The problem with this is that it opens an attack surface to a container that
escapes its namespace isolation; such a container would be able to use any
file in the Kubelet directory without defeating kernel MAC.
### SELinux when no context is specified
When no SELinux context is specified, Kubernetes should just do the right
thing, where doing the right thing is defined as isolating pods with a node-unique
set of categories. Node-uniqueness means unique among the pods
scheduled onto the node. Long-term, we want to have a cluster-wide allocator
for MCS labels. Node-unique MCS labels are a good middle ground that is
possible without a new, large, feature.
### SELinux and host IPC and PID namespaces
Containers in pods that use the host IPC or PID namespaces need access to
other processes and IPC mechanisms on the host. Therefore, these containers
should be run with the `spc_t` SELinux type by the container runtime. The
`spc_t` type is an unconfined type that other SELinux domains are allowed to
connect to. In the case where a pod uses one of these host namespaces, it
should be unnecessary to relabel the pod's volumes.
## Analysis
### Libcontainer SELinux library
Docker and rkt both use the libcontainer SELinux library. This library
provides a method, `GetLxcContexts`, that returns a unique SELinux
context for container processes and the files used by them. `GetLxcContexts`
reads the base SELinux context information from a file at
`/etc/selinux/<policy-name>/contexts/lxc_contexts` and then adds a process-unique MCS label.
Docker and rkt both leverage this call to determine the 'starting' SELinux
contexts for containers.
### Docker
Docker's behavior when no SELinux context is defined for a container is to
give the container a node-unique MCS label.
#### Sharing IPC namespaces
On the Docker runtime, the containers in a Kubernetes pod share the IPC and
PID namespaces of the pod's infra container.
Docker's behavior for containers sharing these namespaces is as follows: if a
container B shares the IPC namespace of another container A, container B is
given the SELinux context of container A. Therefore, for Kubernetes pods
running on docker, in a vacuum the containers in a pod should have the same
SELinux context.
[**Known issue**](https://bugzilla.redhat.com/show_bug.cgi?id=1377869): When
the seccomp profile is set on a docker container that shares the IPC namespace
of another container, that container will not receive the other container's
SELinux context.
#### Host IPC and PID namespaces
In the case of a pod that shares the host IPC or PID namespace, this flag is
simply ignored and the container receives the `spc_t` SELinux type. The
`spc_t` type is unconfined, and so no relabeling needs to be done for volumes
for these pods. Currently, however, there is code which relabels volumes into
explicitly specified SELinux contexts for these pods. This code is unnecessary
and should be removed.
#### Relabeling bind-mounts
Docker is capable of relabeling bind-mounts into containers using the `:Z`
bind-mount flag. However, in the current implementation of the docker runtime
in Kubernetes, the `:Z` option is only applied when the pod's SecurityContext
contains an SELinux context. We could easily implement the correct behaviors
by always setting `:Z` on systems where SELinux is enabled.
### rkt
rkt's behavior when no SELinux context is defined for a pod is similar to
Docker's -- an SELinux context with a node-unique MCS label is given to the
containers of a pod.
#### Sharing IPC namespaces
Containers (apps, in rkt terminology) in rkt pods share an IPC and PID
namespace by default.
#### Relabeling bind-mounts
Bind-mounts into rkt pods are automatically relabeled into the pod's SELinux
context.
#### Host IPC and PID namespaces
Using the host IPC and PID namespaces is not currently supported by rkt.
## Proposed Changes
### Refactor `pkg/util/selinux`
1. The `selinux` package should provide a method `SELinuxEnabled` that returns
whether SELinux is enabled, and is built for all platforms (the
libcontainer SELinux is only built on linux)
2. The `SelinuxContextRunner` interface should be renamed to `SELinuxRunner`
and be changed to have the same method names and signatures as the
libcontainer methods its implementations wrap
3. The `SELinuxRunner` interface only needs `Getfilecon`, which is used by
the rkt code
```go
package selinux
// Note: the libcontainer SELinux package is only built for Linux, so it is
// necessary to have a NOP wrapper which is built for non-Linux platforms to
// allow code that links to this package not to differentiate its own methods
// for Linux and non-Linux platforms.
//
// SELinuxRunner wraps certain libcontainer SELinux calls. For more
// information, see:
//
// https://github.com/opencontainers/runc/blob/master/libcontainer/selinux/selinux.go
type SELinuxRunner interface {
// Getfilecon returns the SELinux context for the given path or returns an
// error.
Getfilecon(path string) (string, error)
}
```
### Kubelet Changes
1. The `relabelVolumes` method in `kubelet_volumes.go` is not needed and can
be removed
2. The `GenerateRunContainerOptions` method in `kubelet_pods.go` should no
longer call `relabelVolumes`
3. The `makeHostsMount` method in `kubelet_pods.go` should set the
`SELinuxRelabel` attribute of the mount for the pod's hosts file to `true`
### Changes to `pkg/kubelet/dockertools/`
1. The `makeMountBindings` should be changed to:
1. No longer accept the `podHasSELinuxLabel` parameter
2. Always use the `:Z` bind-mount flag when SELinux is enabled and the mount
has the `SELinuxRelabel` attribute set to `true`
2. The `runContainer` method should be changed to always use the `:Z`
bind-mount flag on the termination message mount when SELinux is enabled
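As a rough sketch of the bind-mount formatting change above (function and field names here are
illustrative, not the actual `dockertools` code), the `:Z` option would be appended whenever
SELinux is enabled and the mount is marked for relabeling:

```go
// Sketch: build a docker bind-mount string, adding ":Z" when appropriate.
package main

import (
	"fmt"
	"strings"
)

type mount struct {
	HostPath       string
	ContainerPath  string
	ReadOnly       bool
	SELinuxRelabel bool
}

func bindString(m mount, selinuxEnabled bool) string {
	var opts []string
	if m.ReadOnly {
		opts = append(opts, "ro")
	}
	if selinuxEnabled && m.SELinuxRelabel {
		opts = append(opts, "Z") // ask docker to relabel the bind-mount
	}
	bind := m.HostPath + ":" + m.ContainerPath
	if len(opts) > 0 {
		bind += ":" + strings.Join(opts, ",")
	}
	return bind
}

func main() {
	m := mount{
		HostPath:       "/var/lib/kubelet/pods/uid/volumes/kubernetes.io~secret/creds",
		ContainerPath:  "/etc/creds",
		ReadOnly:       true,
		SELinuxRelabel: true,
	}
	fmt.Println(bindString(m, true)) // ...:/etc/creds:ro,Z
}
```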
### Changes to `pkg/kubelet/rkt`
There should not be any required changes for the rkt runtime; we should test to
ensure things work as expected under rkt.
### Changes to volume plugins and infrastructure
1. The `VolumeHost` interface contains a method called `GetRootContext`; this
is an artifact of the old assumptions about the Kubelet directory's SELinux
context and can be removed
2. The `empty_dir.go` file should be changed to be completely agnostic of
SELinux; no behavior in this plugin needs to be differentiated when SELinux
is enabled
### Changes to `pkg/controller/...`
The `VolumeHost` abstraction is used in a couple of PV controllers as NOP
implementations. These should be altered to no longer include `GetRootContext`.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/selinux-enhancements.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,69 +1 @@
# Service Discovery Proposal This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/service-discovery.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/service-discovery.md)
## Goal of this document
To consume a service, a developer needs to know the full URL and a description of the API. Kubernetes contains the host and port information of a service, but it lacks the scheme and the path information needed if the service is not bound at the root. In this document we propose some standard kubernetes service annotations to fix these gaps. It is important that these annotations are a standard to allow for standard service discovery across Kubernetes implementations. Note that the example largely speaks to consuming WebServices but that the same concepts apply to other types of services.
## Endpoint URL, Service Type
A URL can accurately describe the location of a Service. A generic URL is of the following form
scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]
however for the purpose of service discovery we can simplify this to the following form
scheme:[//host[:port]][/]path
If a user and/or password is required then this information can be passed using Kubernetes Secrets. Kubernetes contains the host and port of each service but it lacks the scheme and path.
`Service Path` - Every Service has one or more endpoints. As a rule the endpoint should be located at the root "/" of the location URL, i.e. `http://172.100.1.52/`. There are cases where this is not possible and the actual service endpoint could be located at `http://172.100.1.52/cxfcdi`. The Kubernetes metadata for a service does not capture the path part, making it hard to consume this service.
`Service Scheme` - Services can be deployed using different schemes. Some popular schemes include `http`,`https`,`file`,`ftp` and `jdbc`.
`Service Protocol` - Services use different protocols that clients need to speak in order to communicate with the service; some examples of service-level protocols are SOAP and REST (yes, technically REST isn't a protocol but an architectural style). For service consumers it can be hard to tell what protocol is expected.
## Service Description
The API of a service is the point of interaction with a service consumer. The description of the API is an essential piece of information when building the service consumer. It has become common to publish a service definition document at a known location on the service itself. This 'well known' place is not very standard, so it is proposed that the service developer provide the service description path and the type of Definition Language (DL) used.
`Service Description Path` - To facilitate the consumption of the service by clients, the location of this document is greatly helpful to the service consumer. In some cases the client-side code can be generated from such a document. It is assumed that the service description document is published somewhere on the service endpoint itself.
`Service Description Language` - A number of Definition Languages (DL) have been developed to describe services. Some examples are `WSDL`, `WADL` and `Swagger`. In order to consume a description document it is good to know the type of DL used.
## Standard Service Annotations
Kubernetes allows the creation of Service Annotations. Here we propose the use of the following standard annotations
* `api.service.kubernetes.io/path` - the path part of the service endpoint url. An example value could be `cxfcdi`,
* `api.service.kubernetes.io/scheme` - the scheme part of the service endpoint url. Some values could be `http` or `https`.
* `api.service.kubernetes.io/protocol` - the protocol of the service. Known values are `SOAP`, `XML-RPC` and `REST`,
* `api.service.kubernetes.io/description-path` - the path part of the service description documents endpoint. It is a pretty safe assumption that the service self-documents. An example value for a swagger 2.0 document can be `cxfcdi/swagger.json`,
* `api.kubernetes.io/description-language` - the type of Description Language used. Known values are `WSDL`, `WADL`, `SwaggerJSON`, `SwaggerYAML`.
The fragment below is taken from the service section of the kubernetes.json where these annotations are used:
...
"objects" : [ {
"apiVersion" : "v1",
"kind" : "Service",
"metadata" : {
"annotations" : {
"api.service.kubernetes.io/protocol" : "REST",
"api.service.kubernetes.io/scheme" "http",
"api.service.kubernetes.io/path" : "cxfcdi",
"api.service.kubernetes.io/description-path" : "cxfcdi/swagger.json",
"api.service.kubernetes.io/description-language" : "SwaggerJSON"
},
...
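To show how a consumer could use these annotations, here is a hypothetical client-side sketch (not
part of the proposal itself) that assembles the full endpoint URL from the proposed annotations plus
the host and port Kubernetes already knows:

```go
// Sketch: build the service endpoint URL from the proposed annotations.
package main

import (
	"fmt"
	"strings"
)

func endpointURL(annotations map[string]string, host string, port int) string {
	scheme := annotations["api.service.kubernetes.io/scheme"]
	if scheme == "" {
		scheme = "http" // assumed default when the annotation is absent
	}
	path := strings.TrimPrefix(annotations["api.service.kubernetes.io/path"], "/")
	return fmt.Sprintf("%s://%s:%d/%s", scheme, host, port, path)
}

func main() {
	ann := map[string]string{
		"api.service.kubernetes.io/scheme": "http",
		"api.service.kubernetes.io/path":   "cxfcdi",
	}
	fmt.Println(endpointURL(ann, "172.100.1.52", 80))
	// http://172.100.1.52:80/cxfcdi
}
```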
## Conclusion
Five service annotations are proposed as a standard way to describe a service endpoint. These five annotations are promoted as a Kubernetes standard, so that services can be discovered and a service catalog can be built to facilitate service consumers.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/service-discovery.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,161 +1 @@
# Service externalName This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/service-external-name.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/service-external-name.md)
Author: Tim Hockin (@thockin), Rodrigo Campos (@rata), Rudi C (@therc)
Date: August 2016
Status: Implementation in progress
# Goal
Allow a service to have a CNAME record in the cluster internal DNS service. For
example, the lookup for a `db` service could return a CNAME that points to the
RDS resource `something.rds.aws.amazon.com`. No proxying is involved.
# Motivation
There were many related issues, but we'll try to summarize them here. More info
is on GitHub issues/PRs: #13748, #11838, #13358, #23921
One motivation is to present as native cluster services, services that are
hosted externally. Some cloud providers, like AWS, hand out hostnames (IPs are
not static) and the user wants to refer to these services using regular
Kubernetes tools. This was requested in bugs, at least for AWS, for RedShift,
RDS, Elasticsearch Service, ELB, etc.
Other users just want to use an external service, for example `oracle`, with dns
name `oracle-1.testdev.mycompany.com`, without having to keep DNS in sync, and
are fine with a CNAME.
Another use case is to "integrate" some services for local development. For
example, consider a search service running in Kubernetes in staging, let's say
`search-1.staging.mycompany.com`. It's running on AWS, so it resides behind an
ELB (which has no static IP, just a hostname). A developer is building an app
that consumes `search-1`, but doesn't want to run it on their machine (before
Kubernetes, they didn't, either). They can just create a service that has a
CNAME to the `search-1` endpoint in staging and be happy as before.
Also, Openshift needs this for "service refs". Service ref is really just the
three use cases mentioned above, but in the future a way to automatically inject
"service ref"s into namespaces via "service catalog"[1] might be considered. And
service ref is the natural way to integrate an external service, since it takes
advantage of native DNS capabilities already in wide use.
[1]: https://github.com/kubernetes/kubernetes/pull/17543
# Alternatives considered
In the issues linked above, some alternatives were also considered. A partial
summary of them follows.
One option is to add the hostname to endpoints, as proposed in
https://github.com/kubernetes/kubernetes/pull/11838. This is problematic, as
endpoints are used in many places and users assume the required fields (such as
IP address) are always present and valid (and check that, too). If the field is
not required anymore or if there is just a hostname instead of the IP,
applications could break. Even assuming those cases could be solved, the
hostname will have to be resolved, which presents further questions and issues:
the timeout to use, whether the lookup is synchronous or asynchronous, dealing
with DNS TTL and more. One imperfect approach was to only resolve the hostname
upon creation, but this was considered not a great idea. A better approach
would be at a higher level, maybe a service type.
There are more ideas described in #13748, but all raised further issues,
ranging from using another upstream DNS server to creating a Name object
associated with DNSs.
# Proposed solution
The proposed solution works at the service layer, by adding a new `externalName`
type for services. This will create a CNAME record in the internal cluster DNS
service. No virtual IP or proxying is involved.
Using a CNAME gets rid of unnecessary DNS lookups. There's no need for the
Kubernetes control plane to issue them, to pick a timeout for them, or to
refresh them when the TTL for a record expires. It's way simpler to implement,
while solving the right problem. And addressing it at the service layer avoids
all the complications mentioned above about doing it at the endpoints layer.
The solution was outlined by Tim Hockin in
https://github.com/kubernetes/kubernetes/issues/13748#issuecomment-230397975
Currently a ServiceSpec looks like this, with comments edited for clarity:
```
type ServiceSpec struct {
Ports []ServicePort
// If not specified, the associated Endpoints object is not automatically managed
Selector map[string]string
// "", a real IP, or "None". If not specified, this is default allocated. If "None", this Service is not load-balanced
ClusterIP string
// ClusterIP, NodePort, LoadBalancer. Only applies if clusterIP != "None"
Type ServiceType
// Only applies if clusterIP != "None"
ExternalIPs []string
SessionAffinity ServiceAffinity
// Only applies to type=LoadBalancer
LoadBalancerIP string
LoadBalancerSourceRanges []string
```
The proposal is to change it to:
```
type ServiceSpec struct {
Ports []ServicePort
// If not specified, the associated Endpoints object is not automatically managed
+ // Only applies if type is ClusterIP, NodePort, or LoadBalancer. If type is ExternalName, this is ignored.
Selector map[string]string
// "", a real IP, or "None". If not specified, this is default allocated. If "None", this Service is not load-balanced.
+ // Only applies if type is ClusterIP, NodePort, or LoadBalancer. If type is ExternalName, this is ignored.
ClusterIP string
- // ClusterIP, NodePort, LoadBalancer. Only applies if clusterIP != "None"
+ // ExternalName, ClusterIP, NodePort, LoadBalancer. Only applies if clusterIP != "None"
Type ServiceType
+ // Only applies if type is ExternalName
+ ExternalName string
// Only applies if clusterIP != "None"
ExternalIPs []string
SessionAffinity ServiceAffinity
// Only applies to type=LoadBalancer
LoadBalancerIP string
LoadBalancerSourceRanges []string
```
For example, it can be used like this:
```
apiVersion: v1
kind: Service
metadata:
name: my-rds
spec:
ports:
- port: 12345
type: ExternalName
externalName: myapp.rds.whatever.aws.says
```
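As a hypothetical check of the resulting DNS behaviour (assuming the service above lives in the
`default` namespace), a client inside the cluster should observe a CNAME rather than a cluster IP:

```go
// Sketch: verify from inside the cluster that the ExternalName service
// resolves as a CNAME to the external hostname.
package main

import (
	"fmt"
	"net"
)

func main() {
	cname, err := net.LookupCNAME("my-rds.default.svc.cluster.local")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(cname) // expected: myapp.rds.whatever.aws.says.
}
```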
There is one issue to take into account, that no other alternative considered
fixes, either: TLS. If the service is a CNAME for an endpoint that uses TLS,
connecting with the Kubernetes name `my-service.my-ns.svc.cluster.local` may
result in a failure during server certificate validation. This is acknowledged
and left for future consideration. For the time being, users and administrators
might need to ensure that the server certificates also mention the Kubernetes
name as an alternate host name.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/service-external-name.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

View File

@ -1,363 +1 @@
# StatefulSets: Running pods which need strong identity and storage This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/stateful-apps.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/stateful-apps.md)
## Motivation
Many examples of clustered software systems require stronger guarantees per instance than are provided
by Replica Sets (aka Replication Controllers). Instances of these systems typically require:
1. Data per instance which should not be lost even if the pod is deleted, typically on a persistent volume
* Some cluster instances may have tens of TB of stored data - forcing new instances to replicate data
from other members over the network is onerous
2. A stable and unique identity associated with that instance of the storage - such as a unique member id
3. A consistent network identity that allows other members to locate the instance even if the pod is deleted
4. A predictable number of instances to ensure that systems can form a quorum
* This may be necessary during initialization
5. Ability to migrate from node to node with stable network identity (DNS name)
6. The ability to scale up in a controlled fashion, but are very rarely scaled down without human
intervention
Kubernetes should expose a pod controller (a StatefulSet) that satisfies these requirements in a flexible
manner. It should be easy for users to manage and reason about the behavior of this set. An administrator
with familiarity in a particular cluster system should be able to leverage this controller and its
supporting documentation to run that clustered system on Kubernetes. It is expected that some adaptation
is required to support each new cluster.
This resource is **stateful** because it offers an easy way to link a pod's network identity to its storage
identity and because it is intended to be used to run software that holds state for other
components. That does not mean that all stateful applications *must* use StatefulSets, but the tradeoffs
in this resource are intended to facilitate holding state in the cluster.
## Use Cases
The software listed below forms the primary use-cases for a StatefulSet on the cluster - problems encountered
while adapting these for Kubernetes should be addressed in a final design.
* Quorum with Leader Election
* MongoDB - in replica set mode forms a quorum with an elected leader, but instances must be preconfigured
and have stable network identities.
* ZooKeeper - forms a quorum with an elected leader, but is sensitive to cluster membership changes and
replacement instances *must* present consistent identities
* etcd - forms a quorum with an elected leader, can alter cluster membership in a consistent way, and
requires stable network identities
* Decentralized Quorum
* Cassandra - allows flexible consistency and distributes data via innate hash ring sharding, is also
flexible to scaling, more likely to support members that come and go. Scale down may trigger massive
rebalances.
* Active-active
* Galera - has multiple active masters which must remain in sync
* Leader-followers
* Spark in standalone mode - A single unilateral leader and a set of workers
## Background
Replica sets are designed with a weak guarantee - that there should be N replicas of a particular
pod template. Each pod instance varies only by name, and the replication controller errs on the side of
ensuring that N replicas exist as quickly as possible (by creating new pods as soon as old ones begin graceful
deletion, for instance, or by being able to pick arbitrary pods to scale down). In addition, pods by design
have no stable network identity other than their assigned pod IP, which can change over the lifetime of a pod
resource. ReplicaSets are best leveraged for stateless, shared-nothing, zero-coordination,
embarrassingly-parallel, or fungible software.
While it is possible to emulate the guarantees described above by leveraging multiple replication controllers
(for distinct pod templates and pod identities) and multiple services (for stable network identity), the
resulting objects are hard to maintain and must be copied manually in order to scale a cluster.
By contrast, a DaemonSet *can* offer some of the guarantees above by leveraging Nodes as stable, long-lived
entities. An administrator might choose a set of nodes, label them a particular way, and create a
DaemonSet that maps pods to each node. The storage of the node itself (which could be network attached
storage, or a local SAN) is the persistent storage. The network identity of the node is the stable
identity. However, while there are examples of clustered software that benefit from close association to
a node, this creates an undue burden on administrators to design their cluster to satisfy these
constraints, when a goal of Kubernetes is to decouple system administration from application management.
## Design Assumptions
* **Specialized Controller** - Rather than increase the complexity of the ReplicaSet to satisfy two distinct
use cases, create a new resource that assists users in solving this particular problem.
* **Safety first** - Running a clustered system on Kubernetes should be no harder
than running a clustered system off Kube. Authors should be given tools to guard against common cluster
failure modes (split brain, phantom member) to prevent introducing more failure modes. Sophisticated
distributed systems designers can implement more sophisticated solutions than StatefulSet if necessary -
new users should not become vulnerable to additional failure modes through an overly flexible design.
* **Controlled scaling** - While flexible scaling is important for some clusters, other examples of clusters
do not change scale without significant external intervention. Human intervention may be required after
scaling. Changing scale during cluster operation can lead to split brain in quorum systems. It should be
possible to scale, but there may be responsibilities on the set author to correctly manage the scale.
* **No generic cluster lifecycle** - Rather than design a general purpose lifecycle for clustered software,
focus on ensuring the information necessary for the software to function is available. For example,
rather than providing a "post-creation" hook invoked when the cluster is complete, provide the necessary
information to the "first" (or last) pod to determine the identity of the remaining cluster members and
allow it to manage its own initialization.
## Proposed Design
Add a new resource to Kubernetes to represent a set of pods that are individually distinct but each
individual can safely be replaced. The name **StatefulSet** is chosen to convey that the individual members of
the set are themselves "stateful" and thus each one is preserved. Each member has an identity, and there will
always be a member that thinks it is the "first" one.
The StatefulSet is responsible for creating and maintaining a set of **identities** and ensuring that there is
one pod and zero or more **supporting resources** for each identity. There should never be more than one pod
or unique supporting resource per identity at any one time. A new pod can be created for an identity only
if a previous pod has been fully terminated (reached its graceful termination limit or cleanly exited).
A StatefulSet has 0..N **members**, each with a unique **identity** which is a name that is unique within the
set.
```
type StatefulSet struct {
ObjectMeta
Spec StatefulSetSpec
...
}
type StatefulSetSpec struct {
// Replicas is the desired number of replicas of the given template.
// Each replica is assigned a unique name of the form `name-$replica`
// where replica is in the range `0 - (replicas-1)`.
Replicas int
// A label selector that "owns" objects created under this set
Selector *LabelSelector
// Template is the object describing the pod that will be created - each
// pod created by this set will match the template, but have a unique identity.
Template *PodTemplateSpec
// VolumeClaimTemplates is a list of claims that members are allowed to reference.
// The StatefulSet controller is responsible for mapping network identities to
// claims in a way that maintains the identity of a member. Every claim in
// this list must have at least one matching (by name) volumeMount in one
// container in the template. A claim in this list takes precedence over
// any volumes in the template, with the same name.
VolumeClaimTemplates []PersistentVolumeClaim
// ServiceName is the name of the service that governs this StatefulSet.
// This service must exist before the StatefulSet, and is responsible for
// the network identity of the set. Members get DNS/hostnames that follow the
// pattern: member-specific-string.serviceName.default.svc.cluster.local
// where "member-specific-string" is managed by the StatefulSet controller.
ServiceName string
}
```
Like a replication controller, a StatefulSet may be targeted by an autoscaler. The StatefulSet makes no assumptions
about upgrading or altering the pods in the set for now - instead, the user can trigger graceful deletion
and the StatefulSet will replace the terminated member with the newer template once it exits. Future proposals
may offer update capabilities. A StatefulSet requires RestartAlways pods. The addition of forgiveness may be
necessary in the future to increase the safety of the controller recreating pods.
### How identities are managed
A key question is whether scaling down a StatefulSet and then scaling it back up should reuse identities. If not,
scaling down becomes a destructive action (an admin cannot recover by scaling back up). Given the safety
first assumption, identity reuse seems the correct default. This implies that identity assignment should
be deterministic and not subject to controller races (a controller that has crashed during scale up should
assign the same identities on restart, and two concurrent controllers should decide on the same outcome
identities).
The simplest way to manage identities, and easiest to understand for users, is a numeric identity system
starting at I=0 that ranges up to the current replica count and is contiguous.
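A minimal sketch of that numbering scheme (illustrative only, following the `name-$index` form from the spec above; assumes `fmt` is imported):
```go
// Deterministic, contiguous identities: the same inputs always produce the same
// names, so a restarted or concurrent controller reaches the same decision.
func memberIdentities(setName string, replicas int) []string {
	ids := make([]string, replicas)
	for i := range ids {
		ids[i] = fmt.Sprintf("%s-%d", setName, i)
	}
	return ids
}
```
Scaling down and back up with the same replica count then reproduces exactly the same identities, which is what makes identity reuse the safe default.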
Future work:
* Cover identity reclamation - cleaning up resources for identities that are no longer in use.
* Allow more sophisticated identity assignment - instead of `{name}-{0 - replicas-1}`, allow subsets and
complex indexing.
### Controller behavior
When a StatefulSet is scaled up, the controller must create both pods and supporting resources for
each new identity. The controller must create supporting resources for the pod before creating the
pod. If a supporting resource with the appropriate name already exists, the controller should treat that as
creation succeeding. If a supporting resource cannot be created, the controller should flag an error to
status, back-off (like a scheduler or replication controller), and try again later. Each resource created
by a StatefulSet controller must have a set of labels that match the selector, support orphaning, and have a
controller back reference annotation identifying the owning StatefulSet by name and UID.
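A rough sketch of that per-identity ordering (hypothetical helper names; only the control flow is taken from the description above):
```go
// ensureClaims, createPod and isAlreadyExists are hypothetical helpers standing
// in for the real API calls; "already exists" counts as creation succeeding.
func ensureIdentity(identity string) error {
	if err := ensureClaims(identity); err != nil && !isAlreadyExists(err) {
		return err // record the error in status and back off; retried later
	}
	return createPod(identity)
}
```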
When a StatefulSet is scaled down, the pod for the removed identity should be deleted. It is less clear what the
controller should do to supporting resources. If every pod requires a PV, and a user accidentally scales
up to N=200 and then back down to N=3, leaving 197 PVs lying around may be undesirable (potential for
abuse). On the other hand, a cluster of 5 that is accidentally scaled down to 3 might irreparably destroy
the cluster if the PV for identities 4 and 5 are deleted (may not be recoverable). For the initial proposal,
leaving the supporting resources is the safest path (safety first) with a potential future policy applied
to the StatefulSet for how to manage supporting resources (DeleteImmediately, GarbageCollect, Preserve).
The controller should reflect summary counts of resources on the StatefulSet status to enable clients to easily
understand the current state of the set.
### Parameterizing pod templates and supporting resources
Since each pod needs a unique and distinct identity, and the pod needs to know its own identity, the
StatefulSet must allow a pod template to be parameterized by the identity assigned to the pod. The pods that
are created should be easily identified by their cluster membership.
Because that pod needs access to stable storage, the StatefulSet may specify a template for one or more
**persistent volume claims** that can be used for each distinct pod. The name of the volume claim must
match a volume mount within the pod template.
Future work:
* In the future other resources may be added that must also be templated - for instance, secrets (unique secret per member), config data (unique config per member), and, further in the future, arbitrary extension resources.
* Consider allowing the identity value itself to be passed as an environment variable via the downward API
* Consider allowing per identity values to be specified that are passed to the pod template or volume claim.
### Accessing pods by stable network identity
In order to provide stable network identity, given that pods may not assume pod IP is constant over the
lifetime of a pod, it must be possible to have a resolvable DNS name for the pod that is tied to the
pod identity. There are two broad classes of clustered services - those that require clients to know
all members of the cluster (load balancer intolerant) and those that are amenable to load balancing.
For the former, clients must also be able to easily enumerate the list of DNS names that represent the
member identities and access them inside the cluster. Within a pod, it must be possible for containers
to find and access that DNS name for identifying itself to the cluster.
Since a pod is expected to be controlled by a single controller at a time, it is reasonable for a pod to
have a single identity at a time. Therefore, a service can expose a pod by its identity in a unique
fashion via DNS by leveraging information written to the endpoints by the endpoints controller.
The end result might be DNS resolution as follows:
```
# service mongodb pointing to pods created by StatefulSet mdb, with identities mdb-1, mdb-2, mdb-3
dig mongodb.namespace.svc.cluster.local +short A
172.130.16.50
dig mdb-1.mongodb.namespace.svc.cluster.local +short A
# IP of pod created for mdb-1
dig mdb-2.mongodb.namespace.svc.cluster.local +short A
# IP of pod created for mdb-2
dig mdb-3.mongodb.namespace.svc.cluster.local +short A
# IP of pod created for mdb-3
```
This is currently implemented via an annotation on pods, which is surfaced to endpoints, and finally
surfaced as DNS on the service that exposes those pods.
```
# The pods created by this StatefulSet will have the DNS names "mysql-0.db.NAMESPACE.svc.cluster.local"
# and "mysql-1.db.NAMESPACE.svc.cluster.local"
kind: StatefulSet
metadata:
name: mysql
spec:
replicas: 2
serviceName: db
template:
spec:
containers:
- image: mysql:latest
# Example pod created by the StatefulSet
kind: Pod
metadata:
name: mysql-0
annotations:
pod.beta.kubernetes.io/hostname: "mysql-0"
pod.beta.kubernetes.io/subdomain: db
spec:
...
```
### Preventing duplicate identities
The StatefulSet controller is expected to execute like other controllers, as a single writer. However, when
considering designing for safety first, the possibility of the controller running concurrently cannot
be overlooked, and so it is important to ensure that duplicate pod identities are not achieved.
There are two mechanisms to achieve this at the current time. One is to leverage unique names for pods
that carry the identity of the pod - this prevents duplication because etcd 2 can guarantee single
key transactionality. The other is to use the status field of the StatefulSet to coordinate membership
information. It is possible to leverage both at this time, and encourage users to not assume pod
name is significant, but users are likely to take what they can get. A downside of using unique names
is that it complicates pre-warming of pods and pod migration - on the other hand, those are also
advanced use cases that might be better solved by another, more specialized controller (a
MigratableStatefulSet).
### Managing lifecycle of members
The most difficult aspect of managing a member set is ensuring that all members see a consistent configuration
state of the set. Without a strongly consistent view of cluster state, most clustered software is
vulnerable to split brain. For example, a new set is created with 3 members. If the node containing the
first member is partitioned from the cluster, it may not observe the other two members, and thus create its
own cluster of size 1. The other two members do see the first member, so they form a cluster of size 3.
Both clusters appear to have quorum, which can lead to data loss if not detected.
StatefulSets should provide basic mechanisms that enable a consistent view of cluster state to be possible,
and in the future provide more tools to reduce the amount of work necessary to monitor and update that
state.
The first mechanism is that the StatefulSet controller blocks creation of new pods until all previous pods
are reporting a healthy status. The StatefulSet controller uses the strong serializability of the underlying
etcd storage to ensure that it acts on a consistent view of the cluster membership (the pods and their
status), and serializes the creation of pods based on the health state of other pods. This simplifies
reasoning about how to initialize a StatefulSet, but is not sufficient to guarantee split brain does not
occur.
The second mechanism is having each "member" use the state of the cluster and transform that into cluster
configuration or decisions about membership. This is currently implemented using a side car container
that watches the master (via DNS today, although in the future this may be to endpoints directly) to
receive an ordered history of events, and then applying those safely to the configuration. Note that
for this to be safe, the history received must be strongly consistent (must be the same order of
events from all observers) and the config change must be bounded (an old config version may not
be allowed to exist forever). For now, this is known as a 'babysitter' (working name) and is intended
to help identify abstractions that can be provided by the StatefulSet controller in the future.
## Future Evolution
Criteria for advancing to beta:
* StatefulSets do not accidentally lose data due to cluster design - the pod safety proposal will
help ensure StatefulSets can guarantee **at most one** instance of a pod identity is running at
any time.
* A design consensus is reached on StatefulSet upgrades.
Criteria for advancing to GA:
* StatefulSets solve 80% of clustered software configuration with minimal input from users and are safe from common split brain problems
* Several representative examples of StatefulSets from the community have been proven/tested to be "correct" for a variety of partition problems (possibly via Jepsen or similar)
* Sufficient testing and soak time have occurred (as for Deployments) to ensure the necessary features are in place.
* StatefulSets are considered easy to use for deploying clustered software for common cases
Requested features:
* IPs per member for clustered software like Cassandra that cache resolved DNS addresses that can be used outside the cluster
* Individual services can potentially be used to solve this in some cases.
* Send more / simpler events to each pod from a central spot via the "signal API"
* Persistent local volumes that can leverage local storage
* Allow pods within the StatefulSet to identify "leader" in a way that can direct requests from a service to a particular member.
* Provide upgrades of a StatefulSet in a controllable way (like Deployments).
## Overlap with other proposals
* Jobs can be used to perform a run-once initialization of the cluster
* Init containers can be used to prime PVs and config with the identity of the pod.
* Templates and how fields are overridden in the resulting object should have broad alignment
* DaemonSet defines the core model for how new controllers sit alongside replication controller and
how upgrades can be implemented outside of Deployment objects.
## History
StatefulSets were formerly known as PetSets and were renamed to be less "cutesy" and more descriptive as a
prerequisite to moving to beta. No animals were harmed in the making of this proposal.


@ -1,175 +1 @@
**Table of Contents**

This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/synchronous-garbage-collection.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/synchronous-garbage-collection.md)
<!-- BEGIN MUNGE: GENERATED_TOC -->
- [Overview](#overview)
- [API Design](#api-design)
- [Standard Finalizers](#standard-finalizers)
- [OwnerReference](#ownerreference)
- [DeleteOptions](#deleteoptions)
- [Components changes](#components-changes)
- [API Server](#api-server)
- [Garbage Collector](#garbage-collector)
- [Controllers](#controllers)
- [Handling circular dependencies](#handling-circular-dependencies)
- [Unhandled cases](#unhandled-cases)
- [Implications to existing clients](#implications-to-existing-clients)
<!-- END MUNGE: GENERATED_TOC -->
# Overview
Users of the server-side garbage collection need to determine if the garbage collection is done. For example:
* Currently `kubectl delete rc` blocks until all the pods are terminating. To convert to use server-side garbage collection, kubectl has to be able to determine if the garbage collection is done.
* [#19701](https://github.com/kubernetes/kubernetes/issues/19701#issuecomment-236997077) is a use case where the user needs to wait for all service dependencies garbage collected and their names released, before she recreates the dependencies.
We define the garbage collection as "done" when all the dependents are deleted from the key-value store, rather than merely in the terminating state. There are two reasons: *i)* for `Pod`s, the most common garbage, only when they are deleted from the key-value store do we know that the kubelet has released the resources they occupy; *ii)* some users need to recreate objects with the same names, so they need to wait for the old objects to be deleted from the key-value store. (This limitation is because we index objects by their names in the key-value store today.)
Synchronous Garbage Collection is a best-effort (see [unhandled cases](#unhandled-cases)) mechanism that allows the user to determine whether the garbage collection is done: after the API server receives a deletion request for an owning object, the object keeps existing in the key-value store until all its dependents are deleted from the key-value store by the garbage collector.
Tracking issue: https://github.com/kubernetes/kubernetes/issues/29891
# API Design
## Standard Finalizers
We will introduce a new standard finalizer:
```go
const GCFinalizer string = "DeletingDependents"
```
This finalizer indicates the object is terminating and is waiting for its dependents whose `OwnerReference.BlockOwnerDeletion` is true to be deleted.
## OwnerReference
```go
OwnerReference {
...
// If true, AND if the owner has the "DeletingDependents" finalizer, then the owner cannot be deleted from the key-value store until this reference is removed.
// Defaults to false.
// To set this field, a user needs "delete" permission of the owner, otherwise 422 (Unprocessable Entity) will be returned.
BlockOwnerDeletion *bool
}
```
The initial draft of the proposal did not include this field and it had a security loophole: a user who is only authorized to update one resource can set ownerReference to block the synchronous GC of other resources. Requiring users to explicitly set `BlockOwnerDeletion` allows the master to properly authorize the request.
## DeleteOptions
```go
DeleteOptions {
// Whether and how garbage collection will be performed.
// Defaults to DeletePropagationDefault
// Either this field or OrphanDependents may be set, but not both.
PropagationPolicy *DeletePropagationPolicy
}
type DeletePropagationPolicy string
const (
// The default depends on the existing finalizers on the object and the type of the object.
DeletePropagationDefault DeletePropagationPolicy = "DeletePropagationDefault"
// Orphans the dependents
DeletePropagationOrphan DeletePropagationPolicy = "DeletePropagationOrphan"
// Deletes the object from the key-value store, the garbage collector will delete the dependents in the background.
DeletePropagationBackground DeletePropagationPolicy = "DeletePropagationBackground"
// The object exists in the key-value store until the garbage collector deletes all the dependents whose ownerReference.blockOwnerDeletion=true from the key-value store.
// API server will put the "DeletingDependents" finalizer on the object and set its deletionTimestamp.
// This policy is cascading, i.e., the dependents will also be deleted with DeletePropagationForeground.
DeletePropagationForeground DeletePropagationPolicy = "DeletePropagationForeground"
)
```
The `DeletePropagationForeground` policy represents the synchronous GC mode.
`DeleteOptions.OrphanDependents *bool` will be marked as deprecated and will be removed in 1.7. Validation code will make sure only one of `OrphanDependents` and `PropagationPolicy` may be set. We decided not to add another `DeleteAfterDependentsDeleted *bool`, because together with `OrphanDependents`, it will result in 9 possible combinations and is thus confusing.
The conversion rules are described in the following table:
| 1.5 `PropagationPolicy` | 1.4 and earlier `OrphanDependents` |
|------------------------------------------|--------------------------|
| DeletePropagationDefault | OrphanDependents==nil |
| DeletePropagationOrphan | *OrphanDependents==true |
| DeletePropagationBackground | *OrphanDependents==false |
| DeletePropagationForeground | N/A |
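A sketch of that conversion (illustrative only; it reuses the `DeletePropagationPolicy` constants declared above):
```go
// toPropagationPolicy mirrors the table: a nil OrphanDependents maps to the
// default policy, true to orphaning, and false to background deletion.
func toPropagationPolicy(orphanDependents *bool) DeletePropagationPolicy {
	switch {
	case orphanDependents == nil:
		return DeletePropagationDefault
	case *orphanDependents:
		return DeletePropagationOrphan
	default:
		return DeletePropagationBackground
	}
}
```
`DeletePropagationForeground` has no legacy equivalent, so it can only be requested through the new field.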
# Components changes
## API Server
`Delete()` function checks `DeleteOptions.PropagationPolicy`. If the policy is `DeletePropagationForeground`, the API server will update the object instead of deleting it, add the "DeletingDependents" finalizer, remove the "OrphanDependents" finalizer if it's present, and set the `ObjectMeta.DeletionTimestamp`.
When validating the ownerReference, API server needs to query the `Authorizer` to check if the user has "delete" permission of the owner object. It returns 422 if the user does not have the permissions but intends to set `OwnerReference.BlockOwnerDeletion` to true.
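A sketch of that branch (the storage interface and helper names below are hypothetical; only the control flow follows the description above):
```go
// For a foreground delete, the object is updated rather than removed: the
// "DeletingDependents" finalizer is added, "OrphanDependents" is dropped if
// present, and the deletionTimestamp is set.
if options.PropagationPolicy != nil && *options.PropagationPolicy == DeletePropagationForeground {
	obj.Finalizers = addFinalizer(removeFinalizer(obj.Finalizers, "OrphanDependents"), GCFinalizer)
	setDeletionTimestamp(obj)
	return storage.Update(ctx, obj) // object remains until its dependents are gone
}
return storage.Delete(ctx, name, options)
```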
## Garbage Collector
**Modifications to processEvent()**
Currently `processEvent()` manages GCs internal owner-dependency relationship graph, `uidToNode`. It updates `uidToNode` according to the Add/Update/Delete events in the cluster. To support synchronous GC, it has to:
* handle Add or Update events where `obj.Finalizers.Has(GCFinalizer) && obj.DeletionTimestamp != nil`. The object will be added into the `dirtyQueue`. The object will be marked as "GC in progress" in `uidToNode`.
* Upon receiving the deletion event of an object, put its owner into the `dirtyQueue` if the owner node is marked as "GC in progress". This is to force the `processItem()` (described next) to re-check if all dependents of the owner are deleted.
**Modifications to processItem()**
Currently `processItem()` consumes the `dirtyQueue` and requests the API server to delete an item if none of its owners exists. To support synchronous GC, it has to:
* treat an owner as nonexistent if `owner.DeletionTimestamp != nil && !owner.Finalizers.Has(OrphanFinalizer)`, otherwise synchronous GC will not progress because the owner keeps existing in the key-value store.
* when deleting dependents, if the owner's finalizers include `DeletingDependents`, it should use `DeletePropagationForeground` as the GC policy.
* if an object has multiple owners, some owners still exist while other owners are in the synchronous GC stage, then according to the existing logic of GC, the object wouldn't be deleted. To unblock the synchronous GC of owners, `processItem()` has to remove the ownerReferences pointing to them.
In addition, if an object popped from `dirtyQueue` is marked as "GC in progress", `processItem()` treats it specially:
* To avoid racing with another controller, it requeues the object if `observedGeneration < Generation`. This is best-effort, see [unhandled cases](#unhandled-cases).
* Checks if the object has dependents
* If not, send a PUT request to remove the `GCFinalizer`;
* If so, then add all dependents to the `dirtyQueue`; we need bookkeeping to avoid adding the dependents repeatedly if the owner gets in the `synchronousGC queue` multiple times.
## Controllers
To utilize the synchronous garbage collection feature, controllers (e.g., the replicaset controller) need to set `OwnerReference.BlockOwnerDeletion` when creating dependent objects (e.g. pods).
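For illustration, a dependent created by such a controller might carry an owner reference like the following; all field names other than `BlockOwnerDeletion` come from the existing `OwnerReference` type, and the literal values are hypothetical:
```go
blockOwnerDeletion := true
isController := true
pod.OwnerReferences = []OwnerReference{{
	APIVersion:         "extensions/v1beta1",
	Kind:               "ReplicaSet",
	Name:               "frontend",
	UID:                ownerUID, // UID of the owning ReplicaSet
	Controller:         &isController,
	BlockOwnerDeletion: &blockOwnerDeletion,
}}
```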
# Handling circular dependencies
SynchronousGC will enter a deadlock in the presence of circular dependencies. The garbage collector can break the cycle lazily: when `processItem()` processes an object, if it finds that the object and all of its owners have the `GCFinalizer`, it removes the `GCFinalizer` from the object.
Note that the approach is not rigorous and can thus produce false positives. For example, if a user first sends a SynchronousGC delete request for an object, then sends the delete request for its owner, then `processItem()` will be fooled into believing there is a circle. We expect users not to do this. We can make the circle detection more rigorous if needed.
Circular dependencies are regarded as user error. If needed, we can add more guarantees to handle such cases later.
# Unhandled cases
* If the GC observes the owning object with the `GCFinalizer` before it observes the creation of all the dependents, GC will remove the finalizer from the owning object before all dependents are gone. Hence, synchronous GC is best-effort, though we guarantee that the dependents will be deleted eventually. We face a similar case when handling OrphanFinalizer, see [GC known issues](https://github.com/kubernetes/kubernetes/issues/26120).
# Implications to existing clients
Finalizers break an assumption that many Kubernetes components have: that a deletion request with `grace period=0` immediately removes the object from the key-value store. This is not true if an object has pending finalizers: the object will continue to exist, and currently the API server will not return an error in this case.
**Namespace controller** suffered from this [problem](https://github.com/kubernetes/kubernetes/issues/32519) and was fixed in [#32524](https://github.com/kubernetes/kubernetes/pull/32524) by retrying every 15s if there are objects with pending finalizers to be removed from the key-value store. An object with a pending `GCFinalizer` might take an arbitrarily long time to be deleted, so namespace deletion might time out.
**kubelet** deletes the pod from the key-value store after all its containers are terminated ([code](../../pkg/kubelet/status/status_manager.go#L441-L443)). It also assumes that if the API server does not return an error, the pod is removed from the key-value store. Breaking the assumption will not break `kubelet` though, because the pod must already be in the terminated phase, so `kubelet` no longer needs to manage it.
**Node controller** forcefully deletes pod if the pod is scheduled to a node that does not exist ([code](../../pkg/controller/node/nodecontroller.go#L474)). The pod will continue to exist if it has pending finalizers. The node controller will futilely retry the deletion. Also, the `node controller` forcefully deletes pods before deleting the node ([code](../../pkg/controller/node/nodecontroller.go#L592)). If the pods have pending finalizers, the `node controller` will go ahead deleting the node, leaving those pods behind. These pods will be deleted from the key-value store when the pending finalizers are removed.
**Podgc** deletes terminated pods if there are too many of them in the cluster. We need to make sure finalizers on Pods are taken off quickly enough so that the progress of `Podgc` is not affected.
**Deployment controller** adopts existing `ReplicaSet` (RS) if its template matches. If a matching RS has a pending `GCFinalizer`, deployment should adopt it, take its pods into account, but shouldn't try to mutate it, because the RS controller will ignore a RS that's being deleted. Hence, `deployment controller` should wait for the RS to be deleted, and then create a new one.
**Replication controller manager**, **Job controller**, and **ReplicaSet controller** ignore pods in terminated phase, so pods with pending finalizers will not block these controllers.
**StatefulSet controller** will be blocked by a pod with pending finalizers, so synchronous GC might slow down its progress.
**kubectl**: synchronous GC can simplify the **kubectl delete** reapers. Let's take the `deployment reaper` as an example, since it's the most complicated one. Currently, the reaper finds all `RS` with matching labels, scales them down, polls until `RS.Status.Replicas` reaches 0, deletes the `RS`es, and finally deletes the `deployment`. If using synchronous GC, `kubectl delete deployment` is as easy as sending a synchronous GC delete request for the deployment and polling until the deployment is deleted from the key-value store.
Note that this **changes the behavior** of `kubectl delete`. The command will be blocked until all pods are deleted from the key-value store, instead of being blocked until pods are in the terminating state. This means `kubectl delete` blocks for a longer time, but it has the benefit that the resources used by the pods are released when `kubectl delete` returns. To allow kubectl users to skip waiting for the cleanup, we will add a `--wait` flag. It defaults to true; if it's set to `false`, `kubectl delete` will send the delete request with `PropagationPolicy=DeletePropagationBackground` and return immediately.
To make the new kubectl compatible with the 1.4 and earlier masters, kubectl needs to switch to use the old reaper logic if it finds synchronous GC is not supported by the master.
1.4 `kubectl delete rc/rs` uses `DeleteOptions.OrphanDependents=true`, which is going to be converted to `DeletePropagationBackground` (see [API Design](#api-changes)) by a 1.5 master, so its behavior keeps the same.
Pre 1.4 `kubectl delete` uses `DeleteOptions.OrphanDependents=nil`, so does the 1.4 `kubectl delete` for resources other than rc and rs. The option is going to be converted to `DeletePropagationDefault` (see [API Design](#api-changes)) by a 1.5 master, so these commands behave the same as when working with a 1.4 master.


@ -1,569 +1 @@
# Templates+Parameterization: Repeatedly instantiating user-customized application topologies

This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/templates.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/templates.md)
## Motivation
Addresses https://github.com/kubernetes/kubernetes/issues/11492
There are two main motivators for Template functionality in Kubernetes: Controller Instantiation and Application Definition
### Controller Instantiation
Today the replication controller defines a PodTemplate which allows it to instantiate multiple pods with identical characteristics.
This is useful but limited. Stateful applications have a need to instantiate multiple instances of a more sophisticated topology
than just a single pod (e.g. they also need Volume definitions). A Template concept would allow a Controller to stamp out multiple
instances of a given Template definition. This capability would be immediately useful to the [StatefulSet](https://github.com/kubernetes/kubernetes/pull/18016) proposal.
Similarly the [Service Catalog proposal](https://github.com/kubernetes/kubernetes/pull/17543) could leverage template instantiation as a mechanism for claiming service instances.
### Application Definition
Kubernetes gives developers a platform on which to run images and many configuration objects to control those images, but
constructing a cohesive application made up of images and configuration objects is currently difficult. Applications
require:
* Information sharing between images (e.g. one image provides a DB service, another consumes it)
* Configuration/tuning settings (memory sizes, queue limits)
* Unique/customizable identifiers (service names, routes)
Application authors know which values should be tunable and what information must be shared, but there is currently no
consistent way for an application author to define that set of information so that application consumers can easily deploy
an application and make appropriate decisions about the tunable parameters the author intended to expose.
Furthermore, even if an application author provides consumers with a set of API object definitions (e.g. a set of yaml files)
it is difficult to build a UI around those objects that would allow the deployer to modify names in one place without
potentially breaking assumed linkages to other pieces. There is also no prescriptive way to define which configuration
values are appropriate for a deployer to tune or what the parameters control.
## Use Cases
### Use cases for templates in general
* Providing a full baked application experience in a single portable object that can be repeatably deployed in different environments.
* e.g. Wordpress deployment with separate database pod/replica controller
* Complex service/replication controller/volume topologies
* Bulk object creation
* Provide a management mechanism for deleting/uninstalling an entire set of components related to a single deployed application
* Providing a library of predefined application definitions that users can select from
* Enabling the creation of user interfaces that can guide an application deployer through the deployment process with descriptive help about the configuration value decisions they are making, and useful default values where appropriate
* Exporting a set of objects in a namespace as a template so the topology can be inspected/visualized or recreated in another environment
* Controllers that need to instantiate multiple instances of identical objects (e.g. StatefulSets).
### Use cases for parameters within templates
* Share passwords between components (parameter value is provided to each component as an environment variable or as a Secret reference, with the Secret value being parameterized or produced by an [initializer](https://github.com/kubernetes/kubernetes/issues/3585))
* Allow for simple deployment-time customization of “app” configuration via environment values or api objects, e.g. memory
tuning parameters to a MySQL image, Docker image registry prefix for image strings, pod resource requests and limits, default
scale size.
* Allow simple, declarative defaulting of parameter values and expose them to end users in an approachable way - a parameter
like “MySQL table space” can be parameterized in images as an env var - the template parameters declare the parameter, give
it a friendly name, give it a reasonable default, and inform the user what tuning options are available.
* Customization of component names to avoid collisions and ensure matched labeling (e.g. replica selector value and pod label are
user provided and in sync).
* Customize cross-component references (e.g. user provides the name of a secret that already exists in their namespace, to use in
a pod as a TLS cert).
* Provide guidance to users for parameters such as default values, descriptions, and whether or not a particular parameter value
is required or can be left blank.
* Parameterize the replica count of a deployment or [StatefulSet](https://github.com/kubernetes/kubernetes/pull/18016)
* Parameterize part of the labels and selector for a DaemonSet
* Parameterize quota/limit values for a pod
* Parameterize a secret value so a user can provide a custom password or other secret at deployment time
## Design Assumptions
The goal for this proposal is a simple schema which addresses a few basic challenges:
* Allow application authors to expose configuration knobs for application deployers, with suggested defaults and
descriptions of the purpose of each knob
* Allow application deployers to easily customize exposed values like object names while maintaining referential integrity
between dependent pieces (for example ensuring a pod's labels always match the corresponding selector definition of the service)
* Support maintaining a library of templates within Kubernetes that can be accessed and instantiated by end users
* Allow users to quickly and repeatedly deploy instances of well-defined application patterns produced by the community
* Follow established Kubernetes API patterns by defining new template related APIs which consume+return first class Kubernetes
API (and therefore json conformant) objects.
We do not wish to invent a new Turing-complete templating language. There are good options available
(e.g. https://github.com/mustache/mustache) for developers who want a completely flexible and powerful solution for creating
arbitrarily complex templates with parameters, and tooling can be built around such schemes.
This desire for simplicity also intentionally excludes template composability/embedding as a supported use case.
Allowing templates to reference other templates presents versioning+consistency challenges along with making the template
no longer a self-contained portable object. Scenarios necessitating multiple templates can be handled in one of several
alternate ways:
* Explicitly constructing a new template that merges the existing templates (tooling can easily be constructed to perform this
operation since the templates are first class api objects).
* Manually instantiating each template and utilizing [service linking](https://github.com/kubernetes/kubernetes/pull/17543) to share
any necessary configuration data.
This document will also refrain from proposing server APIs or client implementations. This has been a point of debate, and it makes
more sense to focus on the template/parameter specification/syntax than to worry about the tooling that will process or manage the
template objects. However since there is a desire to at least be able to support a server side implementation, this proposal
does assume the specification will be k8s API friendly.
## Desired characteristics
* Fully k8s object json-compliant syntax. This allows server side apis that align with existing k8s apis to be constructed
which consume templates and existing k8s tooling to work with them. It also allows for api versioning/migration to be managed by
the existing k8s codec scheme rather than having to define/introduce a new syntax evolution mechanism.
* (Even if they are not part of the k8s core, it would still be good if a server side template processing+managing api supplied
as an ApiGroup consumed the same k8s object schema as the peer k8s apis rather than introducing a new one)
* Self-contained parameter definitions. This allows a template to be a portable object which includes metadata describing
the inputs it expects, making it easy to wrap a user interface around the parameterization flow.
* Object field primitive types include string, int, boolean, byte[]. The substitution scheme should support all of those types.
* complex types (struct/map/list) can be defined in terms of the available primitives, so it's preferred to avoid the complexity
of allowing for full complex-type substitution.
* Parameter metadata. Parameters should include at a minimum, information describing the purpose of the parameter, whether it is
required/optional, and a default/suggested value. Type information could also be required to enable more intelligent client interfaces.
* Template metadata. Templates should be able to include metadata describing their purpose or links to further documentation and
versioning information. Annotations on the Template's metadata field can fulfill this requirement.
## Proposed Implementation
### Overview
We began by looking at the List object which allows a user to easily group a set of objects together for easy creation via a
single CLI invocation. It also provides a portable format which requires only a single file to represent an application.
From that starting point, we propose a Template API object which can encapsulate the definition of all components of an
application to be created. The application definition is encapsulated in the form of an array of API objects (identical to
List), plus a parameterization section. Components reference the parameter by name and the value of the parameter is
substituted during a processing step, prior to submitting each component to the appropriate API endpoint for creation.
The primary capability provided is that parameter values can easily be shared between components, such as a database password
that is provided by the user once, but then attached as an environment variable to both a database pod and a web frontend pod.
In addition, the template can be repeatedly instantiated for a consistent application deployment experience in different
namespaces or Kubernetes clusters.
Lastly, we propose the Template API object include a “Labels” section in which the template author can define a set of labels
to be applied to all objects created from the template. This will give the template deployer an easy way to manage all the
components created from a given template. These labels will also be applied to selectors defined by Objects within the template,
allowing a combination of templates and labels to be used to scope resources within a namespace. That is, a given template
can be instantiated multiple times within the same namespace, as long as a different label value is used for each
instantiation. The resulting objects will be independent from a replica/load-balancing perspective.
Generation of parameter values for fields such as Secrets will be delegated to an [admission controller/initializer/finalizer](https://github.com/kubernetes/kubernetes/issues/3585) rather than being solved by the template processor. Some discussion about a generation
service is occurring [here](https://github.com/kubernetes/kubernetes/issues/12732)
Labels to be assigned to all objects could also be generated in addition to, or instead of, allowing labels to be supplied in the
Template definition.
### API Objects
**Template Object**
```
// Template contains the inputs needed to produce a Config.
type Template struct {
unversioned.TypeMeta
kapi.ObjectMeta
// Optional: Parameters is an array of Parameters used during the
// Template to Config transformation.
Parameters []Parameter
// Required: A list of resources to create
Objects []runtime.Object
// Optional: ObjectLabels is a set of labels that are applied to every
// object during the Template to Config transformation
// These labels are also applied to selectors defined by objects in the template
ObjectLabels map[string]string
}
```
**Parameter Object**
```
// Parameter defines a name/value variable that is to be processed during
// the Template to Config transformation.
type Parameter struct {
// Required: Parameter name must be set and it can be referenced in Template
// Items using $(PARAMETER_NAME)
Name string
// Optional: The name that will show in UI instead of parameter 'Name'
DisplayName string
// Optional: Parameter can have description
Description string
// Optional: Value holds the Parameter data.
// The value replaces all occurrences of the Parameter $(Name) or
// $((Name)) expression during the Template to Config transformation.
Value string
// Optional: Indicates the parameter must have a non-empty value either provided by the user or provided by a default. Defaults to false.
Required bool
// Optional: Type-value of the parameter (one of string, int, bool, or base64)
// Used by clients to provide validation of user input and guide users.
Type ParameterType
}
```
As seen above, parameters allow for metadata which can be fed into client implementations to display information about the
parameter's purpose and whether a value is required. In lieu of type information, two reference styles are offered: `$(PARAM)`
and `$((PARAM))`. When the single parens option is used, the result of the substitution will remain quoted. When the double
parens option is used, the result of the substitution will not be quoted. For example, given a parameter defined with a value
of "BAR", the following behavior will be observed:
```
somefield: "$(FOO)" -> somefield: "BAR"
somefield: "$((FOO))" -> somefield: BAR
```
For concatenation, the result value reflects the type of substitution (quoted or unquoted):
```
somefield: "prefix_$(FOO)_suffix" -> somefield: "prefix_BAR_suffix"
somefield: "prefix_$((FOO))_suffix" -> somefield: prefix_BAR_suffix
```
If both types of substitution exist, quoting is performed:
```
somefield: "prefix_$((FOO))_$(FOO)_suffix" -> somefield: "prefix_BAR_BAR_suffix"
```
This mechanism allows for integer/boolean values to be substituted properly.
The value of the parameter can be explicitly defined in the template. This should be considered a default value for the parameter; clients
which process templates are free to override this value based on user input.
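A minimal sketch of these substitution rules, operating on one serialized (JSON) field value at a time; this is illustrative only (it assumes the standard `strings` package) and is not the processor's actual implementation:
```go
// substituteField replaces $(NAME) and $((NAME)) references in a serialized
// field value such as `"$(FOO)"`.
func substituteField(raw string, params map[string]string) string {
	hasDouble := strings.Contains(raw, "$((")
	hasSingle := strings.Contains(strings.ReplaceAll(raw, "$((", ""), "$(")
	for name, value := range params {
		raw = strings.ReplaceAll(raw, "$(("+name+"))", value)
		raw = strings.ReplaceAll(raw, "$("+name+")", value)
	}
	// Quotes are dropped only when $((NAME)) is used and no $(NAME) appears;
	// mixed usage keeps the quotes, per the rules above.
	if hasDouble && !hasSingle && strings.HasPrefix(raw, `"`) && strings.HasSuffix(raw, `"`) {
		raw = strings.Trim(raw, `"`)
	}
	return raw
}
```
With `FOO=BAR`, a field serialized as `"$(FOO)"` becomes `"BAR"`, while `"$((FOO))"` becomes the unquoted `BAR`.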
**Example Template**
Illustration of a template which defines a service and replication controller with parameters to specialize
the name of the top level objects, the number of replicas, and several environment variables defined on the
pod template.
```
{
"kind": "Template",
"apiVersion": "v1",
"metadata": {
"name": "mongodb-ephemeral",
"annotations": {
"description": "Provides a MongoDB database service"
}
},
"labels": {
"template": "mongodb-ephemeral-template"
},
"objects": [
{
"kind": "Service",
"apiVersion": "v1",
"metadata": {
"name": "$(DATABASE_SERVICE_NAME)"
},
"spec": {
"ports": [
{
"name": "mongo",
"protocol": "TCP",
"targetPort": 27017
}
],
"selector": {
"name": "$(DATABASE_SERVICE_NAME)"
}
}
},
{
"kind": "ReplicationController",
"apiVersion": "v1",
"metadata": {
"name": "$(DATABASE_SERVICE_NAME)"
},
"spec": {
"replicas": "$((REPLICA_COUNT))",
"selector": {
"name": "$(DATABASE_SERVICE_NAME)"
},
"template": {
"metadata": {
"creationTimestamp": null,
"labels": {
"name": "$(DATABASE_SERVICE_NAME)"
}
},
"spec": {
"containers": [
{
"name": "mongodb",
"image": "docker.io/centos/mongodb-26-centos7",
"ports": [
{
"containerPort": 27017,
"protocol": "TCP"
}
],
"env": [
{
"name": "MONGODB_USER",
"value": "$(MONGODB_USER)"
},
{
"name": "MONGODB_PASSWORD",
"value": "$(MONGODB_PASSWORD)"
},
{
"name": "MONGODB_DATABASE",
"value": "$(MONGODB_DATABASE)"
}
]
}
]
}
}
}
}
],
"parameters": [
{
"name": "DATABASE_SERVICE_NAME",
"description": "Database service name",
"value": "mongodb",
"required": true
},
{
"name": "MONGODB_USER",
"description": "Username for MongoDB user that will be used for accessing the database",
"value": "username",
"required": true
},
{
"name": "MONGODB_PASSWORD",
"description": "Password for the MongoDB user",
"required": true
},
{
"name": "MONGODB_DATABASE",
"description": "Database name",
"value": "sampledb",
"required": true
},
{
"name": "REPLICA_COUNT",
"description": "Number of mongo replicas to run",
"value": "1",
"required": true
}
]
}
```
### API Endpoints
* **/processedtemplates** - when a template is POSTed to this endpoint, all parameters in the template are processed and
substituted into appropriate locations in the object definitions. Validation is performed to ensure required parameters have
a value supplied. In addition labels defined in the template are applied to the object definitions. Finally the customized
template (still a `Template` object) is returned to the caller. (The possibility of returning a List instead has
also been discussed and will be considered for implementation).
The client is then responsible for iterating the objects returned and POSTing them to the appropriate resource api endpoint to
create each object, if that is the desired end goal for the client.
Performing parameter substitution on the server side has the benefit of centralizing the processing so that new clients of
k8s, such as IDEs, CI systems, Web consoles, etc, do not need to reimplement template processing or embed the k8s binary.
Instead they can invoke the k8s api directly.
* **/templates** - the REST storage resource for storing and retrieving template objects, scoped within a namespace.
Storing templates within k8s has the benefit of enabling template sharing and securing via the same roles/resources
that are used to provide access control to other cluster resources. It also enables sophisticated service catalog
flows in which selecting a service from a catalog results in a new instantiation of that service. (This is not the
only way to implement such a flow, but it does provide a useful level of integration).
Creating a new template (POST to the /templates api endpoint) simply stores the template definition; it has no side
effects (no other objects are created).
This resource can also support a subresource "/templates/templatename/processed". This resource would accept just a
Parameters object and would process the template stored in the cluster as "templatename". The processed result would be
returned in the same form as `/processedtemplates`
### Workflow
#### Template Instantiation
Given a well-formed template, a client will
1. Optionally set an explicit `value` for any parameters the user wishes to override
2. Submit the new template object to the `/processedtemplates` api endpoint
The api endpoint will then:
1. Validate the template, including confirming that "required" parameters have an explicit value.
2. Walk each api object in the template.
3. Add all labels defined in the template's ObjectLabels field.
4. For each field, check if the value matches a parameter name and if so, set the value of the field to the value of the parameter.
* Partial substitutions are accepted, such as `SOME_$(PARAM)` which would be transformed into `SOME_XXXX` where `XXXX` is the value
of the `$(PARAM)` parameter.
* If a given $(VAL) could be resolved to either a parameter or an environment variable/downward api reference, an error will be
returned.
5. Return the processed template object. (or List, depending on the choice made when this is implemented)
The client can now either return the processed template to the user in a desired form (e.g. json or yaml), or directly iterate the
api objects within the template, invoking the appropriate object creation api endpoint for each element. (If the api returns
a List, the client would simply iterate the list to create the objects).
The result is a consistently recreatable application configuration, including well-defined labels for grouping objects created by
the template, with end-user customizations as enabled by the template author.
#### Template Authoring
To aid application authors in the creation of new templates, it should be possible to export existing objects from a project
in template form. A user should be able to export all or a filtered subset of objects from a namespace, wrapped into a
Template API object. The user will still need to customize the resulting object to enable parameterization and labeling,
though sophisticated export logic could attempt to auto-parameterize well understood api fields. Such logic is not considered
in this proposal.
#### Tooling
As described above, templates can be instantiated by posting them to a template processing endpoint. CLI tools should
exist which can input parameter values from the user as part of the template instantiation flow.
More sophisticated UI implementations should also guide the user through which parameters the template expects, the description
of those parameters, and the collection of user provided values.
In addition, as described above, existing objects in a namespace can be exported in template form, making it easy to recreate a
set of objects in a new namespace or a new cluster.
## Examples
### Example Templates
These examples reflect the current OpenShift template schema, not the exact schema proposed in this document, however this
proposal, if accepted, provides sufficient capability to support the examples defined here, with the exception of
automatic generation of passwords.
* [Jenkins template](https://github.com/openshift/origin/blob/master/examples/jenkins/jenkins-persistent-template.json)
* [MySQL DB service template](https://github.com/openshift/origin/blob/master/examples/db-templates/mysql-persistent-template.json)
### Examples of OpenShift Parameter Usage
(mapped to use cases described above)
* [Share passwords](https://github.com/jboss-openshift/application-templates/blob/master/eap/eap64-mongodb-s2i.json#L146-L152)
* [Simple deployment-time customization of “app” configuration via environment values](https://github.com/jboss-openshift/application-templates/blob/master/eap/eap64-mongodb-s2i.json#L108-L126) (e.g. memory tuning, resource limits, etc)
* [Customization of component names with referential integrity](https://github.com/jboss-openshift/application-templates/blob/master/eap/eap64-mongodb-s2i.json#L199-L207)
* [Customize cross-component references](https://github.com/jboss-openshift/application-templates/blob/master/eap/eap64-mongodb-s2i.json#L78-L83) (e.g. user provides the name of a secret that already exists in their namespace, to use in a pod as a TLS cert)
## Requirements analysis
There has been some discussion of desired goals for a templating/parameterization solution [here](https://github.com/kubernetes/kubernetes/issues/11492#issuecomment-160853594). This section will attempt to address each of those points.
*The primary goal is that parameterization should facilitate reuse of declarative configuration templates in different environments in
a "significant number" of common cases without further expansion, substitution, or other static preprocessing.*
* This solution provides for templates that can be reused as is (assuming parameters are not used or provide sane default values) across
different environments; they are a self-contained description of a topology.
*Parameterization should not impede the ability to use kubectl commands with concrete resource specifications.*
* The parameterization proposal here does not extend beyond Template objects. That is both a strength and limitation of this proposal.
Parameterizable objects must be wrapped into a Template object, rather than existing on their own.
*Parameterization should work with all kubectl commands that accept --filename, and should work on templates comprised of multiple resources.*
* Same as above.
*The parameterization mechanism should not prevent the ability to wrap kubectl with workflow/orchestration tools, such as Deployment manager.*
* Since this proposal uses standard API objects, a DM or Helm flow could still be constructed around a set of templates, just as those flows are
constructed around other API objects today.
*Any parameterization mechanism we add should not preclude the use of a different parameterization mechanism, it should be possible
to use different mechanisms for different resources, and, ideally, the transformation should be composable with other
substitution/decoration passes.*
* This templating scheme does not preclude layering an additional templating mechanism over top of it. For example, it would be
possible to write a Mustache template which, after Mustache processing, resulted in a Template which could then be instantiated
through the normal template instantiating process.
*Parameterization should not compromise reproducibility. For instance, it should be possible to manage template arguments as well as
templates under version control.*
* Templates are a single file, including default or chosen values for parameters. They can easily be managed under version control.
*It should be possible to specify template arguments (i.e., parameter values) declaratively, in a way that is "self-describing"
(i.e., naming the parameters and the template to which they correspond). It should be possible to write generic commands to
process templates.*
* Parameter definitions include metadata which describes the purpose of the parameter. Since parameter definitions are part of the template,
there is no need to indicate which template they correspond to.
*It should be possible to validate templates and template parameters, both values and the schema.*
* Template objects are subject to standard api validation.
*It should also be possible to validate and view the output of the substitution process.*
* The `/processedtemplates` api returns the result of the substitution process, which is itself a Template object that can be validated.
*It should be possible to generate forms for parameterized templates, as discussed in #4210 and #6487.*
* Parameter definitions provide metadata that allows for the construction of form-based UIs to gather parameter values from users.
*It shouldn't be inordinately difficult to evolve templates. Thus, strategies such as versioning and encapsulation should be
encouraged, at least by convention.*
* Templates can be versioned via annotations on the template object.
## Key discussion points
The preceding document is opinionated about each of these topics; however, they have been popular topics of discussion, so they are called out explicitly below.
### Where to define parameters
There has been some discussion around where to define the parameters that are being injected into a Template:
1. In a separate standalone file
2. Within the Template itself
This proposal suggests including the parameter definitions within the Template, which provides a self-contained structure that
can be easily versioned, transported, and instantiated without risk of mismatching content. In addition, a Template can easily
be validated to confirm that all parameter references are resolvable.
Separating the parameter definitions makes for a more complex process with respect to
* Editing a template (if/when first class editing tools are created)
* Storing/retrieving template objects with a central store
Note that the `/templates/sometemplate/processed` subresource would accept a standalone set of parameters to be applied to `sometemplate`.
### How to define parameters
There has also been debate about how a parameter should be referenced from within a template. This proposal suggests that
fields to be substituted by a parameter value use the "$(parameter)" syntax which is already used elsewhere within k8s. The
value of `parameter` should be matched to a parameter with that name, and the value of the matched parameter substituted into
the field value.
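As a purely illustrative sketch (loosely following the OpenShift template schema referenced in the examples above, not the exact schema proposed here; all names are hypothetical), a parameter reference might look like:

```yaml
apiVersion: v1
kind: Template
metadata:
  name: example-template
parameters:
# The parameter definition lives alongside the objects that reference it.
- name: MYSQL_PASSWORD
  description: Password for the MySQL root user
  value: changeme
objects:
- apiVersion: v1
  kind: Pod
  metadata:
    name: mysql
  spec:
    containers:
    - name: mysql
      image: mysql
      env:
      - name: MYSQL_ROOT_PASSWORD
        # Substituted with the value of the MYSQL_PASSWORD parameter at
        # template instantiation time.
        value: $(MYSQL_PASSWORD)
```

Renaming `MYSQL_PASSWORD` in a sketch like this only requires touching the parameter definition and its references, not a separate path map.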
Other suggestions include a path/map approach in which a list of field paths (e.g. json path expressions) and corresponding
parameter names are provided. The substitution process would walk the map, replacing fields with the appropriate
parameter value. This approach makes templates more fragile from the perspective of editing/refactoring as field paths
may change, thus breaking the map. There is of course also risk of breaking references with the previous scheme, but
renaming parameters seems less likely than changing field paths.
### Storing templates in k8s
OpenShift defines templates as a first class resource so they can be created/retrieved/etc via standard tools. This allows client tools to list available templates (available in the OpenShift cluster), allows existing resource security controls to be applied to templates, and generally provides a more integrated feel to templates. However, there is no explicit requirement that adopting templates in k8s also means adopting in-cluster storage for them.
### Processing templates (server vs. client)
OpenShift handles template processing via a server endpoint which consumes a template object from the client and returns the list of objects
produced by processing the template. It is also possible to handle the entire template processing flow via the client, but this was deemed
undesirable as it would force each client tool to reimplement template processing (e.g. the standard CLI tool, an Eclipse plugin, a plugin for a CI system like Jenkins, etc). The assumption in this proposal is that server-side template processing is the preferred implementation approach for
this reason.

View File

@@ -1,150 +1 @@
# Support HostPath volume existence qualifiers This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/volume-hostpath-qualifiers.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/volume-hostpath-qualifiers.md)
## Introduction
A Host volume source is probably the simplest volume type to define, needing
only a single path. However, that simplicity comes with many assumptions and
caveats.
This proposal describes one of the issues associated with Host volumes &mdash;
their silent and implicit creation of directories on the host &mdash; and
proposes a solution.
## Problem
Right now, under Docker, when a bindmount references a hostPath, that path will
be created as an empty directory, owned by root, if it does not already exist.
This is rarely what the user actually wants because hostPath volumes are
typically used to express a dependency on an existing external file or
directory.
This concern was raised during the [initial
implementation](https://github.com/docker/docker/issues/1279#issuecomment-22965058)
of this behavior in Docker, and it was suggested that orchestration systems
could better manage volume creation than Docker; nevertheless, Docker still creates
the directory itself.
To fix this problem, I propose allowing a pod to specify whether a given
hostPath should exist prior to the pod running, whether it should be created,
and what it should exist as.
I also propose the inclusion of a default value which matches the current
behavior to ensure backwards compatibility.
To understand exactly when this behavior will or won't be correct, it's
important to look at the use-cases of Host Volumes.
The table below broadly classifies the use-case of Host Volumes and asserts
whether this change would be of benefit to that use-case.
### HostPath volume Use-cases
| Use-case | Description | Examples | Benefits from this change? | Why? |
|:---------|:------------|:---------|:--------------------------:|:-----|
| Accessing an external system, data, or configuration | Data or a unix socket is created by a process on the host, and a pod within kubernetes consumes it | [fluentd-es-addon](https://github.com/kubernetes/kubernetes/blob/74b01041cc3feb2bb731cc243ab0e4515bef9a84/cluster/saltbase/salt/fluentd-es/fluentd-es.yaml#L30), [addon-manager](https://github.com/kubernetes/kubernetes/blob/808f3ecbe673b4127627a457dc77266ede49905d/cluster/gce/coreos/kube-manifests/kube-addon-manager.yaml#L23), [kube-proxy](https://github.com/kubernetes/kubernetes/blob/010c976ce8dd92904a7609483c8e794fd8e94d4e/cluster/saltbase/salt/kube-proxy/kube-proxy.manifest#L65), etc | :white_check_mark: | Fails faster and with more useful messages, and won't run when basic assumptions are false (e.g. that docker is the runtime and the docker.sock exists) |
| Providing data to external systems | Some pods wish to publish data to the host for other systems to consume, sometimes to a generic directory and sometimes to more component-specific ones | Kubelet core components which bindmount their logs out to `/var/log/*.log` so logrotate and other tools work with them | :white_check_mark: | Sometimes, but not always; whether a missing directory is a problem depends on the specific directory. |
| Communicating between instances and versions of yourself | A pod can use a hostPath directory as a sort of cache and, as opposed to an emptyDir, persist the directory between versions of itself | [etcd](https://github.com/kubernetes/kubernetes/blob/fac54c9b22eff5c5052a8e3369cf8416a7827d36/cluster/saltbase/salt/etcd/etcd.manifest#L84), caches | :x: | It's pretty much always okay to create them |
### Other motivating factors
One additional motivating factor for this change is that under the rkt runtime
paths are not created when they do not exist. This change moves the management
of these volumes into the Kubelet to the benefit of the rkt container runtime.
## Proposed API Change
### Host Volume
I propose that the
[`v1.HostPathVolumeSource`](https://github.com/kubernetes/kubernetes/blob/d26b4ca2859aa667ad520fb9518e0db67b74216a/pkg/api/types.go#L447-L451)
object be changed to include the following additional field:
`Type` - An optional string of `exists|file|device|socket|directory` - If not
set, it defaults to the backwards-compatible behavior described
below.
| Value | Behavior |
|:------|:---------|
| *unset* | If nothing exists at the given path, an empty directory will be created there. Otherwise, behaves like `exists`. (This default behavior is referred to as `auto` elsewhere in this proposal.) |
| `exists` | If nothing exists at the given path, the pod will fail to run and provide an informative error message |
| `file` | If a file does not exist at the given path, the pod will fail to run and provide an informative error message |
| `device` | If a block or character device does not exist at the given path, the pod will fail to run and provide an informative error message |
| `socket` | If a socket does not exist at the given path, the pod will fail to run and provide an informative error message |
| `directory` | If a directory does not exist at the given path, the pod will fail to run and provide an informative error message |
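For illustration, a pod that depends on the Docker socket might declare that dependency as follows (a hedged sketch; the exact field name and casing in the versioned API, and the image name, are assumptions of this example):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: log-collector
spec:
  containers:
  - name: collector
    image: example/log-collector
    volumeMounts:
    - name: docker-socket
      mountPath: /var/run/docker.sock
  volumes:
  - name: docker-socket
    hostPath:
      path: /var/run/docker.sock
      # With type "socket", the pod fails fast with an informative error if no
      # socket exists at this path, instead of getting an empty directory.
      type: socket
```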
Additional possible values, which are proposed to be excluded:
|Value | Behavior | Reason for exclusion |
|:-----|:---------|:---------------------|
| `new-directory` | Like `auto`, but the given path must be a directory if it exists | `auto` mostly fills this use-case |
| `character-device` | | Granularity beyond `device` shouldn't matter often |
| `block-device` | | Granularity beyond `device` shouldn't matter often |
| `new-file` | Like `file`, but if nothing exists an empty file is created instead | In general, bindmounting the parent directory of the file you intend to create addresses this use-case |
| `optional` | If a path does not exist, then do not create any container-mount at all | This would better be handled by a new field entirely if this behavior is desirable |
### Why not as part of any other volume types?
This feature does not make sense for any of the other volume types simply
because all of the other types are already fully qualified. For example, NFS
volumes must already exist or they will not mount at all.
Similarly, EmptyDir volumes will always exist as a directory.
Only the HostVolume and SubPath means of referencing a path have the potential
to reference arbitrary incorrect or nonexistent things without erroring out.
### Alternatives
One alternative is to augment Host Volumes with a `MustExist` bool and provide
no further granularity. This would allow toggling between the `auto` and
`exists` behaviors described above. This would likely cover the "90%" use-case
and would be a simpler API. It would be sufficient for all of the examples
linked above, in my opinion.
## Kubelet implementation
It's proposed that prior to starting a pod, the Kubelet validates that the
given path meets the qualifications of its type. Namely, if the type is `auto`
the Kubelet will create an empty directory if none exists there, and for each
of the others the Kubelet will perform the given validation prior to running
the pod. This validation might be done by a volume plugin, but further
technical consideration (out of scope of this proposal) is needed.
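A minimal sketch of what that check could look like (illustrative only; the package and function names are not the actual Kubelet code):

```go
package hostpath

import (
	"fmt"
	"os"
)

// validateHostPath checks path against the proposed type qualifier. An empty
// qualifier reproduces today's behavior: create an empty directory if needed.
func validateHostPath(path, pathType string) error {
	info, err := os.Lstat(path)
	if os.IsNotExist(err) {
		if pathType == "" {
			return os.MkdirAll(path, 0755)
		}
		return fmt.Errorf("hostPath %q does not exist (required type %q)", path, pathType)
	}
	if err != nil {
		return err
	}
	mode := info.Mode()
	switch pathType {
	case "", "exists":
		return nil
	case "file":
		if mode.IsRegular() {
			return nil
		}
	case "directory":
		if mode.IsDir() {
			return nil
		}
	case "socket":
		if mode&os.ModeSocket != 0 {
			return nil
		}
	case "device":
		if mode&(os.ModeDevice|os.ModeCharDevice) != 0 {
			return nil
		}
	}
	return fmt.Errorf("hostPath %q is not of type %q", path, pathType)
}
```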
## Possible concerns
### Permissions
This proposal does not attempt to change the state of volume permissions. Currently, a HostPath volume is created with `root` ownership and `755` permissions. This behavior will be retained. An argument for this behavior is given [here](volumes.md#shared-storage-hostpath).
### SELinux
This proposal should not impact SELinux relabeling. Verifying the presence and
type of a given path will be logically separate from SELinux labeling.
Similarly, creating the directory when it doesn't exist will happen before any
SELinux operations and should not impact it.
### Containerized Kubelet
A containerized kubelet would have difficulty creating directories. The
implementation will likely respect the `containerized` flag, or similar,
allowing it to either break out or be "/rootfs/" aware and thus operate as
desired.
### Racy Validation
Ideally the validation would be done at the time the bindmounts are created;
otherwise it's possible for a given path or directory to change between the time
it's validated and the time the container runtime attempts to create said mount.
The only way to solve this problem is to integrate these sorts of qualification
into container runtimes themselves.
I don't think this problem is severe enough that we need to push to solve it;
rather I think we can simply accept this minor race, and if runtimes eventually
allow this we can begin to leverage them.

View File

@@ -1,108 +1 @@
## Volume plugins and idempotency This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/volume-ownership-management.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/volume-ownership-management.md)
Currently, volume plugins have a `SetUp` method which is called in the context of a higher-level
workflow within the kubelet which has externalized the problem of managing the ownership of volumes.
This design has a number of drawbacks that can be mitigated by completely internalizing all concerns
of volume setup behind the volume plugin `SetUp` method.
### Known issues with current externalized design
1. The ownership management is currently repeatedly applied, which breaks packages that require
special permissions in order to work correctly
2. There is a gap between files being mounted/created by volume plugins and when their ownership
is set correctly; race conditions exist around this
3. Solving the correct application of ownership management in an externalized model is difficult
and makes it clear that a transaction boundary is being broken by the externalized design
### Additional issues with externalization
Fully externalizing any one concern of volumes is difficult for a number of reasons:
1. Many types of idempotence checks exist, and are used in a variety of combinations and orders
2. Workflow in the kubelet becomes much more complex to handle:
1. composition of plugins
2. correct timing of application of ownership management
3. callback to volume plugins when we know the whole `SetUp` flow is complete and correct
4. callback to touch sentinel files
5. etc etc
3. We want to support fully external volume plugins -- this would require complex orchestration / a chatty
remote API
## Proposed implementation
Since all of the ownership information is known in advance of the call to the volume plugin `SetUp`
method, we can easily internalize these concerns into the volume plugins and pass the ownership
information to `SetUp`.
The volume `Builder` interface's `SetUp` method changes to accept the group that should own the
volume. Plugins become responsible for ensuring that the correct group is applied. The volume
`Attributes` struct can be modified to remove the `SupportsOwnershipManagement` field.
```go
package volume
type Builder interface {
	// other methods omitted

	// SetUp prepares and mounts/unpacks the volume to a self-determined
	// directory path and returns an error. The group ID that should own the volume
	// is passed as a parameter. Plugins may choose to ignore the group ID directive
	// in the event that they do not support it (example: NFS). A group ID of -1
	// indicates that the group ownership of the volume should not be modified by the plugin.
	//
	// SetUp will be called multiple times and should be idempotent.
	SetUp(gid int64) error
}
```
Each volume plugin will have to change to support the new `SetUp` signature. The existing
ownership management code will be refactored into a library that volume plugins can use:
```go
package volume

// ManageOwnership recursively chowns path to fsGroup and marks it setgid so
// that files created in the volume inherit the owning GID.
func ManageOwnership(path string, fsGroup int64) error {
	// 1. recursive chown of path
	// 2. make path +setgid
	return nil
}
```
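A rough sketch of how such a library function might be implemented (the recursive walk and permission bits below are assumptions for illustration, not the final refactoring; the name is deliberately distinct from the stub above):

```go
package volume

import (
	"os"
	"path/filepath"
)

// manageOwnershipSketch hands group ownership of every file under path to
// fsGroup and marks directories setgid so new files inherit that group.
func manageOwnershipSketch(path string, fsGroup int64) error {
	return filepath.Walk(path, func(p string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		// Keep the owning UID; change only the group.
		if err := os.Chown(p, -1, int(fsGroup)); err != nil {
			return err
		}
		mode := info.Mode().Perm() | 0070 // ensure group rwx
		if info.IsDir() {
			mode |= os.ModeSetgid
		}
		return os.Chmod(p, mode)
	})
}
```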
The workflow from the Kubelet's perspective for handling volume setup and refresh becomes:
```go
// go-ish pseudocode
func mountExternalVolumes(pod) error {
	podVolumes := make(kubecontainer.VolumeMap)
	for i := range pod.Spec.Volumes {
		volSpec := &pod.Spec.Volumes[i]
		var fsGroup int64 = 0
		if pod.Spec.SecurityContext != nil &&
			pod.Spec.SecurityContext.FSGroup != nil {
			fsGroup = *pod.Spec.SecurityContext.FSGroup
		} else {
			fsGroup = -1
		}

		// Try to use a plugin for this volume.
		plugin := volume.NewSpecFromVolume(volSpec)
		builder, err := kl.newVolumeBuilderFromPlugins(plugin, pod)
		if err != nil {
			return err
		}
		if builder == nil {
			return errUnsupportedVolumeType
		}

		// Pass the pod-level fsGroup down to the plugin; propagate any error.
		err = builder.SetUp(fsGroup)
		if err != nil {
			return err
		}
	}
	return nil
}
```

View File

@@ -1,500 +1 @@
## Abstract This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/volume-provisioning.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/volume-provisioning.md)
Real Kubernetes clusters have a variety of volumes which differ widely in
size, iops performance, retention policy, and other characteristics.
Administrators need a way to dynamically provision volumes of these different
types to automatically meet user demand.
A new mechanism called 'storage classes' is proposed to provide this
capability.
## Motivation
In Kubernetes 1.2, an alpha form of limited dynamic provisioning was added
that allows a single volume type to be provisioned in clouds that offer
special volume types.
In Kubernetes 1.3, a label selector was added to persistent volume claims to
allow administrators to create a taxonomy of volumes based on the
characteristics important to them, and to allow users to make claims on those
volumes based on those characteristics. This allows flexibility when claiming
existing volumes; the same flexibility is needed when dynamically provisioning
volumes.
After gaining experience with dynamic provisioning after the 1.2 release, we
want to create a more flexible feature that allows configuration of how
different storage classes are provisioned and supports provisioning multiple
types of volumes within a single cloud.
### Out-of-tree provisioners
One of our goals is to enable administrators to create out-of-tree
provisioners, that is, provisioners whose code does not live in the Kubernetes
project.
## Design
This design represents the minimally viable changes required to provision based on storage class configuration. Additional incremental features may be added as a separate effort.
We propose that:
1. Both for in-tree and out-of-tree storage provisioners, the PV created by the
provisioners must match the PVC that led to its creation. If a provisioner
is unable to provision such a matching PV, it reports an error to the
user.
2. The above point also applies to the PVC label selector. If a user submits a PVC
with a label selector, the provisioner must provision a PV with matching
labels. This directly implies that the provisioner understands the meaning
behind these labels - if a user submits a claim with a selector that wants
a PV with label "region" not in "[east,west]", the provisioner must
understand what label "region" means, what regions are available, and
choose e.g. "north".
In other words, provisioners should either refuse to provision a volume for
a PVC that has a selector, or select a few labels that are allowed in
selectors (such as the "region" example above), implement the necessary logic
for their parsing, document them, and refuse any selector that references
unknown labels.
3. An API object will be incubated in storage.k8s.io/v1beta1 to hold the `StorageClass`
API resource. Each StorageClass object contains parameters required by the provisioner to provision volumes of that class. These parameters are opaque to the user.
4. `PersistentVolume.Spec.Class` attribute is added to volumes. This attribute
is optional and specifies which `StorageClass` instance represents
storage characteristics of a particular PV.
During incubation, `Class` is an annotation and not an
actual attribute.
5. `PersistentVolume` instances are not required to be labeled by the provisioner.
6. `PersistentVolumeClaim.Spec.Class` attribute is added to claims. This
attribute specifies that only a volume with equal
`PersistentVolume.Spec.Class` value can satisfy a claim.
During incubation, `Class` is just an annotation and not an
actual attribute.
7. The existing provisioner plugin implementations be modified to accept
parameters as specified via `StorageClass`.
8. The persistent volume controller be modified to invoke provisioners using the `StorageClass` configuration and to bind claims with `PersistentVolumeClaim.Spec.Class` to volumes with an equivalent `PersistentVolume.Spec.Class`.
9. The existing alpha dynamic provisioning feature be phased out in the
next release.
### Controller workflow for provisioning volumes
0. The Kubernetes administrator can configure the name of a default StorageClass. This
StorageClass instance is then used when a user requests a dynamically
provisioned volume but does not specify a StorageClass. In other words,
`claim.Spec.Class == ""`
(or annotation `volume.beta.kubernetes.io/storage-class == ""`).
1. When a new claim is submitted, the controller attempts to find an existing
volume that will fulfill the claim.
1. If the claim has non-empty `claim.Spec.Class`, only PVs with the same
`pv.Spec.Class` are considered.
2. If the claim has empty `claim.Spec.Class`, only PVs with an unset `pv.Spec.Class` are considered.
All "considered" volumes are evaluated and the
smallest matching volume is bound to the claim.
2. If no volume is found for the claim and `claim.Spec.Class` is not set or is
an empty string, dynamic provisioning is disabled.
3. If `claim.Spec.Class` is set, the controller tries to find an instance of StorageClass with this name. If no
such StorageClass is found, the controller goes back to step 1 and
periodically retries finding a matching volume or storage class until
a match is found. The claim is `Pending` during this period.
4. With the StorageClass instance, the controller updates the claim:
* `claim.Annotations["volume.beta.kubernetes.io/storage-provisioner"] = storageClass.Provisioner`
* **In-tree provisioning**
The controller tries to find an internal volume plugin referenced by
`storageClass.Provisioner`. If it is found:
5. The internal provisioner implements the interface `ProvisionableVolumePlugin`,
which has a method called `NewProvisioner` that returns a new provisioner.
6. The controller calls the volume plugin's `Provision` with the parameters
from the `StorageClass` configuration object.
7. If `Provision` returns an error, the controller generates an event on the
claim and goes back to step 1., i.e. it will retry provisioning
periodically.
8. If `Provision` returns no error, the controller creates the returned
`api.PersistentVolume`, fills its `Class` attribute with `claim.Spec.Class`,
and makes it already bound to the claim.
1. If the create operation for the `api.PersistentVolume` fails, it is
retried
2. If the create operation does not succeed in a reasonable time, the
controller attempts to delete the provisioned volume and creates an event
on the claim
Existing behavior is unchanged for claims that do not specify
`claim.Spec.Class`.
* **Out of tree provisioning**
Following step 4. above, the controller tries to find an internal plugin for the
`StorageClass`. If none is found, the controller does not do anything; it just
periodically goes back to step 1, i.e. tries to find an available matching PV.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in RFC 2119.
External provisioner must have these features:
* It MUST have a distinct name, following the Kubernetes plugin naming scheme
`<vendor name>/<provisioner name>`, e.g. `gluster.org/gluster-volume`.
* The provisioner SHOULD send events on a claim to report any errors
related to provisioning a volume for the claim. This way, users get the same
experience as with internal provisioners.
* The provisioner MUST also implement a deleter. It must be able to delete
storage assets it created. It MUST NOT assume that any other internal or
external plugin is present.
The external provisioner runs in a separate process which watches claims, be
it an external storage appliance, a daemon or a Kubernetes pod. For every
claim creation or update, it implements these steps:
1. The provisioner checks whether
`claim.Annotations["volume.beta.kubernetes.io/storage-provisioner"] == <provisioner name>`.
All other claims MUST be ignored.
2. The provisioner MUST check that the claim is unbound, i.e. its
`claim.Spec.VolumeName` is empty. Bound volumes MUST be ignored (a rough sketch of this filtering appears after this list).
*The race condition where the provisioner provisions a new PV for a claim
while, at the same time, Kubernetes binds the same claim to another PV that was
just created by an admin is discussed below.*
3. It tries to find a StorageClass instance referenced by annotation
`claim.Annotations["volume.beta.kubernetes.io/storage-class"]`. If not
found, it SHOULD report an error (by sending an event to the claim) and it
SHOULD retry periodically from step 1.
4. The provisioner MUST parse arguments in the `StorageClass` and
`claim.Spec.Selector` and provision an appropriate storage asset that matches
both the parameters and the selector.
When it encounters unknown parameters in `storageClass.Parameters` or
`claim.Spec.Selector`, or the combination of these parameters is impossible
to achieve, it SHOULD report an error and it MUST NOT provision a volume.
All errors found during parsing or provisioning SHOULD be sent as events
on the claim, and the provisioner SHOULD retry periodically from step 1.
As parsing (and understanding) claim selectors is hard, the sentence
"MUST parse ... `claim.Spec.Selector`" will in the typical case lead to simple
refusal of claims that have any selector:
```go
if pvc.Spec.Selector != nil {
return Error("can't parse PVC selector!")
}
```
5. When the volume is provisioned, the provisioner MUST create a new PV
representing the storage asset and save it in Kubernetes. When this fails,
it SHOULD retry creating the PV a few times. If all attempts fail, it
MUST delete the storage asset. All errors SHOULD be sent as events to the
claim.
The created PV MUST have these properties:
* `pv.Spec.ClaimRef` MUST point to the claim that led to its creation
(including the claim UID).
*This way, the PV will be bound to the claim.*
* `pv.Annotations["pv.kubernetes.io/provisioned-by"]` MUST be set to name
of the external provisioner. This provisioner will be used to delete the
volume.
*The provisioner/deleter should not assume there is any other
provisioner/deleter available that would delete the volume.*
* `pv.Annotations["volume.beta.kubernetes.io/storage-class"]` MUST be set
to name of the storage class requested by the claim.
*So the created PV matches the claim.*
* The provisioner MAY store any other information to the created PV as
annotations. It SHOULD save any information that is needed to delete the
storage asset there, as the appropriate StorageClass instance may not exist
when the volume is deleted. However, references to a Secret instance
or direct username/password to a remote storage appliance MUST NOT be
stored there, see issue #34822.
* `pv.Labels` MUST be set to match `claim.spec.selector`. The provisioner
MAY add additional labels.
*So the created PV matches the claim.*
* `pv.Spec` MUST be set to match requirements in `claim.Spec`, especially
access mode and PV size. The provisioned volume size MUST NOT be smaller
than size requested in the claim, however it MAY be larger.
*So the created PV matches the claim.*
* `pv.Spec.PersistentVolumeSource` MUST be set to point to the created
storage asset.
* `pv.Spec.PersistentVolumeReclaimPolicy` SHOULD be set to `Delete` unless
user manually configures other reclaim policy.
* `pv.Name` MUST be unique. Internal provisioners use a name based on
`claim.UID` to produce conflicts when two provisioners accidentally
provision a PV for the same claim; external provisioners, however, can use
any mechanism to generate a unique PV name.
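Condensing steps 1 and 2 above, an external provisioner's claim filter might look roughly like this (the reduced claim type and helper name are illustrative; a real provisioner would receive the full PersistentVolumeClaim object):

```go
package externalprovisioner

// claim carries only the fields the filter looks at.
type claim struct {
	Annotations map[string]string
	VolumeName  string // claim.Spec.VolumeName
}

// wantsProvisioning reports whether this provisioner should act on the claim.
func wantsProvisioning(c claim, provisionerName string) bool {
	if c.Annotations["volume.beta.kubernetes.io/storage-provisioner"] != provisionerName {
		return false // step 1: some other provisioner is responsible
	}
	return c.VolumeName == "" // step 2: ignore claims that are already bound
}
```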
Example of a claim that is to be provisioned by an external provisioner for
`foo.org/foo-volume`:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    volume.beta.kubernetes.io/storage-class: myClass
    volume.beta.kubernetes.io/storage-provisioner: foo.org/foo-volume
  name: fooclaim
  namespace: default
  resourceVersion: "53"
  uid: 5a294561-7e5b-11e6-a20e-0eb6048532a3
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
  # volumeName: must be empty!
```
Example of the created PV:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: foo.org/foo-volume
    volume.beta.kubernetes.io/storage-class: myClass
    foo.org/provisioner: "any other annotations as needed"
  labels:
    foo.org/my-label: "any labels as needed"
  generateName: "foo-volume-"
spec:
  accessModes:
  - ReadWriteOnce
  awsElasticBlockStore:
    fsType: ext4
    volumeID: aws://us-east-1d/vol-de401a79
  capacity:
    storage: 4Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: fooclaim
    namespace: default
    resourceVersion: "53"
    uid: 5a294561-7e5b-11e6-a20e-0eb6048532a3
  persistentVolumeReclaimPolicy: Delete
```
As a result, Kubernetes has a PV that represents the storage asset and is bound
to the claim. When everything goes well, Kubernetes completes the binding of the
claim to the PV.
Kubernetes was not blocked in any way during the provisioning and could have
bound the claim to another PV created by the user, or the claim may even have
been deleted by the user. In both cases, Kubernetes will mark the PV to be
deleted using the protocol below.
The external provisioner MAY save any annotations to the claim that is
provisioned; however, the claim may be modified or even deleted by the user at
any time.
### Controller workflow for deleting volumes
When the controller decides that a volume should be deleted it performs these
steps:
1. The controller changes `pv.Status.Phase` to `Released`.
2. The controller looks for `pv.Annotations["pv.kubernetes.io/provisioned-by"]`.
If found, it uses this provisioner/deleter to delete the volume.
3. If the volume is not annotated by `pv.kubernetes.io/provisioned-by`, the
controller inspects `pv.Spec` and finds in-tree deleter for the volume.
4. If the deleter found by steps 2 or 3 is internal, the controller calls it and deletes
the storage asset together with the PV that represents it.
5. If the deleter is not known to Kubernetes, the controller does not do anything.
6. External deleters MUST watch for PV changes. When
`pv.Status.Phase == Released && pv.Annotations['pv.kubernetes.io/provisioned-by'] == <deleter name>`,
the deleter:
* It MUST check reclaim policy of the PV and ignore all PVs whose
`Spec.PersistentVolumeReclaimPolicy` is not `Delete`.
* It MUST delete the storage asset.
* Only after the storage asset was successfully deleted, it MUST delete the
PV object in Kubernetes.
* Any error SHOULD be sent as an event on the PV being deleted and the
deleter SHOULD retry to delete the volume periodically.
* The deleter SHOULD NOT use any information from the StorageClass instance
referenced by the PV. This is different from internal deleters, which
need the StorageClass instance to be present at the time of deletion to read
Secret instances (see the Gluster provisioner for example); however, we would
like to phase out this behavior.
Note that watching `pv.Status` has been frowned upon in the past; however, in
this particular case we could use it quite reliably to trigger deletion.
It's not trivial to find out if a PV is not needed and should be deleted.
*Alternatively, an annotation could be used.*
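Condensed into code, the external deleter's eligibility check amounts to something like the following (the reduced type and field names are illustrative, not an actual client implementation):

```go
package externaldeleter

// persistentVolume carries only the fields the check looks at.
type persistentVolume struct {
	Phase         string            // pv.Status.Phase
	Annotations   map[string]string // pv.Annotations
	ReclaimPolicy string            // pv.Spec.PersistentVolumeReclaimPolicy
}

// shouldDelete reports whether this deleter is responsible for the PV and
// whether its reclaim policy actually asks for deletion.
func shouldDelete(pv persistentVolume, deleterName string) bool {
	return pv.Phase == "Released" &&
		pv.Annotations["pv.kubernetes.io/provisioned-by"] == deleterName &&
		pv.ReclaimPolicy == "Delete"
}
```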
### Security considerations
Both internal and external provisioners and deleters may need access to
credentials (e.g. username+password) of an external storage appliance to
provision and delete volumes.
* For internal provisioners, a Secret instance in a well-secured namespace
should be used. A pointer to the Secret instance shall be a parameter of the
StorageClass, and it MUST NOT be copied around the system, e.g. in annotations
of PVs. See issue #34822.
* External provisioners running in a pod should have the appropriate credentials
mounted as a Secret inside the pods that run the provisioner. The namespace with
the pods and the Secret instance should be well secured.
### `StorageClass` API
A new API group should hold the API for storage classes, following the pattern
of autoscaling, metrics, etc. To allow for future storage-related APIs, we
should call this new API group `storage.k8s.io` and incubate in storage.k8s.io/v1beta1.
Storage classes will be represented by an API object called `StorageClass`:
```go
package storage
// StorageClass describes the parameters for a class of storage for
// which PersistentVolumes can be dynamically provisioned.
//
// StorageClasses are non-namespaced; the name of the storage class
// according to etcd is in ObjectMeta.Name.
type StorageClass struct {
	unversioned.TypeMeta `json:",inline"`
	ObjectMeta           `json:"metadata,omitempty"`

	// Provisioner indicates the type of the provisioner.
	Provisioner string `json:"provisioner,omitempty"`

	// Parameters for dynamic volume provisioner.
	Parameters map[string]string `json:"parameters,omitempty"`
}
```
`PersistentVolumeClaimSpec` and `PersistentVolumeSpec` both get Class attribute
(the existing annotation is used during incubation):
```go
type PersistentVolumeClaimSpec struct {
	// Name of requested storage class. If non-empty, only PVs with this
	// pv.Spec.Class will be considered for binding and if no such PV is
	// available, StorageClass with this name will be used to dynamically
	// provision the volume.
	Class string
	...
}

type PersistentVolumeSpec struct {
	// Name of StorageClass instance that this volume belongs to.
	Class string
	...
}
```
Storage classes are natural to think of as a global resource, since they:
1. Align with PersistentVolumes, which are a global resource
2. Are administrator controlled
### Provisioning configuration
With the scheme outlined above the provisioner creates PVs using parameters specified in the `StorageClass` object.
### Provisioner interface changes
`struct volume.VolumeOptions` (containing parameters for a provisioner plugin)
will be extended to contain StorageClass.Parameters.
The existing provisioner implementations will be modified to accept the StorageClass configuration object.
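Sketched roughly, the extension could look like this (only the new `Parameters` field is the point of the sketch; all other fields are elided and the comments are assumptions):

```go
package volume

// VolumeOptions carries the inputs a provisioner plugin needs. The new
// Parameters field is copied from the StorageClass selected for the claim.
type VolumeOptions struct {
	// Capacity, access modes, reclaim policy, etc. omitted for brevity.

	// Parameters holds the opaque, class-specific configuration from
	// StorageClass.Parameters (e.g. "type": "ssd", "zone": "us-east-1b").
	Parameters map[string]string
}
```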
### PV Controller Changes
The persistent volume controller will be modified to implement the new
workflow described in this proposal. The changes will be limited to the
`provisionClaimOperation` method, which is responsible for invoking the
provisioner, and to favoring existing volumes before provisioning a new one.
## Examples
### AWS provisioners with distinct QoS
This example shows two storage classes, "aws-fast" and "aws-slow".
```yaml
apiVersion: v1
kind: StorageClass
metadata:
  name: aws-fast
provisioner: kubernetes.io/aws-ebs
parameters:
  zone: us-east-1b
  type: ssd
---
apiVersion: v1
kind: StorageClass
metadata:
  name: aws-slow
provisioner: kubernetes.io/aws-ebs
parameters:
  zone: us-east-1b
  type: spinning
```
# Additional Implementation Details
0. Annotation `volume.alpha.kubernetes.io/storage-class` is used instead of `claim.Spec.Class` and `volume.Spec.Class` during incubation.
1. `claim.Spec.Selector` and `claim.Spec.Class` are mutually exclusive for now (1.4). User can either match existing volumes with `Selector` XOR match existing volumes with `Class` and get dynamic provisioning by using `Class`. This simplifies initial PR and also provisioners. This limitation may be lifted in future releases.
# Cloud Providers
Since the `volume.alpha.kubernetes.io/storage-class` annotation is in use, a `StorageClass` must be defined to support provisioning. Unlike before, no default is assumed.

View File

@@ -1,268 +1 @@
## Abstract This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/volume-selectors.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/volume-selectors.md)
Real Kubernetes clusters have a variety of volumes which differ widely in
size, iops performance, retention policy, and other characteristics. A
mechanism is needed to enable administrators to describe the taxonomy of these
volumes, and for users to make claims on these volumes based on their
attributes within this taxonomy.
A label selector mechanism is proposed to enable flexible selection of volumes
by persistent volume claims.
## Motivation
Currently, users of persistent volumes have the ability to make claims on
those volumes based on some criteria such as the access modes the volume
supports and minimum resources offered by a volume. In an organization, there
are often more complex requirements for the storage volumes needed by
different groups of users. A mechanism is needed to model these different
types of volumes and to allow users to select those different types without
being intimately familiar with their underlying characteristics.
As an example, many cloud providers offer a range of performance
characteristics for storage, with higher performing storage being more
expensive. Cluster administrators want the ability to:
1. Invent a taxonomy of logical storage classes using the attributes
important to them
2. Allow users to make claims on volumes using these attributes
## Constraints and Assumptions
The proposed design should:
1. Deal with manually-created volumes
2. Not necessarily require users to know or understand the differences between
volumes (ie, Kubernetes should not dictate any particular set of
characteristics to administrators to think in terms of)
We will focus **only** on the barest mechanisms to describe and implement
label selectors in this proposal. We will address the following topics in
future proposals:
1. An extension resource or third party resource for storage classes
1. Dynamically provisioning new volumes based on storage class
## Use Cases
1. As a user, I want to be able to make a claim on a persistent volume by
specifying a label selector as well as the currently available attributes
### Use Case: Taxonomy of Persistent Volumes
Kubernetes offers volume types for a variety of storage systems. Within each
of those storage systems, there are numerous ways in which volume instances
may differ from one another: iops performance, retention policy, etc.
Administrators of real clusters typically need to manage a variety of
different volumes with different characteristics for different groups of
users.
Kubernetes should make it possible for administrators to flexibly model the
taxonomy of volumes in their clusters and to label volumes with their storage
class. This capability must be optional and fully backward-compatible with
the existing API.
Let's look at an example. This example is *purely fictitious* and the
taxonomies presented here are not a suggestion of any sort. In the case of
AWS EBS there are four different types of volume (in ascending order of cost):
1. Cold HDD
2. Throughput optimized HDD
3. General purpose SSD
4. Provisioned IOPS SSD
Currently, there is no way to distinguish between a group of 4 PVs where each
volume is of one of these different types. Administrators need the ability to
distinguish between instances of these types. An administrator might decide
to think of these volumes as follows:
1. Cold HDD - `tin`
2. Throughput optimized HDD - `bronze`
3. General purpose SSD - `silver`
4. Provisioned IOPS SSD - `gold`
This is not the only dimension that EBS volumes can differ in. Let's simplify
things and imagine that AWS has two availability zones, `east` and `west`. Our
administrators want to differentiate between volumes of the same type in these
two zones, so they create a taxonomy of volumes like so:
1. `tin-west`
2. `tin-east`
3. `bronze-west`
4. `bronze-east`
5. `silver-west`
6. `silver-east`
7. `gold-west`
8. `gold-east`
Another administrator of the same cluster might label things differently,
choosing to focus on the business role of volumes. Say that the data
warehouse department is the sole consumer of the cold HDD type, and the DB as
a service offering is the sole consumer of provisioned IOPS volumes. The
administrator might decide on the following taxonomy of volumes:
1. `warehouse-east`
2. `warehouse-west`
3. `dbaas-east`
4. `dbaas-west`
There are any number of ways an administrator may choose to distinguish
between volumes. Labels are used in Kubernetes to express the user-defined
properties of API objects and are a good fit to express this information for
volumes. In the examples above, administrators might differentiate between
the classes of volumes using the labels `business-unit`, `volume-type`, or
`region`.
Label selectors are used through the Kubernetes API to describe relationships
between API objects using flexible, user-defined criteria. It makes sense to
use the same mechanism with persistent volumes and storage claims to provide
the same functionality for these API objects.
## Proposed Design
We propose that:
1. A new field called `Selector` be added to the `PersistentVolumeClaimSpec`
type
2. The persistent volume controller be modified to account for this selector
when determining the volume to bind to a claim
### Persistent Volume Selector
Label selectors are used throughout the API to allow users to express
relationships in a flexible manner. The problem of selecting a volume to
match a claim fits perfectly within this metaphor. Adding a label selector to
`PersistentVolumeClaimSpec` will allow users to label their volumes with
criteria important to them and select volumes based on these criteria.
```go
// PersistentVolumeClaimSpec describes the common attributes of storage devices
// and allows a Source for provider-specific attributes
type PersistentVolumeClaimSpec struct {
	// Contains the types of access modes required
	AccessModes []PersistentVolumeAccessMode `json:"accessModes,omitempty"`

	// Selector is a selector which must be true for the claim to bind to a volume
	Selector *unversioned.Selector `json:"selector,omitempty"`

	// Resources represents the minimum resources required
	Resources ResourceRequirements `json:"resources,omitempty"`

	// VolumeName is the binding reference to the PersistentVolume backing this claim
	VolumeName string `json:"volumeName,omitempty"`
}
```
### Labeling volumes
Volumes can already be labeled:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ebs-pv-1
  labels:
    ebs-volume-type: iops
    aws-availability-zone: us-east-1
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  awsElasticBlockStore:
    volumeID: vol-12345
    fsType: xfs
```
### Controller Changes
At the time of this writing, the various controllers for persistent volumes
are in the process of being refactored into a single controller (see
[kubernetes/24331](https://github.com/kubernetes/kubernetes/pull/24331)).
The resulting controller should be modified to use the new
`selector` field to match a claim to a volume. In order to
match a volume, all criteria must be satisfied; i.e., if a label selector is
specified on a claim, a volume must match both the label selector and any
specified access modes and resource requirements to be considered a match.
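As a rough sketch, the additional check reduces to a label-subset test (simplified here to `matchLabels` only; the real controller would use the generic label-selector machinery and also verify access modes and capacity, and the function name is illustrative):

```go
package persistentvolume

// claimSelectorMatches is an illustrative subset check covering only
// matchLabels; matchExpressions and the other binding criteria are omitted.
func claimSelectorMatches(claimMatchLabels, volumeLabels map[string]string) bool {
	for k, v := range claimMatchLabels {
		if volumeLabels[k] != v {
			return false
		}
	}
	return true
}
```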
## Examples
Let's take a look at a few examples, revisiting the taxonomy of EBS volumes and regions:
Volumes of the different types might be labeled as follows:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ebs-pv-west
  labels:
    ebs-volume-type: iops-ssd
    aws-availability-zone: us-west-1
spec:
  capacity:
    storage: 150Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  awsElasticBlockStore:
    volumeID: vol-23456
    fsType: xfs
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ebs-pv-east
  labels:
    ebs-volume-type: gp-ssd
    aws-availability-zone: us-east-1
spec:
  capacity:
    storage: 150Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  awsElasticBlockStore:
    volumeID: vol-34567
    fsType: xfs
```
...claims on these volumes would look like:
```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ebs-claim-west
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  selector:
    matchLabels:
      ebs-volume-type: iops-ssd
      aws-availability-zone: us-west-1
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ebs-claim-east
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  selector:
    matchLabels:
      ebs-volume-type: gp-ssd
      aws-availability-zone: us-east-1
```

View File

@@ -1,482 +1 @@
## Abstract This file has moved to [https://github.com/kubernetes/community/blob/master/contributors/design-proposals/volumes.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/volumes.md)
A proposal for sharing volumes between containers in a pod using a special supplemental group.
## Motivation
Kubernetes volumes should be usable regardless of the UID a container runs as. This concern cuts
across all volume types, so the system should be able to handle them in a generalized way to provide
uniform functionality across all volume types and lower the barrier to new plugins.
Goals of this design:
1. Enumerate the different use-cases for volume usage in pods
2. Define the desired goal state for ownership and permission management in Kubernetes
3. Describe the changes necessary to achieve desired state
## Constraints and Assumptions
1. When writing permissions in this proposal, `D` represents a don't-care value; example: `07D0`
represents permissions where the owner has `7` permissions, all has `0` permissions, and group
has a don't-care value
2. Read-write usability of a volume from a container is defined as one of:
1. The volume is owned by the container's effective UID and has permissions `07D0`
2. The volume is owned by the container's effective GID or one of its supplemental groups and
has permissions `0D70`
3. Volume plugins should not have to handle setting permissions on volumes
4. Preventing two containers within a pod from reading and writing to the same volume (by choosing
different container UIDs) is not something we intend to support today
5. We will not design to support multiple processes running in a single container as different
UIDs; use cases that require work by different UIDs should be divided into different pods for
each UID
## Current State Overview
### Kubernetes
Kubernetes volumes can be divided into two broad categories:
1. Unshared storage:
1. Volumes created by the kubelet on the host directory: empty directory, git repo, secret,
downward api. All volumes in this category delegate to `EmptyDir` for their underlying
storage. These volumes are created with ownership `root:root`.
2. Volumes based on network block devices: AWS EBS, iSCSI, RBD, etc, *when used exclusively
by a single pod*.
2. Shared storage:
1. `hostPath` is shared storage because it is necessarily used by a container and the host
2. Network file systems such as NFS, Glusterfs, Cephfs, etc. For these volumes, the ownership
is determined by the configuration of the shared storage system.
3. Block device based volumes in `ReadOnlyMany` or `ReadWriteMany` modes are shared because
they may be used simultaneously by multiple pods.
The `EmptyDir` volume was recently modified to create the volume directory with `0777` permissions
instead of `0750`, to support basic usability of that volume as a non-root UID.
### Docker
Docker recently added supplemental group support. This adds the ability to specify additional
groups that a container should be part of, and will be released with Docker 1.8.
There is a [proposal](https://github.com/docker/docker/pull/14632) to add a bind-mount flag to tell
Docker to change the ownership of a volume to the effective UID and GID of a container, but this has
not yet been accepted.
### rkt
rkt
[image manifests](https://github.com/appc/spec/blob/master/spec/aci.md#image-manifest-schema) can
specify users and groups, similarly to how a Docker image can. A rkt
[pod manifest](https://github.com/appc/spec/blob/master/spec/pods.md#pod-manifest-schema) can also
override the default user and group specified by the image manifest.
rkt does not currently support supplemental groups or changing the owning UID or
group of a volume, but it has been [requested](https://github.com/coreos/rkt/issues/1309).
## Use Cases
1. As a user, I want the system to set ownership and permissions on volumes correctly to enable
reads and writes with the following scenarios:
1. All containers running as root
2. All containers running as the same non-root user
3. Multiple containers running as a mix of root and non-root users
### All containers running as root
For volumes that only need to be used by root, no action needs to be taken to change ownership or
permissions, but setting the ownership based on the supplemental group shared by all containers in a
pod will also work. For situations where read-only access to a shared volume is required from one
or more containers, the `VolumeMount`s in those containers should have the `readOnly` field set.
### All containers running as a single non-root user
In use cases where a volume is used by a single non-root UID, the volume ownership and permissions
should be set to enable read/write access.
Currently, a non-root UID will not have permissions to write to any but an `EmptyDir` volume.
Today, users that need this case to work can:
1. Grant the container the necessary capabilities to `chown` and `chmod` the volume:
- `CAP_FOWNER`
- `CAP_CHOWN`
- `CAP_DAC_OVERRIDE`
2. Run a wrapper script that runs `chown` and `chmod` commands to set the desired ownership and
permissions on the volume before starting their main process
This workaround has significant drawbacks:
1. It grants powerful kernel capabilities to the code in the image and thus is insecure,
defeating the reason containers are run as non-root users
2. The user experience is poor; it requires changing the Dockerfile, adding a layer, or modifying the
container's command
Some cluster operators manage the ownership of shared storage volumes on the server side.
In this scenario, the UID of the container using the volume is known in advance. The ownership of
the volume is set to match the container's UID on the server side.
### Containers running as a mix of root and non-root users
If the list of UIDs that need to use a volume includes both root and non-root users, supplemental
groups can be applied to enable sharing volumes between containers. The ownership and permissions
`root:<supplemental group> 2770` will make a volume usable from both containers running as root and
running as a non-root UID and the supplemental group. The setgid bit is used to ensure that files
created in the volume will inherit the owning GID of the volume.
## Community Design Discussion
- [kubernetes/2630](https://github.com/kubernetes/kubernetes/issues/2630)
- [kubernetes/11319](https://github.com/kubernetes/kubernetes/issues/11319)
- [kubernetes/9384](https://github.com/kubernetes/kubernetes/pull/9384)
## Analysis
The system needs to be able to:
1. Model correctly which volumes require ownership management
1. Determine the correct ownership of each volume in a pod if required
1. Set the ownership and permissions on volumes when required
### Modeling whether a volume requires ownership management
#### Unshared storage: volumes derived from `EmptyDir`
Since Kubernetes creates `EmptyDir` volumes, it should ensure the ownership is set to enable the
volumes to be usable for all of the above scenarios.
#### Unshared storage: network block devices
Volume plugins based on network block devices such as AWS EBS and RBD can be treated the same way
as local volumes. Since inodes are written to these block devices in the same way as `EmptyDir`
volumes, permissions and ownership can be managed on the client side by the Kubelet when used
exclusively by one pod. When the volumes are used outside of a persistent volume, or with the
`ReadWriteOnce` mode, they are effectively unshared storage.
When used by multiple pods, there are many additional use-cases to analyze before we can be
confident that we can support ownership management robustly with these file systems. The right
design is one that makes it easy to experiment and develop support for ownership management with
volume plugins to enable developers and cluster operators to continue exploring these issues.
#### Shared storage: hostPath
The `hostPath` volume should only be used by effective-root users, and the permissions of paths
exposed into containers via hostPath volumes should always be managed by the cluster operator. If
the Kubelet managed the ownership for `hostPath` volumes, a user who could create a `hostPath`
volume could effect changes in the state of arbitrary paths within the host's filesystem. This
would be a severe security risk, so we will consider hostPath a corner case that the kubelet should
never perform ownership management for.
#### Shared storage
Ownership management of shared storage is a complex topic. Ownership for existing shared storage
will be managed externally from Kubernetes. For this case, our API should make it simple to express
whether a particular volume should have these concerns managed by Kubernetes.
We will not attempt to address the ownership and permissions concerns of new shared storage
in this proposal.
When a network block device is used as a persistent volume in `ReadWriteMany` or `ReadOnlyMany`
modes, it is shared storage, and thus outside the scope of this proposal.
#### Plugin API requirements
From the above, we know that some volume plugins will want ownership management from the Kubelet
and others will not. Plugins should be able to opt in to ownership management by the Kubelet. To
facilitate this, a method should be added to the volume plugin API (the `volume.Mounter` interface,
detailed below) that the Kubelet uses to determine whether to perform ownership management for a
volume.
### Determining correct ownership of a volume
Using a pod-level supplemental group to own volumes solves the problem for any combination of UIDs
and GIDs within a pod. Since this is the simplest approach that handles all use-cases, our solution
is built in terms of it.
Eventually, Kubernetes should allocate a unique group for each pod so that a pod's volumes are
usable by that pod's containers, but not by the containers of another pod. The supplemental group
used to share volumes must be unique across a multitenant cluster: if uniqueness were enforced only
at the host level, pods on one host could use shared filesystems meant for pods on another host.
Eventually, Kubernetes should integrate with external identity management systems to populate pod
specs with the right supplemental groups necessary to use shared volumes. In the interim until the
identity management story is far enough along to implement this type of integration, we will rely
on being able to set arbitrary groups. (Note: as of this writing, a PR is being prepared for
setting arbitrary supplemental groups).
An admission controller could handle allocating groups for each pod and setting the group in the
pod's security context.
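To illustrate the allocation step, a minimal sketch is shown below; `GroupAllocator` is hypothetical, the admission plugin wiring is omitted, and the sketch assumes the `PodSecurityContext.FSGroup` field proposed in this document:
```go
package fsgroupadmission

import "k8s.io/kubernetes/pkg/api"

// GroupAllocator is a hypothetical source of cluster-unique GIDs.
type GroupAllocator interface {
	AllocateNext() (int64, error)
}

// admitFSGroup assigns a supplemental group to a pod that does not already
// have one; an FSGroup set by the user is left untouched.
func admitFSGroup(pod *api.Pod, allocator GroupAllocator) error {
	if pod.Spec.SecurityContext == nil {
		pod.Spec.SecurityContext = &api.PodSecurityContext{}
	}
	if pod.Spec.SecurityContext.FSGroup != nil {
		return nil
	}
	gid, err := allocator.AllocateNext()
	if err != nil {
		return err
	}
	pod.Spec.SecurityContext.FSGroup = &gid
	return nil
}
```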
#### A note on the root group
Today, by default, all docker containers run in the root group (GID 0). Image authors who build
images to run under a range of UIDs rely on this: they set the group ownership of important paths to
the root group so that containers running with GID 0 *and* an arbitrary UID can read and write to
those paths normally.
It is important to note that the changes proposed here will not affect the primary GID of
containers in pods. Setting the `pod.Spec.SecurityContext.FSGroup` field will not
override the primary GID and should be safe to use in images that expect GID 0.
### Setting ownership and permissions on volumes
For `EmptyDir`-based volumes and unshared storage, `chown` and `chmod` on the node are sufficient to
set ownership and permissions. Shared storage is different because:
1. Shared storage may not live on the node a pod that uses it runs on
2. Shared storage may be externally managed
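For unshared storage that is mounted locally, such a pass could be as simple as the following sketch; the helper, the filesystem walk, and the chosen modes are illustrative assumptions rather than part of the proposal:
```go
package volumeutil

import (
	"os"
	"path/filepath"
)

// setVolumeGroup recursively hands ownership of a locally mounted, unshared
// volume to the given group, leaving the owning UID untouched.
func setVolumeGroup(volumePath string, gid int) error {
	return filepath.Walk(volumePath, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		// -1 leaves the owning UID unchanged; only the group is set.
		if err := os.Lchown(path, -1, gid); err != nil {
			return err
		}
		if info.IsDir() {
			// Directories get the setgid bit so new files inherit the group.
			return os.Chmod(path, os.FileMode(0770)|os.ModeSetgid)
		}
		return os.Chmod(path, 0660)
	})
}
```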
## Proposed design
Our design should minimize the ownership-handling code required in the Kubelet and volume plugins.
### API changes
We should not interfere with images that need to run as a particular UID or primary GID. A pod-level
supplemental group lets us express a group that all containers in a pod run with, in a way that is
orthogonal to the primary UID and GID of each container process.
```go
package api

type PodSecurityContext struct {
	// FSGroup is a supplemental group that all containers in a pod run under. This group will own
	// volumes that the Kubelet manages ownership for. If this is not specified, the Kubelet will
	// not set the group ownership of any volumes.
	FSGroup *int64 `json:"fsGroup,omitempty"`
}
```
The v1 API will be extended with the same field:
```go
package v1

type PodSecurityContext struct {
	// FSGroup is a supplemental group that all containers in a pod run under. This group will own
	// volumes that the Kubelet manages ownership for. If this is not specified, the Kubelet will
	// not set the group ownership of any volumes.
	FSGroup *int64 `json:"fsGroup,omitempty"`
}
```
The values that can be specified for the `pod.Spec.SecurityContext.FSGroup` field are governed by
[pod security policy](https://github.com/kubernetes/kubernetes/pull/7893).
#### API backward compatibility
Pods created by old clients will have the `pod.Spec.SecurityContext.FSGroup` field unset;
these pods will not have their volumes managed by the Kubelet. Old clients will not be able to set
or read the `pod.Spec.SecurityContext.FSGroup` field.
### Volume changes
The `volume.Mounter` interface should have a new method added that indicates whether the plugin
supports ownership management:
```go
package volume

type Mounter interface {
	// other methods omitted

	// SupportsOwnershipManagement indicates that this volume supports having ownership
	// and permissions managed by the Kubelet; if true, the caller may manipulate UID
	// or GID of this volume.
	SupportsOwnershipManagement() bool
}
```
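For example, a plugin that opts in simply returns `true` from its mounter, and one that opts out returns `false`; the types below are hypothetical stand-ins for the real plugin implementations:
```go
package volume

// emptyDirMounter is a hypothetical stand-in for the emptyDir plugin's mounter.
type emptyDirMounter struct{ /* other fields omitted */ }

// emptyDir volumes are created by the Kubelet, so managing their ownership is safe.
func (m *emptyDirMounter) SupportsOwnershipManagement() bool { return true }

// hostPathMounter is a hypothetical stand-in for the hostPath plugin's mounter.
type hostPathMounter struct{ /* other fields omitted */ }

// hostPath exposes arbitrary host paths; the Kubelet must never change their ownership.
func (m *hostPathMounter) SupportsOwnershipManagement() bool { return false }
```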
In the first round of work, only `hostPath`, `emptyDir`, and the plugins derived from `emptyDir` will
be tested for ownership management support:
| Plugin Name | SupportsOwnershipManagement |
|-------------------------|-------------------------------|
| `hostPath` | false |
| `emptyDir` | true |
| `gitRepo` | true |
| `secret` | true |
| `downwardAPI` | true |
| `gcePersistentDisk` | false |
| `awsElasticBlockStore` | false |
| `nfs` | false |
| `iscsi` | false |
| `glusterfs` | false |
| `persistentVolumeClaim` | depends on underlying volume and PV mode |
| `rbd` | false |
| `cinder` | false |
| `cephfs` | false |
Ultimately, the matrix is expected to look like:
| Plugin Name | SupportsOwnershipManagement |
|-------------------------|-------------------------------|
| `hostPath` | false |
| `emptyDir` | true |
| `gitRepo` | true |
| `secret` | true |
| `downwardAPI` | true |
| `gcePersistentDisk` | true |
| `awsElasticBlockStore` | true |
| `nfs` | false |
| `iscsi` | true |
| `glusterfs` | false |
| `persistentVolumeClaim` | depends on underlying volume and PV mode |
| `rbd` | true |
| `cinder` | false |
| `cephfs` | false |
### Kubelet changes
The Kubelet should be modified to perform ownership and label management when required for a volume.
For ownership management the criteria are:
1. The `pod.Spec.SecurityContext.FSGroup` field is populated
2. The volume mounter returns `true` from `SupportsOwnershipManagement`
Logic should be added to the `mountExternalVolumes` method that runs a local `chgrp` and `chmod` if
the pod-level supplemental group is set and the volume supports ownership management:
```go
package kubelet

type ChgrpRunner interface {
	Chgrp(path string, gid int) error
}

type ChmodRunner interface {
	Chmod(path string, mode os.FileMode) error
}

type Kubelet struct {
	// fields not related to volume ownership management are omitted
	chgrpRunner ChgrpRunner
	chmodRunner ChmodRunner
}

func (kl *Kubelet) mountExternalVolumes(pod *api.Pod) (kubecontainer.VolumeMap, error) {
	// FSGroup is a *int64; ownership is only managed when the field is set.
	var podFSGroup *int64
	if pod.Spec.SecurityContext != nil {
		podFSGroup = pod.Spec.SecurityContext.FSGroup
	}
	podFSGroupSet := podFSGroup != nil

	podVolumes := make(kubecontainer.VolumeMap)
	for i := range pod.Spec.Volumes {
		volSpec := &pod.Spec.Volumes[i]
		rootContext, err := kl.getRootDirContext()
		if err != nil {
			return nil, err
		}

		// Try to use a plugin for this volume.
		internal := volume.NewSpecFromVolume(volSpec)
		mounter, err := kl.newVolumeMounterFromPlugins(internal, pod, volume.VolumeOptions{RootContext: rootContext}, kl.mounter)
		if err != nil {
			glog.Errorf("Could not create volume mounter for pod %s: %v", pod.UID, err)
			return nil, err
		}
		if mounter == nil {
			return nil, errUnsupportedVolumeType
		}
		err = mounter.SetUp()
		if err != nil {
			return nil, err
		}

		if mounter.SupportsOwnershipManagement() && podFSGroupSet {
			err = kl.chgrpRunner.Chgrp(mounter.GetPath(), int(*podFSGroup))
			if err != nil {
				return nil, err
			}
			// 0770 plus the setgid bit: the FSGroup can use the volume, and
			// files created in it inherit the FSGroup.
			err = kl.chmodRunner.Chmod(mounter.GetPath(), os.FileMode(0770)|os.ModeSetgid)
			if err != nil {
				return nil, err
			}
		}

		podVolumes[volSpec.Name] = mounter
	}

	return podVolumes, nil
}
```
This allows the volume plugins to determine when they do and don't want this type of support from
the Kubelet, and allows the criteria each plugin uses to evolve without changing the Kubelet.
The docker runtime will be modified to set the supplemental group of each container based on the
`pod.Spec.SecurityContext.FSGroup` field. Theoretically, the `rkt` runtime could support this
feature in a similar way.
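A rough sketch of the runtime-side change is shown below; the `hostConfig` type is a simplified stand-in for the docker host configuration (which carries a list of additional groups for the container process), not the real client type:
```go
package dockertools

import "strconv"

// hostConfig is a simplified stand-in for the docker host configuration; the
// real type has many more fields. GroupAdd lists supplemental groups to apply
// to the container process.
type hostConfig struct {
	GroupAdd []string
}

// applyFSGroup appends the pod-level FSGroup as a supplemental group of the
// container, leaving the container's primary UID and GID untouched.
func applyFSGroup(fsGroup *int64, hc *hostConfig) {
	if fsGroup == nil {
		return
	}
	hc.GroupAdd = append(hc.GroupAdd, strconv.FormatInt(*fsGroup, 10))
}
```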
### Examples
#### EmptyDir
For a pod that has two containers sharing an `EmptyDir` volume:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  securityContext:
    fsGroup: 1001
  containers:
  - name: a
    securityContext:
      runAsUser: 1009
    volumeMounts:
    - mountPath: "/example/emptydir/a"
      name: empty-vol
  - name: b
    securityContext:
      runAsUser: 1010
    volumeMounts:
    - mountPath: "/example/emptydir/b"
      name: empty-vol
  volumes:
  - name: empty-vol
    emptyDir: {}
```
When the Kubelet runs this pod, the `empty-vol` volume will have ownership `root:1001` and permissions
`2770` (the setgid bit is set so that files created in the volume inherit the group). It will be usable
from both containers `a` and `b`.
#### HostPath
For a pod that uses a `hostPath` volume with containers running as different UIDs:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  securityContext:
    fsGroup: 1001
  containers:
  - name: a
    securityContext:
      runAsUser: 1009
    volumeMounts:
    - mountPath: "/example/hostpath/a"
      name: host-vol
  - name: b
    securityContext:
      runAsUser: 1010
    volumeMounts:
    - mountPath: "/example/hostpath/b"
      name: host-vol
  volumes:
  - name: host-vol
    hostPath:
      path: "/tmp/example-pod"
```
The cluster operator would need to manually `chgrp` and `chmod` `/tmp/example-pod` on the host in
order for the volume to be usable from the pod.