Merge pull request #9458 from brendandburns/release-0.19

Release 0.19.0
This commit is contained in:
Abhi Shah 2015-06-10 10:34:31 -07:00
commit d2b17a5c49
516 changed files with 43292 additions and 2 deletions

View File

@ -33,6 +33,7 @@ kube::test::find_dirs() {
-o -wholename './release' \
-o -wholename './target' \
-o -wholename '*/Godeps/*' \
-o -wholename './release*' \
-o -wholename '*/contrib/podex/*' \
-o -wholename '*/test/e2e/*' \
-o -wholename '*/test/integration/*' \

View File

@ -36,8 +36,8 @@ package version
var (
// TODO: Deprecate gitMajor and gitMinor, use only gitVersion instead.
gitMajor string = "0" // major version, always numeric
gitMinor string = "18.1+" // minor version, numeric possibly followed by "+"
gitVersion string = "v0.18.1-dev" // version from git, output of $(git describe)
gitMinor string = "19.0+" // minor version, numeric possibly followed by "+"
gitVersion string = "v0.19.0-dev" // version from git, output of $(git describe)
gitCommit string = "" // sha1 from git, output of $(git rev-parse HEAD)
gitTreeState string = "not a git tree" // state of git tree, either "clean" or "dirty"
)

View File

@ -0,0 +1,28 @@
kubectl.md
kubectl_api-versions.md
kubectl_cluster-info.md
kubectl_config.md
kubectl_config_set-cluster.md
kubectl_config_set-context.md
kubectl_config_set-credentials.md
kubectl_config_set.md
kubectl_config_unset.md
kubectl_config_use-context.md
kubectl_config_view.md
kubectl_create.md
kubectl_delete.md
kubectl_describe.md
kubectl_exec.md
kubectl_expose.md
kubectl_get.md
kubectl_label.md
kubectl_logs.md
kubectl_namespace.md
kubectl_port-forward.md
kubectl_proxy.md
kubectl_rolling-update.md
kubectl_run.md
kubectl_scale.md
kubectl_stop.md
kubectl_update.md
kubectl_version.md

View File

@ -0,0 +1,29 @@
# Kubernetes Documentation
**Note**
This documentation is current for 0.19.0.
Documentation for previous releases is available in their respective branches:
* [v0.18.1](https://github.com/GoogleCloudPlatform/kubernetes/tree/release-0.18/docs)
* [v0.17.1](https://github.com/GoogleCloudPlatform/kubernetes/tree/release-0.17/docs)
* The [User's guide](user-guide.md) is for anyone who wants to run programs and services on an existing Kubernetes cluster.
* The [Cluster Admin's guide](cluster-admin-guide.md) is for anyone setting up a Kubernetes cluster or administering it.
* The [Developer guide](developer-guide.md) is for anyone wanting to write programs that access the Kubernetes API,
write plugins or extensions, or modify the core code of Kubernetes.
* The [Kubectl Command Line Interface](kubectl.md) is a detailed reference on the `kubectl` CLI.
* The [API object documentation](http://kubernetes.io/third_party/swagger-ui/) is a detailed description of all fields found in core API objects.
* An overview of the [Design of Kubernetes](design)
* There are example files and walkthroughs in the [examples](../examples) folder.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/README.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/README.md?pixel)]()

View File

@ -0,0 +1,342 @@
# User Guide to Accessing the Cluster
* [Accessing the cluster API](#api)
* [Accessing services running on the cluster](#otherservices)
* [Requesting redirects](#redirect)
* [So many proxies](#somanyproxies)
## Accessing the cluster API<a name="api"></a>
### Accessing for the first time with kubectl
When accessing the Kubernetes API for the first time, we suggest using the
Kubernetes CLI, `kubectl`.
To access a cluster, you need to know the location of the cluster and have credentials
to access it. Typically, this is automatically set up when you work through
a [Getting started guide](../docs/getting-started-guide/README.md),
or when someone else set up the cluster and provided you with credentials and a location.
Check the location and credentials that kubectl knows about with this command:
```
kubectl config view
```
Many of the [examples](../examples/README.md) provide an introduction to using
kubectl and complete documentation is found in the [kubectl manual](../docs/kubectl.md).
### <a name="kubectlproxy"></a>Directly accessing the REST API
Kubectl handles locating and authenticating to the apiserver.
If you want to directly access the REST API with an http client like
curl or wget, or a browser, there are several ways to locate and authenticate:
- Run kubectl in proxy mode.
- Recommended approach.
- Uses stored apiserver location.
- Verifies identity of apiserver using self-signed cert. No MITM possible.
- Authenticates to apiserver.
- In future, may do intelligent client-side load-balancing and failover.
- Provide the location and credentials directly to the http client.
- Alternate approach.
- Works with some types of client code that are confused by using a proxy.
- Need to import a root cert into your browser to protect against MITM.
#### Using kubectl proxy
The following command runs kubectl in a mode where it acts as a reverse proxy. It handles
locating the apiserver and authenticating.
Run it like this:
```
kubectl proxy --port=8080 &
```
See [kubectl proxy](../docs/kubectl_proxy.md) for more details.
Then you can explore the API with curl, wget, or a browser, like so:
```
$ curl http://localhost:8080/api
{
"versions": [
"v1"
]
}
```
#### Without kubectl proxy
It is also possible to avoid using kubectl proxy by passing an authentication token
directly to the apiserver, like this:
```
$ APISERVER=$(kubectl config view | grep server | cut -f 2- -d ":" | tr -d " ")
$ TOKEN=$(kubectl config view | grep token | cut -f 2 -d ":" | tr -d " ")
$ curl $APISERVER/api --header "Authorization: Bearer $TOKEN" --insecure
{
"versions": [
"v1"
]
}
```
The above example uses the `--insecure` flag. This leaves it subject to MITM
attacks. When kubectl accesses the cluster it uses a stored root certificate
and client certificates to access the server. (These are installed in the
`~/.kube` directory). Since cluster certificates are typically self-signed, it
may take special configuration to get your HTTP client to use the root
certificate.
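If your kubeconfig references a certificate-authority file (rather than embedded certificate data), a minimal sketch of reusing it with curl looks like the following; the grep-based extraction mirrors the example above and is illustrative, not robust:
```
$ APISERVER=$(kubectl config view | grep server | cut -f 2- -d ":" | tr -d " ")
$ TOKEN=$(kubectl config view | grep token | cut -f 2 -d ":" | tr -d " ")
$ CA=$(kubectl config view | grep certificate-authority | cut -f 2- -d ":" | tr -d " ")
$ curl --cacert $CA --header "Authorization: Bearer $TOKEN" $APISERVER/api
```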
On some clusters, the apiserver does not require authentication; it may serve
on localhost, or be protected by a firewall. There is not a standard
for this. [Configuring Access to the API](../docs/accessing_the_api.md)
describes how a cluster admin can configure this. Such approaches may conflict
with future high-availability support.
### Programmatic access to the API
There are [client libraries](../docs/client-libraries.md) for accessing the API
from several languages. The Kubernetes project-supported
[Go](https://github.com/GoogleCloudPlatform/kubernetes/tree/master/pkg/client)
client library can use the same [kubeconfig file](../docs/kubeconfig-file.md)
as the kubectl CLI does to locate and authenticate to the apiserver.
See documentation for other libraries for how they authenticate.
### Accessing the API from a Pod
When accessing the API from a pod, locating and authenticating
to the api server are somewhat different.
The recommended way to locate the apiserver within the pod is with
the `kubernetes` DNS name, which resolves to a Service IP which in turn
will be routed to an apiserver.
The recommended way to authenticate to the apiserver is with a
[service account](../docs/service_accounts.md). By default, a pod
is associated with a service account, and a credential (token) for that
service account is placed into the filetree of each container in that pod,
at `/var/run/secrets/kubernetes.io/serviceaccount`.
From within a pod, the recommended ways to connect to the API are listed below (a direct-access sketch follows this list):
- run a kubectl proxy as one of the containers in the pod, or as a background
process within a container. This proxies the
kubernetes API to the localhost interface of the pod, so that other processes
in any container of the pod can access it. See this [example of using kubectl proxy
in a pod](../examples/kubectl-container/README.md).
- use the Go client library, and create a client using the `client.NewInContainer()` factory.
This handles locating and authenticating to the apiserver.
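For illustration, a minimal shell sketch of accessing the apiserver directly from inside a container, assuming the mounted credential is exposed as a file named `token` in the directory mentioned above (the `--insecure` caveat discussed earlier applies here too):
```
$ TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
$ curl --header "Authorization: Bearer $TOKEN" --insecure https://kubernetes/api
```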
## <a name="otherservices"></a>Accessing services running on the cluster
The previous section was about connecting to the Kubernetes API server. This section is about
connecting to other services running on a Kubernetes cluster. In Kubernetes, the
[nodes](../docs/node.md), [pods](../docs/pods.md) and [services](services.md) all have
their own IPs. In many cases, the node IPs, pod IPs, and some service IPs on a cluster will not be
routable from a machine outside the cluster, such as your desktop machine.
### Ways to connect
You have several options for connecting to nodes, pods and services from outside the cluster:
- Access services through public IPs.
- Use a service with type `NodePort` or `LoadBalancer` to make the service reachable outside
the cluster. See the [services](../docs/services.md) and
[kubectl expose](../docs/kubectl_expose.md) documentation.
- Depending on your cluster environment, this may just expose the service to your corporate network,
or it may expose it to the internet. Think about whether the service being exposed is secure.
Does it do its own authentication?
- Place pods behind services. To access one specific pod from a set of replicas, such as for debugging,
place a unique label on the pod and create a new service which selects this label.
- In most cases, it should not be necessary for an application developer to directly access
nodes via their nodeIPs.
- Access services, nodes, or pods using the Proxy Verb.
- Does apiserver authentication and authorization prior to accessing the remote service.
Use this if the services are not secure enough to expose to the internet, or to gain
access to ports on the node IP, or for debugging.
- Proxies may cause problems for some web applications.
- Only works for HTTP/HTTPS.
- Described in [using the apiserver proxy](#apiserverproxy).
- Access from a node or pod in the cluster.
- Run a pod, and then connect to a shell in it using [kubectl exec](../docs/kubectl_exec.md).
Connect to other nodes, pods, and services from that shell.
- Some clusters may allow you to ssh to a node in the cluster. From there you may be able to
access cluster services. This is a non-standard method, and will work on some clusters but
not others. Browsers and other tools may or may not be installed. Cluster DNS may not work.
### Discovering builtin services
Typically, there are several services which are started on a cluster by default. Get a list of these
with the `kubectl cluster-info` command:
```
$ kubectl cluster-info
Kubernetes master is running at https://104.197.5.247
elasticsearch-logging is running at https://104.197.5.247/api/v1/proxy/namespaces/default/services/elasticsearch-logging
kibana-logging is running at https://104.197.5.247/api/v1/proxy/namespaces/default/services/kibana-logging
kube-dns is running at https://104.197.5.247/api/v1/proxy/namespaces/default/services/kube-dns
grafana is running at https://104.197.5.247/api/v1/proxy/namespaces/default/services/monitoring-grafana
heapster is running at https://104.197.5.247/api/v1/proxy/namespaces/default/services/monitoring-heapster
```
This shows the proxy-verb URL for accessing each service.
For example, this cluster has cluster-level logging enabled (using Elasticsearch), which can be reached
at `https://104.197.5.247/api/v1/proxy/namespaces/default/services/elasticsearch-logging/` if suitable credentials are passed, or through a kubectl proxy at, for example:
`http://localhost:8080/api/v1/proxy/namespaces/default/services/elasticsearch-logging/`.
(See [above](#api) for how to pass credentials or use kubectl proxy.)
#### Manually constructing apiserver proxy URLs
As mentioned above, you use the `kubectl cluster-info` command to retrieve the service's proxy URL. To create proxy URLs that include service endpoints, suffixes, and parameters, you simply append to the service's proxy URL:
`http://`*`kubernetes_master_address`*`/`*`service_path`*`/`*`service_name`*`/`*`service_endpoint-suffix-parameter`*
##### Examples
* To access the Elasticsearch service endpoint `_search?q=user:kimchy`, you would use: `http://104.197.5.247/api/v1/proxy/namespaces/default/services/elasticsearch-logging/_search?q=user:kimchy`
* To access the Elasticsearch cluster health information `_cluster/health?pretty=true`, you would use: `https://104.197.5.247/api/v1/proxy/namespaces/default/services/elasticsearch-logging/_cluster/health?pretty=true`
```
{
"cluster_name" : "kubernetes_logging",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
}
```
#### Using web browsers to access services running on the cluster
You may be able to put an apiserver proxy URL into the address bar of a browser. However:
- Web browsers cannot usually pass tokens, so you may need to use basic (password) auth. The apiserver can be configured to accept basic auth,
but your cluster may not be configured to accept basic auth.
- Some web apps may not work, particularly those with client-side JavaScript that constructs URLs in a
way that is unaware of the proxy path prefix.
## <a name="redirect"></a>Requesting redirects
Use a `redirect` request so that the server returns an HTTP redirect response and identifies the specific node and service that
can handle the request.
**Note**: Since the hostname or address that is returned is usually only accessible from inside the cluster,
sending `redirect` requests is useful only for code running inside the cluster. Also, keep in mind that any subsequent `redirect` requests to the same
server might return different results (because another node at that point in time can better serve the request).
**Tip**: Use a redirect request to reduce calls to the proxy server by first obtaining the address of a node on the
cluster and then using that returned address for all subsequent requests.
##### Example
To request a redirect and then verify the address that gets returned, let's run a query on `oban` (a Google Compute Engine virtual machine). Note that `oban` is running in the same project and default network (Google Compute Engine) as the Kubernetes cluster.
To request a redirect for the Elasticsearch service, we can run the following `curl` command:
```
user@oban:~$ curl -L -k -u admin:4mty0Vl9nNFfwLJz https://104.197.5.247/api/v1/redirect/namespaces/default/services/elasticsearch-logging/
{
"status" : 200,
"name" : "Skin",
"cluster_name" : "kubernetes_logging",
"version" : {
"number" : "1.4.4",
"build_hash" : "c88f77ffc81301dfa9dfd81ca2232f09588bd512",
"build_timestamp" : "2015-02-19T13:05:36Z",
"build_snapshot" : false,
"lucene_version" : "4.10.3"
},
"tagline" : "You Know, for Search"
}
```
**Note**: We use the `-L` flag in the request so that `curl` follows the returned redirect address and retrieves the Elasticsearch service information.
If we examine the actual redirect header (by running the same `curl` command with the `-v` flag), we see that the request to `https://104.197.5.247/api/v1/redirect/namespaces/default/services/elasticsearch-logging/` is redirected to `http://10.244.2.7:9200`:
```
user@oban:~$ curl -v -k -u admin:4mty0Vl9nNFfwLJz https://104.197.5.247/api/v1/redirect/namespaces/default/services/elasticsearch-logging/
* About to connect() to 104.197.5.247 port 443 (#0)
* Trying 104.197.5.247...
* connected
* Connected to 104.197.5.247 (104.197.5.247) port 443 (#0)
* successfully set certificate verify locations:
* CAfile: none
CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server key exchange (12):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSL connection using ECDHE-RSA-AES256-GCM-SHA384
* Server certificate:
* subject: CN=kubernetes-master
* start date: 2015-03-04 19:40:24 GMT
* expire date: 2025-03-01 19:40:24 GMT
* issuer: CN=104.197.5.247@1425498024
* SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Server auth using Basic with user 'admin'
> GET /api/v1/redirect/namespaces/default/services/elasticsearch-logging HTTP/1.1
> Authorization: Basic YWRtaW46M210eTBWbDluTkZmd0xKeg==
> User-Agent: curl/7.26.0
> Host: 104.197.5.247
> Accept: */*
>
* additional stuff not fine transfer.c:1037: 0 0
* HTTP 1.1 or later with persistent connection, pipelining supported
< HTTP/1.1 307 Temporary Redirect
< Server: nginx/1.2.1
< Date: Thu, 05 Mar 2015 00:14:45 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 0
< Connection: keep-alive
< Location: http://10.244.2.7:9200
<
* Connection #0 to host 104.197.5.247 left intact
* Closing connection #0
* SSLv3, TLS alert, Client hello (1):
```
We can also run the `kubectl get pods` command to view a list of the pods on the cluster and verify that `http://10.244.2.7` is where the Elasticsearch service is running:
```
$ kubectl get pods
POD IP CONTAINER(S) IMAGE(S) HOST LABELS STATUS CREATED
elasticsearch-logging-controller-gziey 10.244.2.7 elasticsearch-logging kubernetes/elasticsearch:1.0 kubernetes-minion-hqhv.c.kubernetes-user2.internal/104.154.33.252 kubernetes.io/cluster-service=true,name=elasticsearch-logging Running 5 hours
kibana-logging-controller-ls6k1 10.244.1.9 kibana-logging kubernetes/kibana:1.1 kubernetes-minion-h5kt.c.kubernetes-user2.internal/146.148.80.37 kubernetes.io/cluster-service=true,name=kibana-logging Running 5 hours
kube-dns-oh43e 10.244.1.10 etcd quay.io/coreos/etcd:v2.0.3 kubernetes-minion-h5kt.c.kubernetes-user2.internal/146.148.80.37 k8s-app=kube-dns,kubernetes.io/cluster-service=true,name=kube-dns Running 5 hours
kube2sky kubernetes/kube2sky:1.0
skydns kubernetes/skydns:2014-12-23-001
monitoring-heapster-controller-fplln 10.244.0.4 heapster kubernetes/heapster:v0.8 kubernetes-minion-2il2.c.kubernetes-user2.internal/130.211.155.16 kubernetes.io/cluster-service=true,name=heapster,uses=monitoring-influxdb Running 5 hours
monitoring-influx-grafana-controller-0133o 10.244.3.4 influxdb kubernetes/heapster_influxdb:v0.3 kubernetes-minion-kmin.c.kubernetes-user2.internal/130.211.173.22 kubernetes.io/cluster-service=true,name=influxGrafana Running 5 hours
grafana kubernetes/heapster_grafana:v0.4
```
## <a name="somanyproxies"></a>So Many Proxies
There are several different proxies you may encounter when using Kubernetes:
1. The [kubectl proxy](#kubectlproxy):
- runs on a user's desktop or in a pod
- proxies from a localhost address to the kubernetes apiserver
- client to proxy uses HTTP
- proxy to apiserver uses HTTPS
- locates apiserver
- adds authentication headers
1. The [apiserver proxy](#apiserverproxy):
- is a bastion built into the apiserver
- connects a user outside of the cluster to cluster IPs which otherwise might not be reachable
- runs in the apiserver processes
- client to proxy uses HTTPS (or http if apiserver so configured)
- proxy to target may use HTTP or HTTPS as chosen by proxy using available information
- can be used to reach a Node, Pod, or Service
- does load balancing when used to reach a Service
1. The [kube proxy](../docs/services.md#ips-and-vips):
- runs on each node
- proxies UDP and TCP
- does not understand HTTP
- provides load balancing
- is just used to reach services
1. A Proxy/Load-balancer in front of apiserver(s):
- existence and implementation varies from cluster to cluster (e.g. nginx)
- sits between all clients and one or more apiservers
- acts as load balancer if there are several apiservers.
1. Cloud Load Balancers on external services:
- are provided by some cloud providers (e.g. AWS ELB, Google Cloud Load Balancer)
- are created automatically when the kubernetes service has type `LoadBalancer`
- use UDP/TCP only
- implementation varies by cloud provider.
Kubernetes users will typically not need to worry about anything other than the first two types. The cluster admin
will typically ensure that the latter types are set up correctly.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/accessing-the-cluster.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/accessing-the-cluster.md?pixel)]()

View File

@ -0,0 +1,81 @@
# Configuring APIserver ports
This document describes what ports the kubernetes apiserver
may serve on and how to reach them. The audience is
cluster administrators who want to customize their cluster
or understand the details.
Most questions about accessing the cluster are covered
in [Accessing the cluster](../docs/accessing-the-cluster.md).
## Ports and IPs Served On
The Kubernetes API is served by the Kubernetes APIServer process. Typically,
there is one of these running on a single kubernetes-master node.
By default, the Kubernetes APIserver serves HTTP on 2 ports (an illustrative invocation follows this list):
1. Localhost Port
- serves HTTP
- default is port 8080, change with `-port` flag.
- default IP is localhost, change with `-address` flag.
- no authentication or authorization checks in HTTP
- protected by the need to have host access
2. Secure Port
- default is port 443, change with `-secure_port`
- default IP is first non-localhost network interface, change with `-public_address_override`
- serves HTTPS. Set cert with `-tls_cert_file` and key with `-tls_private_key_file`.
- uses token-file or client-certificate based [authentication](./authentication.md).
- uses policy-based [authorization](./authorization.md).
3. Removed: ReadOnly Port
- For security reasons, this had to be removed. Use the service account feature instead.
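For illustration only, an apiserver invocation using the flags above might look like the following; the values are made up and not a recommended configuration:
```
kube-apiserver \
  -address=127.0.0.1 -port=8080 \
  -public_address_override=10.240.0.2 -secure_port=6443 \
  -tls_cert_file=/srv/kubernetes/server.cert \
  -tls_private_key_file=/srv/kubernetes/server.key
```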
## Proxies and Firewall rules
Additionally, in some configurations there is a proxy (nginx) running
on the same machine as the apiserver process. The proxy serves HTTPS protected
by Basic Auth on port 443, and proxies to the apiserver on localhost:8080. In
these configurations the secure port is typically set to 6443.
A firewall rule is typically configured to allow external HTTPS access to port 443.
The above are defaults and reflect how Kubernetes is deployed to GCE using
kube-up.sh. Other cloud providers may vary.
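As an illustration of that firewall rule on GCE (kube-up.sh normally creates it for you; the rule name, network, and target tag below are made up):
```
gcloud compute firewall-rules create kubernetes-master-https \
  --network default \
  --allow tcp:443 \
  --target-tags kubernetes-master
```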
## Use Cases vs IP:Ports
There are three differently configured serving ports because there are a
variety of use cases:
1. Clients outside of a Kubernetes cluster, such as a human running `kubectl`
on a desktop machine. Currently, these access the Localhost Port via a proxy (nginx)
running on the `kubernetes-master` machine. The proxy uses bearer token authentication.
2. Processes running in Containers on Kubernetes that need to read from
the apiserver. Currently, these can use a service account.
3. Scheduler and Controller-manager processes, which need to do read-write
API operations. Currently, these have to run on the same host as the
apiserver and use the Localhost Port. In the future, these will be
switched to using service accounts to avoid the need to be co-located.
4. Kubelets, which need to do read-write API operations and are necessarily
on different machines than the apiserver. Kubelets use the Secure Port
to get their pods, to find the services that a pod can see, and to
write events. Credentials are distributed to kubelets at cluster
setup time.
## Expected changes
- Policy will limit the actions kubelets can do via the authed port.
- Kubelets will change from token-based authentication to cert-based-auth.
- Scheduler and Controller-manager will use the Secure Port too. They
will then be able to run on different machines than the apiserver.
- A general mechanism will be provided for [giving credentials to
pods](
https://github.com/GoogleCloudPlatform/kubernetes/issues/1907).
- Clients, like kubectl, will all support token-based auth, and the
Localhost Port will no longer be needed, and will not be the default.
However, the localhost port may continue to be an option for
installations that want to do their own auth proxy.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/accessing_the_api.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/accessing_the_api.md?pixel)]()

View File

@ -0,0 +1,112 @@
# Admission Controllers
## What are they?
An admission control plug-in is a piece of code that intercepts requests to the Kubernetes
API server prior to persistence of the object, but after the request is authenticated
and authorized. The plug-in code is in the API server process
and must be compiled into the binary in order to be used at this time.
Each admission control plug-in is run in sequence before a request is accepted into the cluster. If
any of the plug-ins in the sequence reject the request, the entire request is rejected immediately
and an error is returned to the end-user.
Admission control plug-ins may mutate the incoming object in some cases to apply system configured
defaults. In addition, admission control plug-ins may mutate related resources as part of request
processing to do things like increment quota usage.
## Why do I need them?
Many advanced features in Kubernetes require an admission control plug-in to be enabled in order
to properly support the feature. As a result, a Kubernetes API server that is not properly
configured with the right set of admission control plug-ins is an incomplete server and will not
support all the features you expect.
## How do I turn on an admission control plug-in?
The Kubernetes API server supports a flag, ```admission_control```, that takes a comma-delimited,
ordered list of admission control choices to invoke prior to modifying objects in the cluster.
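For example, a sketch of how the flag might appear on the apiserver command line (this particular choice of plug-ins is illustrative; see the recommended set at the end of this document):
```shell
kube-apiserver --admission_control=NamespaceLifecycle,NamespaceExists,LimitRanger
```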
## What does each plug-in do?
### AlwaysAdmit
Use this plugin by itself to pass-through all requests.
### AlwaysDeny
Rejects all requests. Used for testing.
### DenyExecOnPrivileged
This plug-in will intercept all requests to exec a command in a pod if that pod has a privileged container.
If your cluster supports privileged containers, and you want to restrict the ability of end-users to exec
commands in those containers, we strongly encourage enabling this plug-in.
### ServiceAccount
This plug-in implements automation for [serviceAccounts]( service_accounts.md).
We strongly recommend using this plug-in if you intend to make use of Kubernetes ```ServiceAccount``` objects.
### SecurityContextDeny
This plug-in will deny any pod with a [SecurityContext](security_context.md) that defines options that were not available on the ```Container```.
### ResourceQuota
This plug-in will observe the incoming request and ensure that it does not violate any of the constraints
enumerated in the ```ResourceQuota``` object in a ```Namespace```. If you are using ```ResourceQuota```
objects in your Kubernetes deployment, you MUST use this plug-in to enforce quota constraints.
See the [resourceQuota design doc]( design/admission_control_resource_quota.md).
It is strongly encouraged that this plug-in is configured last in the sequence of admission control plug-ins. This is
so that quota is not prematurely incremented only for the request to be rejected later in admission control.
### LimitRanger
This plug-in will observe the incoming request and ensure that it does not violate any of the constraints
enumerated in the ```LimitRange``` object in a ```Namespace```. If you are using ```LimitRange``` objects in
your Kubernetes deployment, you MUST use this plug-in to enforce those constraints.
See the [limitRange design doc]( design/admission_control_limit_range.md).
### NamespaceExists
This plug-in will observe all incoming requests that attempt to create a resource in a Kubernetes ```Namespace```
and reject the request if the ```Namespace``` was not previously created. We strongly recommend running
this plug-in to ensure integrity of your data.
### NamespaceAutoProvision (deprecated)
This plug-in will observe all incoming requests that attempt to create a resource in a Kubernetes ```Namespace```
and create a new ```Namespace``` if one did not already exist previously.
We strongly recommend ```NamespaceExists``` over ```NamespaceAutoProvision```.
### NamespaceLifecycle
This plug-in enforces that a ```Namespace``` that is undergoing termination cannot have new content created in it.
A ```Namespace``` deletion kicks off a sequence of operations that remove all content (pods, services, etc.) in that
namespace. In order to enforce integrity of that process, we strongly recommend running this plug-in.
Once ```NamespaceAutoProvision``` is deprecated, we anticipate ```NamespaceLifecycle``` and ```NamespaceExists``` will
be merged into a single plug-in that enforces the life-cycle of a ```Namespace``` in Kubernetes.
## Is there a recommended set of plug-ins to use?
Yes.
For Kubernetes 1.0, we strongly recommend running the following set of admission control plug-ins (order matters):
```shell
--admission_control=NamespaceLifecycle,NamespaceExists,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota
```
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/admission_controllers.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/admission_controllers.md?pixel)]()

View File

@ -0,0 +1,31 @@
# Annotations
We have [labels](labels.md) for identifying metadata.
It is also useful to be able to attach arbitrary non-identifying metadata, for retrieval by API clients such as tools, libraries, etc. This information may be large, may be structured or unstructured, may include characters not permitted by labels, etc. Such information would not be used for object selection and therefore doesn't belong in labels.
Like labels, annotations are key-value maps.
```
"annotations": {
"key1" : "value1",
"key2" : "value2"
}
```
Possible information that could be recorded in annotations:
* fields managed by a declarative configuration layer, to distinguish them from client- and/or server-set default values and other auto-generated fields, fields set by auto-sizing/auto-scaling systems, etc., in order to facilitate merging
* build/release/image information (timestamps, release ids, git branch, PR numbers, image hashes, registry address, etc.)
* pointers to logging/monitoring/analytics/audit repos
* client library/tool information (e.g. for debugging purposes -- name, version, build info)
* other user and/or tool/system provenance info, such as URLs of related objects from other ecosystem components
* lightweight rollout tool metadata (config and/or checkpoints)
* phone/pager number(s) of person(s) responsible, or directory entry where that info could be found, such as a team website
Yes, this information could be stored in an external database or directory, but that would make it much harder to produce shared client libraries and tools for deployment, management, introspection, etc.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/annotations.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/annotations.md?pixel)]()

View File

@ -0,0 +1,593 @@
API Conventions
===============
Updated: 4/16/2015
The conventions of the [Kubernetes API](api.md) (and related APIs in the ecosystem) are intended to ease client development and ensure that configuration mechanisms can be implemented that work across a diverse set of use cases consistently.
The general style of the Kubernetes API is RESTful - clients create, update, delete, or retrieve a description of an object via the standard HTTP verbs (POST, PUT, DELETE, and GET) - and those APIs preferentially accept and return JSON. Kubernetes also exposes additional endpoints for non-standard verbs and allows alternative content types. All of the JSON accepted and returned by the server has a schema, identified by the "kind" and "apiVersion" fields. Where relevant HTTP header fields exist, they should mirror the content of JSON fields, but the information should not be represented only in the HTTP header.
The following terms are defined:
* **Kind** the name of a particular object schema (e.g. the "Cat" and "Dog" kinds would have different attributes and properties)
* **Resource** a representation of a system entity, sent or retrieved as JSON via HTTP to the server. Resources are exposed via:
* Collections - a list of resources of the same type, which may be queryable
* Elements - an individual resource, addressable via a URL
Each resource typically accepts and returns data of a single kind. A kind may be accepted or returned by multiple resources that reflect specific use cases. For instance, the kind "pod" is exposed as a "pods" resource that allows end users to create, update, and delete pods, while a separate "pod status" resource (that acts on "pod" kind) allows automated processes to update a subset of the fields in that resource. A "restart" resource might be exposed for a number of different resources to allow the same action to have different results for each object.
Resource collections should be all lowercase and plural, whereas kinds are CamelCase and singular.
Types (Kinds)
-------------
Kinds are grouped into three categories:
1. **Objects** represent a persistent entity in the system.
Creating an API object is a record of intent - once created, the system will work to ensure that resource exists. All API objects have common metadata.
An object may have multiple resources that clients can use to perform specific actions that create, update, delete, or get.
Examples: Pods, ReplicationControllers, Services, Namespaces, Nodes
2. **Lists** are collections of **resources** of one (usually) or more (occasionally) kinds.
Lists have a limited set of common metadata. All lists use the "items" field to contain the array of objects they return.
Most objects defined in the system should have an endpoint that returns the full set of resources, as well as zero or more endpoints that return subsets of the full list. Some objects may be singletons (the current user, the system defaults) and may not have lists.
In addition, all lists that return objects with labels should support label filtering (see [labels.md](labels.md)), and most lists should support filtering by fields.
Examples: PodLists, ServiceLists, NodeLists
TODO: Describe field filtering below or in a separate doc.
3. **Simple** kinds are used for specific actions on objects and for non-persistent entities.
Given their limited scope, they have the same set of limited common metadata as lists.
The "size" action may accept a simple resource that has only a single field as input (the number of things). The "status" kind is returned when errors occur and is not persisted in the system.
Examples: Binding, Status
The standard REST verbs (defined below) MUST return singular JSON objects. Some API endpoints may deviate from the strict REST pattern and return resources that are not singular JSON objects, such as streams of JSON objects or unstructured text log data.
The term "kind" is reserved for these "top-level" API types. The term "type" should be used for distinguishing sub-categories within objects or subobjects.
### Resources
All JSON objects returned by an API MUST have the following fields:
* kind: a string that identifies the schema this object should have
* apiVersion: a string that identifies the version of the schema the object should have
These fields are required for proper decoding of the object. They may be populated by the server by default from the specified URL path, but the client likely needs to know the values in order to construct the URL path.
### Objects
#### Metadata
Every object kind MUST have the following metadata in a nested object field called "metadata":
* namespace: a namespace is a DNS compatible subdomain that objects are subdivided into. The default namespace is 'default'. See [namespaces.md](namespaces.md) for more.
* name: a string that uniquely identifies this object within the current namespace (see [identifiers.md](identifiers.md)). This value is used in the path when retrieving an individual object.
* uid: a unique in time and space value (typically an RFC 4122 generated identifier, see [identifiers.md](identifiers.md)) used to distinguish between objects with the same name that have been deleted and recreated
Every object SHOULD have the following metadata in a nested object field called "metadata":
* resourceVersion: a string that identifies the internal version of this object that can be used by clients to determine when objects have changed. This value MUST be treated as opaque by clients and passed unmodified back to the server. Clients should not assume that the resource version has meaning across namespaces, different kinds of resources, or different servers. (see [concurrency control](#concurrency-control-and-consistency), below, for more details)
* creationTimestamp: a string representing an RFC 3339 date of the date and time an object was created
* deletionTimestamp: a string representing an RFC 3339 date of the date and time after which this resource will be deleted. This field is set by the server when a graceful deletion is requested by the user, and is not directly settable by a client. The resource will be deleted (no longer visible from resource lists, and not reachable by name) after the time in this field. Once set, this value may not be unset or be set further into the future, although it may be shortened or the resource may be deleted prior to this time.
* labels: a map of string keys and values that can be used to organize and categorize objects (see [labels.md](labels.md))
* annotations: a map of string keys and values that can be used by external tooling to store and retrieve arbitrary metadata about this object (see [annotations.md](annotations.md))
Labels are intended for organizational purposes by end users (select the pods that match this label query). Annotations enable third-party automation and tooling to decorate objects with additional metadata for their own use.
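As an illustration, the common metadata fields above appear on every retrieved object; in the following sketch (via `kubectl proxy`, with made-up values) the remainder of the object is elided:
```
$ curl http://localhost:8080/api/v1/namespaces/default/pods/pod-name
{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "pod-name",
    "namespace": "default",
    "uid": "c9c3692f-0f7e-11e5-a2b6-42010af00002",
    "resourceVersion": "12",
    "creationTimestamp": "2015-06-10T17:34:31Z",
    "labels": { "name": "pod-name" },
    "annotations": { "example.org/owner": "someone@example.org" }
  },
  ...
}
```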
#### Spec and Status
By convention, the Kubernetes API makes a distinction between the specification of the desired state of an object (a nested object field called "spec") and the status of the object at the current time (a nested object field called "status"). The specification is a complete description of the desired state, including configuration settings provided by the user, [default values](#defaulting) expanded by the system, and properties initialized or otherwise changed after creation by other ecosystem components (e.g., schedulers, auto-scalers), and is persisted in stable storage with the API object. If the specification is deleted, the object will be purged from the system. The status summarizes the current state of the object in the system, and is usually persisted with the object by automated processes but may be generated on the fly. At some cost and perhaps some temporary degradation in behavior, the status could be reconstructed by observation if it were lost.
When a new version of an object is POSTed or PUT, the "spec" is updated and available immediately. Over time the system will work to bring the "status" into line with the "spec". The system will drive toward the most recent "spec" regardless of previous versions of that stanza. In other words, if a value is changed from 2 to 5 in one PUT and then back down to 3 in another PUT the system is not required to 'touch base' at 5 before changing the "status" to 3. In other words, the system's behavior is *level-based* rather than *edge-based*. This enables robust behavior in the presence of missed intermediate state changes.
The Kubernetes API also serves as the foundation for the declarative configuration schema for the system. In order to facilitate level-based operation and expression of declarative configuration, fields in the specification should have declarative rather than imperative names and semantics -- they represent the desired state, not actions intended to yield the desired state.
The PUT and POST verbs on objects will ignore the "status" values. A `/status` subresource is provided to enable system components to update statuses of resources they manage.
Otherwise, PUT expects the whole object to be specified. Therefore, if a field is omitted it is assumed that the client wants to clear that field's value. The PUT verb does not accept partial updates. Modification of just part of an object may be achieved by GETting the resource, modifying part of the spec, labels, or annotations, and then PUTting it back. See [concurrency control](#concurrency-control-and-consistency), below, regarding read-modify-write consistency when using this pattern. Some objects may expose alternative resource representations that allow mutation of the status, or performing custom actions on the object.
All objects that represent a physical resource whose state may vary from the user's desired intent SHOULD have a "spec" and a "status". Objects whose state cannot vary from the user's desired intent MAY have only "spec", and MAY rename "spec" to a more appropriate name.
Objects that contain both spec and status should not contain additional top-level fields other than the standard metadata fields.
##### Typical status properties
* **phase**: The phase is a simple, high-level summary of the phase of the lifecycle of an object. The phase should progress monotonically. Typical phase values are `Pending` (not yet fully physically realized), `Running` or `Active` (fully realized and active, but not necessarily operating correctly), and `Terminated` (no longer active), but may vary slightly for different types of objects. New phase values should not be added to existing objects in the future. Like other status fields, it must be possible to ascertain the lifecycle phase by observation. Additional details regarding the current phase may be contained in other fields.
* **conditions**: Conditions represent orthogonal observations of an object's current state. Objects may report multiple conditions, and new types of conditions may be added in the future. Condition status values may be `True`, `False`, or `Unknown`. Unlike the phase, conditions are not expected to be monotonic -- their values may change back and forth. A typical condition type is `Ready`, which indicates the object was believed to be fully operational at the time it was last probed. Conditions may carry additional information, such as the last probe time or last transition time.
TODO(@vishh): Reason and Message.
Phases and conditions are observations and not, themselves, state machines, nor do we define comprehensive state machines for objects with behaviors associated with state transitions. The system is level-based and should assume an Open World. Additionally, new observations and details about these observations may be added over time.
In order to preserve extensibility, in the future, we intend to explicitly convey properties that users and components care about rather than requiring those properties to be inferred from observations.
Note that historical status information (e.g., last transition time, failure counts) is provided only on a best-effort basis and may be lost.
Status information that may be large (especially unbounded in size, such as lists of references to other objects -- see below) and/or rapidly changing, such as [resource usage](resources.md#usage-data), should be put into separate objects, with possibly a reference from the original object. This helps to ensure that GETs and watch remain reasonably efficient for the majority of clients, which may not need that data.
#### References to related objects
References to loosely coupled sets of objects, such as [pods](pods.md) overseen by a [replication controller](replication-controller.md), are usually best referred to using a [label selector](labels.md). In order to ensure that GETs of individual objects remain bounded in time and space, these sets may be queried via separate API queries, but will not be expanded in the referring object's status.
References to specific objects, especially specific resource versions and/or specific fields of those objects, are specified using the `ObjectReference` type. Unlike partial URLs, the ObjectReference type facilitates flexible defaulting of fields from the referring object or other contextual information.
References in the status of the referee to the referrer may be permitted, when the references are one-to-one and do not need to be frequently updated, particularly in an edge-based manner.
#### Lists of named subobjects preferred over maps
Discussed in [#2004](https://github.com/GoogleCloudPlatform/kubernetes/issues/2004) and elsewhere. There are no maps of subobjects in any API objects. Instead, the convention is to use a list of subobjects containing name fields.
For example:
```yaml
ports:
- name: www
containerPort: 80
```
vs.
```yaml
ports:
www:
containerPort: 80
```
This rule maintains the invariant that all JSON/YAML keys are fields in API objects. The only exceptions are pure maps in the API (currently, labels, selectors, and annotations), as opposed to sets of subobjects.
#### Constants
Some fields will have a list of allowed values (enumerations). These values will be strings, and they will be in CamelCase, with an initial uppercase letter. Examples: "ClusterFirst", "Pending", "ClientIP".
### Lists and Simple kinds
Every list or simple kind SHOULD have the following metadata in a nested object field called "metadata":
* resourceVersion: a string that identifies the common version of the objects returned in a list. This value MUST be treated as opaque by clients and passed unmodified back to the server. A resource version is only valid within a single namespace on a single kind of resource.
Every simple kind returned by the server, and any simple kind sent to the server that must support idempotency or optimistic concurrency, should return this value. Since simple resources are often used as input to alternate actions that modify objects, the resource version of the simple resource should correspond to the resource version of the object.
Differing Representations
-------------------------
An API may represent a single entity in different ways for different clients, or transform an object after certain transitions in the system occur. In these cases, one request object may have two representations available as different resources, or different kinds.
An example is a Service, which represents the intent of the user to group a set of pods with common behavior on common ports. When Kubernetes detects a pod matches the service selector, the IP address and port of the pod are added to an Endpoints resource for that Service. The Endpoints resource exists only if the Service exists, but exposes only the IPs and ports of the selected pods. The full service is represented by two distinct resources - under the original Service resource the user created, as well as in the Endpoints resource.
As another example, a "pod status" resource may accept a PUT with the "pod" kind, with different rules about what fields may be changed.
Future versions of Kubernetes may allow alternative encodings of objects beyond JSON.
Verbs on Resources
------------------
API resources should use the traditional REST pattern (a curl sketch follows this list):
* GET /&lt;resourceNamePlural&gt; - Retrieve a list of type &lt;resourceName&gt;, e.g. GET /pods returns a list of Pods.
* POST /&lt;resourceNamePlural&gt; - Create a new resource from the JSON object provided by the client.
* GET /&lt;resourceNamePlural&gt;/&lt;name&gt; - Retrieves a single resource with the given name, e.g. GET /pods/first returns a Pod named 'first'. Should be constant time, and the resource should be bounded in size.
* DELETE /&lt;resourceNamePlural&gt;/&lt;name&gt; - Delete the single resource with the given name. DeleteOptions may specify gracePeriodSeconds, the optional duration in seconds before the object should be deleted. Individual kinds may declare fields which provide a default grace period, and different kinds may have differing kind-wide default grace periods. A user provided grace period overrides a default grace period, including the zero grace period ("now").
* PUT /&lt;resourceNamePlural&gt;/&lt;name&gt; - Update or create the resource with the given name with the JSON object provided by the client.
* PATCH /&lt;resourceNamePlural&gt;/&lt;name&gt; - Selectively modify the specified fields of the resource. See more information [below](#patch).
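For illustration, the standard verbs exercised against the `pods` resource through `kubectl proxy` (resource names are made up):
```
$ curl http://localhost:8080/api/v1/namespaces/default/pods              # list the collection
$ curl http://localhost:8080/api/v1/namespaces/default/pods/pod-name     # get one element
$ curl -X DELETE http://localhost:8080/api/v1/namespaces/default/pods/pod-name
```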
Kubernetes by convention exposes additional verbs as new root endpoints with singular names. Examples:
* GET /watch/&lt;resourceNamePlural&gt; - Receive a stream of JSON objects corresponding to changes made to any resource of the given kind over time.
* GET /watch/&lt;resourceNamePlural&gt;/&lt;name&gt; - Receive a stream of JSON objects corresponding to changes made to the named resource of the given kind over time.
These are verbs which change the fundamental type of data returned (watch returns a stream of JSON instead of a single JSON object). Support of additional verbs is not required for all object types.
Two additional verbs `redirect` and `proxy` provide access to cluster resources as described in [accessing-the-cluster.md](accessing-the-cluster.md).
When resources wish to expose alternative actions that are closely coupled to a single resource, they should do so using new sub-resources. An example is allowing automated processes to update the "status" field of a Pod. The `/pods` endpoint only allows updates to "metadata" and "spec", since those reflect end-user intent. An automated process should be able to modify status for users to see by sending an updated Pod kind to the server to the "/pods/&lt;name&gt;/status" endpoint - the alternate endpoint allows different rules to be applied to the update, and access to be appropriately restricted. Likewise, some actions like "stop" or "scale" are best represented as REST sub-resources that are POSTed to. The POST action may require a simple kind to be provided if the action requires parameters, or function without a request body.
TODO: more documentation of Watch
### PATCH operations
The API supports three different PATCH operations, determined by their corresponding Content-Type header:
* JSON Patch, `Content-Type: application/json-patch+json`
* As defined in [RFC6902](https://tools.ietf.org/html/rfc6902), a JSON Patch is a sequence of operations that are executed on the resource, e.g. `{"op": "add", "path": "/a/b/c", "value": [ "foo", "bar" ]}`. For more details on how to use JSON Patch, see the RFC.
* Merge Patch, `Content-Type: application/merge-patch+json`
* As defined in [RFC7386](https://tools.ietf.org/html/rfc7386), a Merge Patch is essentially a partial representation of the resource. The submitted JSON is "merged" with the current resource to create a new one, then the new one is saved. For more details on how to use Merge Patch, see the RFC.
* Strategic Merge Patch, `Content-Type: application/strategic-merge-patch+json`
* Strategic Merge Patch is a custom implementation of Merge Patch. For a detailed explanation of how it works and why it needed to be introduced, see below.
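For illustration, the JSON Patch flavor listed above could be exercised with curl through `kubectl proxy` as follows (the pod name and label are made up, and the pod must already have a `labels` map for this `add` operation to succeed):
```
$ curl -X PATCH http://localhost:8080/api/v1/namespaces/default/pods/pod-name \
    -H "Content-Type: application/json-patch+json" \
    -d '[ {"op": "add", "path": "/metadata/labels/live", "value": "true"} ]'
```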
#### Strategic Merge Patch
In the standard JSON merge patch, JSON objects are always merged but lists are always replaced. Often that isn't what we want. Let's say we start with the following Pod:
```yaml
spec:
containers:
- name: nginx
image: nginx-1.0
```
...and we POST that to the server (as JSON). Then let's say we want to *add* a container to this Pod.
```yaml
PATCH /api/v1/namespaces/default/pods/pod-name
spec:
containers:
- name: log-tailer
image: log-tailer-1.0
```
If we were to use standard Merge Patch, the entire container list would be replaced with the single log-tailer container. However, our intent is for the container lists to merge together based on the `name` field.
To solve this problem, Strategic Merge Patch uses metadata attached to the API objects to determine what lists should be merged and which ones should not. Currently the metadata is available as struct tags on the API objects themselves, but will become available to clients as Swagger annotations in the future. In the above example, the `patchStrategy` metadata for the `containers` field would be `merge` and the `patchMergeKey` would be `name`.
Note: If the patch results in merging two lists of scalars, the scalars are first deduplicated and then merged.
Strategic Merge Patch also supports special operations as listed below.
### List Operations
To override the container list to be strictly replaced, regardless of the default:
```yaml
containers:
- name: nginx
image: nginx-1.0
- $patch: replace # any further $patch operations nested in this list will be ignored
```
To delete an element of a list that should be merged:
```yaml
containers:
- name: nginx
image: nginx-1.0
- $patch: delete
name: log-tailer # merge key and value goes here
```
### Map Operations
To indicate that a map should not be merged and instead should be taken literally:
```yaml
$patch: replace # recursive and applies to all fields of the map it's in
containers:
- name: nginx
image: nginx-1.0
```
To delete a field of a map:
```yaml
name: nginx
image: nginx-1.0
labels:
live: null # set the value of the map key to null
```
Idempotency
-----------
All compatible Kubernetes APIs MUST support "name idempotency" and respond with an HTTP status code 409 when a request is made to POST an object that has the same name as an existing object in the system. See [identifiers.md](identifiers.md) for details.
Names generated by the system may be requested using `metadata.generateName`. GenerateName indicates that the name should be made unique by the server prior to persisting it. A non-empty value for the field indicates the name will be made unique (and the name returned to the client will be different than the name passed). The value of this field will be combined with a unique suffix on the server if the Name field has not been provided. The provided value must be valid within the rules for Name, and may be truncated by the length of the suffix required to make the value unique on the server. If this field is specified, and Name is not present, the server will NOT return a 409 if the generated name exists - instead, it will either return 201 Created or 504 with Reason `ServerTimeout` indicating a unique name could not be found in the time allotted, and the client should retry (optionally after the time indicated in the Retry-After header).
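A sketch of requesting a generated name through `kubectl proxy` (the pod body is a made-up minimal example); the created object comes back with a `metadata.name` of `sleep-` followed by a unique suffix:
```
$ curl -X POST http://localhost:8080/api/v1/namespaces/default/pods \
    -H "Content-Type: application/json" \
    -d '{
          "kind": "Pod",
          "apiVersion": "v1",
          "metadata": { "generateName": "sleep-" },
          "spec": {
            "containers": [
              { "name": "sleep", "image": "busybox", "command": ["sleep", "3600"] }
            ]
          }
        }'
```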
Defaulting
----------
Default resource values are API version-specific, and they are applied during
the conversion from API-versioned declarative configuration to internal objects
representing the desired state (`Spec`) of the resource. Subsequent GETs of the
resource will include the default values explicitly.
Incorporating the default values into the `Spec` ensures that `Spec` depicts the
full desired state so that it is easier for the system to determine how to
achieve the state, and for the user to know what to anticipate.
API version-specific default values are set by the API server.
Late Initialization
-------------------
Late initialization is when resource fields are set by a system controller
after an object is created/updated.
For example, the scheduler sets the pod.spec.nodeName field after the pod is created.
Late-initializers should only make the following types of modifications:
- Setting previously unset fields
- Adding keys to maps
- Adding values to arrays which have mergeable semantics (`patchStrategy:"merge"` attribute in
the Go definition of the type).
These conventions:
1. allow a user (with sufficient privilege) to override any system-default behaviors by setting
the fields that would otherwise have been defaulted.
1. enable updates from users to be merged with changes made during late initialization, using
strategic merge patch, as opposed to clobbering the change.
1. allow the component which does the late-initialization to use strategic merge patch, which
facilitates composition and concurrency of such components.
Although the apiserver Admission Control stage acts prior to object creation,
Admission Control plugins should follow the Late Initialization conventions
too, to allow their implementation to be later moved to a controller, or to client libraries.
Concurrency Control and Consistency
-----------------------------------
Kubernetes leverages the concept of *resource versions* to achieve optimistic concurrency. All Kubernetes resources have a "resourceVersion" field as part of their metadata. This resourceVersion is a string that identifies the internal version of an object that can be used by clients to determine when objects have changed. When a record is about to be updated, its version is checked against a pre-saved value, and if it doesn't match, the update fails with a StatusConflict (HTTP status code 409).
The resourceVersion is changed by the server every time an object is modified. If resourceVersion is included with the PUT operation the system will verify that there have not been other successful mutations to the resource during a read/modify/write cycle, by verifying that the current value of resourceVersion matches the specified value.
The resourceVersion is currently backed by [etcd's modifiedIndex](https://coreos.com/docs/distributed-configuration/etcd-api/). However, it's important to note that the application should *not* rely on the implementation details of the versioning system maintained by Kubernetes. We may change the implementation of resourceVersion in the future, such as to change it to a timestamp or per-object counter.
The only way for a client to know the expected value of resourceVersion is to have received it from the server in response to a prior operation, typically a GET. This value MUST be treated as opaque by clients and passed unmodified back to the server. Clients should not assume that the resource version has meaning across namespaces, different kinds of resources, or different servers. Currently, the value of resourceVersion is set to match etcd's sequencer. You could think of it as a logical clock the API server can use to order requests. However, we expect the implementation of resourceVersion to change in the future, such as in the case we shard the state by kind and/or namespace, or port to another storage system.
In the case of a conflict, the correct client action at this point is to GET the resource again, apply the changes afresh, and try submitting again. This mechanism can be used to prevent races like the following:
```
Client #1 Client #2
GET Foo GET Foo
Set Foo.Bar = "one" Set Foo.Baz = "two"
PUT Foo PUT Foo
```
When these sequences occur in parallel, either the change to Foo.Bar or the change to Foo.Baz can be lost.
On the other hand, when specifying the resourceVersion, one of the PUTs will fail, since whichever write succeeds changes the resourceVersion for Foo.
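A minimal read/modify/write sketch using `kubectl` (the resource name is hypothetical; the same flow applies to any raw PUT that carries `metadata.resourceVersion`):

```
$ # Read the current object, including its resourceVersion.
$ kubectl get replicationcontrollers my-rc -o json > my-rc.json
$ # Edit my-rc.json locally, leaving metadata.resourceVersion untouched...
$ # Write it back; the server returns 409 StatusConflict if another writer
$ # has changed the object (and thus its resourceVersion) in the meantime.
$ kubectl update -f my-rc.json
```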
resourceVersion may be used as a precondition for other operations (e.g., GET, DELETE) in the future, such as for read-after-write consistency in the presence of caching.
"Watch" operations specify resourceVersion using a query parameter. It is used to specify the point at which to begin watching the specified resources. This may be used to ensure that no mutations are missed between a GET of a resource (or list of resources) and a subsequent Watch, even if the current version of the resource is more recent. This is currently the main reason that list operations (GET on a collection) return resourceVersion.
Serialization Format
--------------------
APIs may return alternative representations of any resource in response to an Accept header or under alternative endpoints, but the default serialization for input and output of API responses MUST be JSON.
All dates should be serialized as RFC3339 strings.
Units
-----
Units must either be explicit in the field name (e.g., `timeoutSeconds`), or must be specified as part of the value (e.g., `resource.Quantity`). Which approach is preferred is TBD.
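Both styles appear in existing objects; for example (an illustrative fragment of a hypothetical pod, with made-up values):

```
$ kubectl get pod mypod -o yaml | grep -E 'timeoutSeconds|cpu|memory'
    timeoutSeconds: 1
        cpu: 250m
        memory: 64Mi
```

`timeoutSeconds` carries its unit in the field name, while the `cpu` and `memory` quantities carry their units (or scaling suffixes) in the value.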
Selecting Fields
----------------
Some APIs may need to identify which field in a JSON object is invalid, or to reference a value to extract from a separate resource. The current recommendation is to use standard JavaScript syntax for accessing that field, assuming the JSON object was transformed into a JavaScript object.
Examples:
* Find the field "current" in the object "state" in the second item in the array "fields": `fields[1].state.current`
TODO: Plugins, extensions, nested kinds, headers
HTTP Status codes
-----------------
The server will respond with HTTP status codes that match the HTTP spec. See the section below for a breakdown of the types of status codes the server will send.
The following HTTP status codes may be returned by the API.
#### Success codes
* `200 StatusOK`
* Indicates that the request completed successfully.
* `201 StatusCreated`
* Indicates that the request to create kind completed successfully.
* `204 StatusNoContent`
* Indicates that the request completed successfully, and the response contains no body.
* Returned in response to HTTP OPTIONS requests.
#### Error codes
* `307 StatusTemporaryRedirect`
* Indicates that the address for the requested resource has changed.
* Suggested client recovery behavior
* Follow the redirect.
* `400 StatusBadRequest`
* Indicates that the request is invalid.
* Suggested client recovery behavior:
* Do not retry. Fix the request.
* `401 StatusUnauthorized`
* Indicates that the server can be reached and understood the request, but refuses to take any further action, because the client must provide authorization. If the client has provided authorization, the server is indicating the provided authorization is unsuitable or invalid.
* Suggested client recovery behavior
* If the user has not supplied authorization information, prompt them for the appropriate credentials.
* If the user has supplied authorization information, inform them their credentials were rejected and optionally prompt them again.
* `403 StatusForbidden`
* Indicates that the server can be reached and understood the request, but refuses to take any further action, because it is configured to deny access for some reason to the requested resource by the client.
* Suggested client recovery behavior
* Do not retry. Fix the request.
* `404 StatusNotFound`
* Indicates that the requested resource does not exist.
* Suggested client recovery behavior
* Do not retry. Fix the request.
* `405 StatusMethodNotAllowed`
* Indicates that the action the client attempted to perform on the resource was not supported by the code.
* Suggested client recovery behavior
* Do not retry. Fix the request.
* `409 StatusConflict`
* Indicates that either the resource the client attempted to create already exists or the requested update operation cannot be completed due to a conflict.
* Suggested client recovery behavior
  * If creating a new resource:
    * Either change the identifier and try again, or GET and compare the fields in the pre-existing object and issue a PUT/update to modify the existing object.
  * If updating an existing resource:
    * See `Conflict` from the `status` response section below on how to retrieve more information about the nature of the conflict.
    * GET and compare the fields in the pre-existing object, merge changes (if still valid according to preconditions), and retry with the updated request (including `ResourceVersion`).
* `422 StatusUnprocessableEntity`
* Indicates that the requested create or update operation cannot be completed due to invalid data provided as part of the request.
* Suggested client recovery behavior
* Do not retry. Fix the request.
* `429 StatusTooManyRequests`
* Indicates that either the client's rate limit has been exceeded or the server has received more requests than it can process.
* Suggested client recovery behavior:
* Read the ```Retry-After``` HTTP header from the response, and wait at least that long before retrying.
* `500 StatusInternalServerError`
* Indicates that the server can be reached and understood the request, but either an unexpected internal error occurred and the outcome of the call is unknown, or the server cannot complete the action in a reasonable time (this may be due to temporary server load or a transient communication issue with another server).
* Suggested client recovery behavior:
* Retry with exponential backoff.
* `503 StatusServiceUnavailable`
* Indicates that a required service is unavailable.
* Suggested client recovery behavior:
* Retry with exponential backoff.
* `504 StatusServerTimeout`
* Indicates that the request could not be completed within the given time. Clients can get this response ONLY when they specified a timeout param in the request.
* Suggested client recovery behavior:
* Increase the value of the timeout param and retry with exponential backoff.
Response Status Kind
--------------------
Kubernetes will always return the ```Status``` kind from any API endpoint when an error occurs.
Clients SHOULD handle these types of objects when appropriate.
A ```Status``` kind will be returned by the API in two cases:
* When an operation is not successful (i.e. when the server would return a non-2xx HTTP status code).
* When an HTTP ```DELETE``` call is successful.
The status object is encoded as JSON and provided as the body of the response. The status object contains fields for humans and machine consumers of the API to get more detailed information for the cause of the failure. The information in the status object supplements, but does not override, the HTTP status code's meaning. When fields in the status object have the same meaning as generally defined HTTP headers and that header is returned with the response, the header should be considered as having higher priority.
**Example:**
```
$ curl -v -k -H "Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc" https://10.240.122.184:443/api/v1/namespaces/default/pods/grafana
> GET /api/v1/namespaces/default/pods/grafana HTTP/1.1
> User-Agent: curl/7.26.0
> Host: 10.240.122.184
> Accept: */*
> Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc
>
< HTTP/1.1 404 Not Found
< Content-Type: application/json
< Date: Wed, 20 May 2015 18:10:42 GMT
< Content-Length: 232
<
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "pods \"grafana\" not found",
"reason": "NotFound",
"details": {
"name": "grafana",
"kind": "pods"
},
"code": 404
}
```
The ```status``` field contains one of two possible values:
* `Success`
* `Failure`
`message` may contain a human-readable description of the error.
```reason``` may contain a machine-readable description of why this operation is in the `Failure` status. If this value is empty there is no information available. The `reason` clarifies an HTTP status code but does not override it.
```details``` may contain extended data associated with the reason. Each reason may define its own extended details. This field is optional and the data returned is not guaranteed to conform to any schema except that defined by the reason type.
Possible values for the ```reason``` and ```details``` fields:
* `BadRequest`
* Indicates that the request itself was invalid, because the request doesn't make any sense, for example deleting a read-only object.
* This is different from the `Invalid` `status reason` below, which indicates that the API call could possibly succeed, but the data was invalid.
* API calls that return BadRequest can never succeed.
* HTTP status code: `400 StatusBadRequest`
* `Unauthorized`
* Indicates that the server can be reached and understood the request, but refuses to take any further action without the client providing appropriate authorization. If the client has provided authorization, this error indicates the provided credentials are insufficient or invalid.
* Details (optional):
* `kind string`
* The kind attribute of the unauthorized resource (on some operations may differ from the requested resource).
* `name string`
* The identifier of the unauthorized resource.
* HTTP status code: `401 StatusUnauthorized`
* `Forbidden`
* Indicates that the server can be reached and understood the request, but refuses to take any further action, because it is configured to deny access for some reason to the requested resource by the client.
* Details (optional):
* `kind string`
* The kind attribute of the forbidden resource (on some operations may differ from the requested resource).
* `name string`
* The identifier of the forbidden resource.
* HTTP status code: `403 StatusForbidden`
* `NotFound`
* Indicates that one or more resources required for this operation could not be found.
* Details (optional):
* `kind string`
* The kind attribute of the missing resource (on some operations may differ from the requested resource).
* `name string`
* The identifier of the missing resource.
* HTTP status code: `404 StatusNotFound`
* `AlreadyExists`
* Indicates that the resource you are creating already exists.
* Details (optional):
* `kind string`
* The kind attribute of the conflicting resource.
* `name string`
* The identifier of the conflicting resource.
* HTTP status code: `409 StatusConflict`
* `Conflict`
* Indicates that the requested update operation cannot be completed due to a conflict. The client may need to alter the request. Each resource may define custom details that indicate the nature of the conflict.
* HTTP status code: `409 StatusConflict`
* `Invalid`
* Indicates that the requested create or update operation cannot be completed due to invalid data provided as part of the request.
* Details (optional):
* `kind string`
* The kind attribute of the invalid resource.
* `name string`
* The identifier of the invalid resource.
* `causes`
* One or more `StatusCause` entries indicating the data in the provided resource that was invalid. The `reason`, `message`, and `field` attributes will be set.
* HTTP status code: `422 StatusUnprocessableEntity`
* `Timeout`
* Indicates that the request could not be completed within the given time. Clients may receive this response if the server has decided to rate limit the client, or if the server is overloaded and cannot process the request at this time.
* HTTP status code: `429 StatusTooManyRequests`
* The server should set the `Retry-After` HTTP header and return `retryAfterSeconds` in the details field of the object. A value of `0` is the default.
* `ServerTimeout`
* Indicates that the server can be reached and understood the request, but cannot complete the action in a reasonable time. This may be due to temporary server load or a transient communication issue with another server.
* Details (optional):
* `kind string`
* The kind attribute of the resource being acted on.
* `name string`
* The operation that is being attempted.
* The server should set the `Retry-After` HTTP header and return `retryAfterSeconds` in the details field of the object. A value of `0` is the default.
* HTTP status code: `504 StatusServerTimeout`
* `MethodNotAllowed`
* Indicates that the action the client attempted to perform on the resource was not supported by the code.
* For instance, attempting to delete a resource that can only be created.
* API calls that return MethodNotAllowed can never succeed.
* HTTP status code: `405 StatusMethodNotAllowed`
* `InternalError`
* Indicates that an internal error occurred; it is unexpected, and the outcome of the call is unknown.
* Details (optional):
* `causes`
* The original error.
* HTTP status code: `500 StatusInternalServerError`
`code` may contain the suggested HTTP return code for this status.
Events
------
TODO: Document events (refer to another doc for details)
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/api-conventions.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/api-conventions.md?pixel)]()

View File

@ -0,0 +1,74 @@
# The Kubernetes API
Primary system and API concepts are documented in the [User guide](user-guide.md).
Overall API conventions are described in the [API conventions doc](api-conventions.md).
Complete API details are documented via [Swagger](http://swagger.io/). The Kubernetes apiserver (aka "master") exports an API that can be used to retrieve the [Swagger spec](https://github.com/swagger-api/swagger-spec/tree/master/schemas/v1.2) for the Kubernetes API, by default at `/swaggerapi`, and a UI you can use to browse the API documentation at `/swagger-ui`. We also periodically update a [statically generated UI](http://kubernetes.io/third_party/swagger-ui/).
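For example, assuming an apiserver reachable on localhost:8080 (the exact sub-paths may vary by version):

```
$ # Top-level listing of available APIs.
$ curl http://localhost:8080/swaggerapi/
$ # Swagger API declaration for the v1 API.
$ curl http://localhost:8080/swaggerapi/api/v1
```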
Remote access to the API is discussed in the [access doc](accessing_the_api.md).
The Kubernetes API also serves as the foundation for the declarative configuration schema for the system. The [Kubectl](kubectl.md) command-line tool can be used to create, update, delete, and get API objects.
Kubernetes also stores its serialized state (currently in [etcd](https://coreos.com/docs/distributed-configuration/getting-started-with-etcd/)) in terms of the API resources.
Kubernetes itself is decomposed into multiple components, which interact through its API.
## API changes
In our experience, any system that is successful needs to grow and change as new use cases emerge or existing ones change. Therefore, we expect the Kubernetes API to continuously change and grow. However, we intend not to break compatibility with existing clients for an extended period of time. In general, new API resources and new resource fields can be expected to be added frequently. Elimination of resources or fields will require following a deprecation process. The precise deprecation policy for eliminating features is TBD, but once we reach our 1.0 milestone, there will be a specific policy.
What constitutes a compatible change and how to change the API are detailed by the [API change document](devel/api_changes.md).
## API versioning
Fine-grained resource evolution alone makes it difficult to eliminate fields or restructure resource representations. Therefore, Kubernetes supports multiple API versions, each at a different API path prefix, such as `/api/v1beta3`. These are simply different interfaces to read and/or modify the same underlying resources. In general, all API resources are accessible via all API versions, though there may be some cases in the future where that is not true.
Distinct API versions present clearer, more consistent views of system resources and behavior than intermingled, independently evolved resources. They also provide a more straightforward mechanism for controlling access to end-of-lifed and/or experimental APIs.
The [API and release versioning proposal](versioning.md) describes the current thinking on the API version evolution process.
## v1beta1, v1beta2, and v1beta3 are deprecated; please move to v1 ASAP
As of June 4, 2015, the Kubernetes v1 API has been enabled by default. The v1beta1 and v1beta2 APIs were deleted on June 1, 2015. v1beta3 is planned to be deleted on July 6, 2015.
### v1 conversion tips (from v1beta3)
We're working to convert all documentation and examples to v1. A simple [API conversion tool](cluster_management.md#switching-your-config-files-to-a-new-api-version) has been written to simplify the translation process. Use `kubectl create --validate` in order to validate your json or yaml against our Swagger spec.
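For example (the file name is illustrative):

```
$ kubectl create --validate -f ./my-pod-v1.yaml
```

Validation errors against the Swagger spec are reported before the object is submitted.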
Changes to services are the most significant difference between v1beta3 and v1; a conversion sketch follows the list below.
* The `service.spec.portalIP` property is renamed to `service.spec.clusterIP`.
* The `service.spec.createExternalLoadBalancer` property is removed. Specify `service.spec.type: "LoadBalancer"` to create an external load balancer instead.
* The `service.spec.publicIPs` property is deprecated and now called `service.spec.deprecatedPublicIPs`. This property will be removed entirely when v1beta3 is removed. The vast majority of users of this field were using it to expose services on ports on the node. Those users should specify `service.spec.type: "NodePort"` instead. Read [External Services](services.md#external-services) for more info. If this is not sufficient for your use case, please file an issue or contact @thockin.
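A hedged before/after sketch of a simple service (the name, IP, ports, and selector are hypothetical; only the renamed and replaced fields matter here):

```
$ cat service-v1beta3.yaml
apiVersion: v1beta3
kind: Service
metadata:
  name: my-service
spec:
  portalIP: 10.0.0.10
  createExternalLoadBalancer: true
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
$ cat service-v1.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  clusterIP: 10.0.0.10
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
```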
Some other differences between v1beta3 and v1:
* The `pod.spec.containers[*].privileged` and `pod.spec.containers[*].capabilities` properties are now nested under the `pod.spec.containers[*].securityContext` property. See [Security Contexts](security_context.md).
* The `pod.spec.host` property is renamed to `pod.spec.nodeName`.
* The `endpoints.subsets[*].addresses.IP` property is renamed to `endpoints.subsets[*].addresses.ip`.
* The `pod.status.containerStatuses[*].state.termination` and `pod.status.containerStatuses[*].lastState.termination` properties are renamed to `pod.status.containerStatuses[*].state.terminated` and `pod.status.containerStatuses[*].lastState.terminated`, respectively.
* The `pod.status.Condition` property is renamed to `pod.status.conditions`.
* The `status.details.id` property is renamed to `status.details.name`.
### v1beta3 conversion tips (from v1beta1/2)
Some important differences between v1beta1/2 and v1beta3:
* The resource `id` is now called `name`.
* `name`, `labels`, `annotations`, and other metadata are now nested in a map called `metadata`.
* `desiredState` is now called `spec`, and `currentState` is now called `status`.
* `/minions` has been moved to `/nodes`, and the resource has kind `Node`.
* The namespace is required (for all namespaced resources) and has moved from a URL parameter to the path: `/api/v1beta3/namespaces/{namespace}/{resource_collection}/{resource_name}`. If you were not using a namespace before, use `default` here. A request-path sketch follows this list.
* The names of all resource collections are now lower cased - instead of `replicationControllers`, use `replicationcontrollers`.
* To watch for changes to a resource, open an HTTP or Websocket connection to the collection query and provide the `?watch=true` query parameter along with the desired `resourceVersion` parameter to watch from.
* The `labels` query parameter has been renamed to `label-selector`.
* The container `entrypoint` has been renamed to `command`, and `command` has been renamed to `args`.
* Container, volume, and node resources are expressed as nested maps (e.g., `resources{cpu:1}`) rather than as individual fields, and resource values support [scaling suffixes](resources.md#resource-quantities) rather than fixed scales (e.g., milli-cores).
* Restart policy is represented simply as a string (e.g., `"Always"`) rather than as a nested map (`always{}`).
* Pull policies changed from `PullAlways`, `PullNever`, and `PullIfNotPresent` to `Always`, `Never`, and `IfNotPresent`.
* The volume `source` is inlined into `volume` rather than nested.
* Host volumes have been changed from `hostDir` to `hostPath` to better reflect that they can be files or directories.
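A hedged request-path sketch of the namespace and collection-name changes above (assumes an apiserver on localhost:8080):

```
$ # v1beta1/v1beta2: namespace as a query parameter, camelCased collection names.
$ curl 'http://localhost:8080/api/v1beta2/replicationControllers?namespace=default'
$ # v1beta3: namespace in the path, lowercased collection names.
$ curl http://localhost:8080/api/v1beta3/namespaces/default/replicationcontrollers
```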
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/api.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/api.md?pixel)]()

Binary file not shown.

Binary file not shown.


View File

@ -0,0 +1,499 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/PR-SVG-20010719/DTD/svg10.dtd">
<svg width="68cm" height="56cm" viewBox="-55 -75 1348 1117" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g>
<rect style="fill: #ffffff" x="662" y="192" width="630" height="381"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="662" y="192" width="630" height="381"/>
</g>
<g>
<rect style="fill: #ffffff" x="688" y="321" width="580" height="227"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="688" y="321" width="580" height="227"/>
</g>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="687" y="224">
<tspan x="687" y="224">Node</tspan>
</text>
<g>
<rect style="fill: #ffffff" x="723.2" y="235" width="69.6" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="723.2" y="235" width="69.6" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="758" y="257.9">
<tspan x="758" y="257.9">kubelet</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="720.2" y="368.1" width="148" height="133"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="720.2" y="368.1" width="148" height="133"/>
</g>
<g>
<rect style="fill: #ffffff" x="760.55" y="438.1" width="89.3" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="760.55" y="438.1" width="89.3" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="805.2" y="461">
<tspan x="805.2" y="461">container</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="749.8" y="428.2" width="89.3" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="749.8" y="428.2" width="89.3" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="794.45" y="451.1">
<tspan x="794.45" y="451.1">container</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="739.4" y="418.3" width="89.3" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="739.4" y="418.3" width="89.3" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="784.05" y="441.2">
<tspan x="784.05" y="441.2">cAdvisor</tspan>
</text>
</g>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="794.2" y="434.6">
<tspan x="794.2" y="434.6"></tspan>
</text>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="742.2" y="394.6">
<tspan x="742.2" y="394.6">Pod</tspan>
</text>
<g>
<g>
<rect style="fill: #ffffff" x="1108.6" y="368.1" width="148" height="133"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="1108.6" y="368.1" width="148" height="133"/>
</g>
<g>
<rect style="fill: #ffffff" x="1148.95" y="438.1" width="89.3" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="1148.95" y="438.1" width="89.3" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="1193.6" y="461">
<tspan x="1193.6" y="461">container</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="1138.2" y="428.2" width="89.3" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="1138.2" y="428.2" width="89.3" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="1182.85" y="451.1">
<tspan x="1182.85" y="451.1">container</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="1127.8" y="418.3" width="89.3" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="1127.8" y="418.3" width="89.3" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="1172.45" y="441.2">
<tspan x="1172.45" y="441.2">container</tspan>
</text>
</g>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="1182.6" y="434.6">
<tspan x="1182.6" y="434.6"></tspan>
</text>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="1130.6" y="394.6">
<tspan x="1130.6" y="394.6">Pod</tspan>
</text>
</g>
<g>
<g>
<rect style="fill: #ffffff" x="902.9" y="368.1" width="148" height="133"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="902.9" y="368.1" width="148" height="133"/>
</g>
<g>
<rect style="fill: #ffffff" x="943.25" y="438.1" width="89.3" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="943.25" y="438.1" width="89.3" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="987.9" y="461">
<tspan x="987.9" y="461">container</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="932.5" y="428.2" width="89.3" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="932.5" y="428.2" width="89.3" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="977.15" y="451.1">
<tspan x="977.15" y="451.1">container</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="922.1" y="418.3" width="89.3" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="922.1" y="418.3" width="89.3" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="966.75" y="441.2">
<tspan x="966.75" y="441.2">container</tspan>
</text>
</g>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="976.9" y="434.6">
<tspan x="976.9" y="434.6"></tspan>
</text>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="924.9" y="394.6">
<tspan x="924.9" y="394.6">Pod</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="949.748" y="228" width="57.1" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="949.748" y="228" width="57.1" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="978.298" y="250.9">
<tspan x="978.298" y="250.9">Proxy</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="126.911" y="92.49" width="189.4" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="126.911" y="92.49" width="189.4" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="221.611" y="115.39">
<tspan x="221.611" y="115.39">kubectl (user commands)</tspan>
</text>
</g>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="142.476" y="866.282">
<tspan x="142.476" y="866.282"></tspan>
</text>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="758" y1="273" x2="782.332" y2="408.717"/>
<polygon style="fill: #000000" points="783.655,416.099 776.969,407.138 782.332,408.717 786.812,405.374 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="783.655,416.099 776.969,407.138 782.332,408.717 786.812,405.374 "/>
</g>
<g>
<rect style="fill: #ffffff" x="941.576" y="75.6768" width="70.2" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="941.576" y="75.6768" width="70.2" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="976.676" y="98.5768">
<tspan x="976.676" y="98.5768">Firewall</tspan>
</text>
</g>
<g>
<path style="fill: #ffffff" d="M 948.242 -47.953 C 938.87,-48.2618 920.694,-41.7773 923.25,-27.8819 C 925.806,-13.9865 938.018,-10.8988 943.13,-14.9129 C 948.242,-18.9271 935.178,4.54051 960.17,10.7162 C 985.161,16.8919 997.941,7.01079 994.249,-0.0912821 C 990.557,-7.19336 1016.12,16.5832 1028.04,2.99658 C 1039.97,-10.59 1015.83,-23.5589 1020.94,-21.7062 C 1026.06,-19.8535 1041.68,-22.3237 1036.56,-45.4827 C 1031.45,-68.6416 985.445,-50.7321 990.557,-54.1287 C 995.669,-57.5253 982.889,-74.5086 966.986,-71.112 C 951.082,-67.7153 949.954,-61.5516 948.25,-47.965 L 948.242,-47.953z"/>
<path style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" d="M 948.242 -47.953 C 938.87,-48.2618 920.694,-41.7773 923.25,-27.8819 C 925.806,-13.9865 938.018,-10.8988 943.13,-14.9129 C 948.242,-18.9271 935.178,4.54051 960.17,10.7162 C 985.161,16.8919 997.941,7.01079 994.249,-0.0912821 C 990.557,-7.19336 1016.12,16.5832 1028.04,2.99658 C 1039.97,-10.59 1015.83,-23.5589 1020.94,-21.7062 C 1026.06,-19.8535 1041.68,-22.3237 1036.56,-45.4827 C 1031.45,-68.6416 985.445,-50.7321 990.557,-54.1287 C 995.669,-57.5253 982.889,-74.5086 966.986,-71.112 C 951.082,-67.7153 949.954,-61.5516 948.25,-47.965 L 948.242,-47.953"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="984.428" y="-23.1971">
<tspan x="984.428" y="-23.1971">Internet</tspan>
</text>
</g>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="974.985" y1="12.703" x2="976.415" y2="65.9442"/>
<polygon style="fill: #000000" points="976.616,73.4415 971.349,63.5793 976.415,65.9442 981.346,63.3109 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="976.616,73.4415 971.349,63.5793 976.415,65.9442 981.346,63.3109 "/>
</g>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="976.676" y1="113.677" x2="978.16" y2="218.265"/>
<polygon style="fill: #000000" points="978.266,225.764 973.125,215.836 978.16,218.265 983.124,215.694 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="978.266,225.764 973.125,215.836 978.16,218.265 983.124,215.694 "/>
</g>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="978.298" y1="266" x2="977.033" y2="358.365"/>
<polygon style="fill: #000000" points="976.931,365.864 972.068,355.797 977.033,358.365 982.067,355.934 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="976.931,365.864 972.068,355.797 977.033,358.365 982.067,355.934 "/>
</g>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="992.573" y1="266" x2="1174.02" y2="363.492"/>
<polygon style="fill: #000000" points="1180.63,367.042 1169.45,366.713 1174.02,363.492 1174.19,357.904 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="1180.63,367.042 1169.45,366.713 1174.02,363.492 1174.19,357.904 "/>
</g>
<g>
<rect style="fill: #ffffff" x="-54" y="370.5" width="562" height="383.25"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="-54" y="370.5" width="562" height="383.25"/>
</g>
<g>
<rect style="fill: #ffffff" x="-30" y="416.75" width="364" height="146"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="-30" y="416.75" width="364" height="146"/>
</g>
<g>
<rect style="fill: #ffffff" x="201.314" y="594.318" width="154.6" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="201.314" y="594.318" width="154.6" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="278.614" y="617.218">
<tspan x="278.614" y="617.218">replication controller</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="94.8884" y="617.914" width="86.15" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="94.8884" y="617.914" width="86.15" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="137.963" y="640.814">
<tspan x="137.963" y="640.814">Scheduler</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="80.162" y="594.318" width="86.15" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="80.162" y="594.318" width="86.15" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="123.237" y="617.218">
<tspan x="123.237" y="617.218">Scheduler</tspan>
</text>
</g>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="-34.876" y="699.256">
<tspan x="-34.876" y="699.256">Master components</tspan>
<tspan x="-34.876" y="715.256">Colocated, or spread across machines,</tspan>
<tspan x="-34.876" y="731.256">as dictated by cluster size.</tspan>
</text>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="506.888" y="611.5">
<tspan x="506.888" y="611.5"></tspan>
</text>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="506.888" y="611.5">
<tspan x="506.888" y="611.5"></tspan>
</text>
<g>
<rect style="fill: #ffffff" x="136.717" y="468.5" width="172.175" height="70"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="136.717" y="468.5" width="172.175" height="70"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="222.804" y="491.4">
<tspan x="222.804" y="491.4">REST</tspan>
<tspan x="222.804" y="507.4">(pods, services,</tspan>
<tspan x="222.804" y="523.4">rep. controllers)</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="165.958" y="389.5" width="115" height="54"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="165.958" y="389.5" width="115" height="54"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="223.458" y="412.4">
<tspan x="223.458" y="412.4">authorization</tspan>
<tspan x="223.458" y="428.4">authentication</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="2.35" y="476.5" width="91.3" height="54"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="2.35" y="476.5" width="91.3" height="54"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="48" y="499.4">
<tspan x="48" y="499.4">scheduling</tspan>
<tspan x="48" y="515.4">actuator</tspan>
</text>
</g>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="-13" y="436.75">
<tspan x="-13" y="436.75">APIs</tspan>
</text>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="123.237" y1="594.318" x2="55.4248" y2="536.798"/>
<polygon style="fill: #000000" points="49.7052,531.946 60.5656,534.602 55.4248,536.798 54.097,542.228 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="49.7052,531.946 60.5656,534.602 55.4248,536.798 54.097,542.228 "/>
</g>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="123.237" y1="594.318" x2="172.833" y2="545.341"/>
<polygon style="fill: #000000" points="178.169,540.071 174.567,550.655 172.833,545.341 167.541,543.54 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="178.169,540.071 174.567,550.655 172.833,545.341 167.541,543.54 "/>
</g>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="278.614" y1="594.318" x2="229.688" y2="545.385"/>
<polygon style="fill: #000000" points="224.385,540.081 234.991,543.618 229.688,545.385 227.92,550.688 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="224.385,540.081 234.991,543.618 229.688,545.385 227.92,550.688 "/>
</g>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="223.458" y1="443.5" x2="223.059" y2="458.767"/>
<polygon style="fill: #000000" points="222.862,466.265 218.126,456.137 223.059,458.767 228.122,456.399 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="222.862,466.265 218.126,456.137 223.059,458.767 228.122,456.399 "/>
</g>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="318.054" y1="544.587" x2="410.664" y2="606.112"/>
<polygon style="fill: #000000" points="320.821,540.422 309.725,539.053 315.288,548.752 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="320.821,540.422 309.725,539.053 315.288,548.752 "/>
<polygon style="fill: #000000" points="416.911,610.263 405.815,608.894 410.664,606.112 411.349,600.564 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="416.911,610.263 405.815,608.894 410.664,606.112 411.349,600.564 "/>
</g>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="221.612" y1="130.49" x2="223.389" y2="379.764"/>
<polygon style="fill: #000000" points="223.442,387.264 218.371,377.3 223.389,379.764 228.371,377.229 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="223.442,387.264 218.371,377.3 223.389,379.764 228.371,377.229 "/>
</g>
<g>
<path style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" d="M 319.892 503.5 C 392.964,503.5 639.13,244.5 713.464,244.5"/>
<polygon style="fill: #000000" points="319.892,498.5 309.892,503.5 319.892,508.5 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="319.892,498.5 309.892,503.5 319.892,508.5 "/>
<polygon style="fill: #000000" points="720.964,244.5 710.964,249.5 713.464,244.5 710.964,239.5 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="720.964,244.5 710.964,249.5 713.464,244.5 710.964,239.5 "/>
</g>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="93.65" y1="503.5" x2="126.981" y2="503.5"/>
<polygon style="fill: #000000" points="134.481,503.5 124.481,508.5 126.981,503.5 124.481,498.5 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="134.481,503.5 124.481,508.5 126.981,503.5 124.481,498.5 "/>
</g>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="221.612" y="111.49">
<tspan x="221.612" y="111.49"></tspan>
</text>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="1209" y="339.5">
<tspan x="1209" y="339.5">docker</tspan>
</text>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="793.753" y1="272.636" x2="968.266" y2="363.6"/>
<polygon style="fill: #000000" points="974.917,367.066 963.738,366.878 968.266,363.6 968.361,358.01 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="974.917,367.066 963.738,366.878 968.266,363.6 968.361,358.01 "/>
</g>
<text font-size="12.7998" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="978" y="434.5">
<tspan x="978" y="434.5">..</tspan>
</text>
<text font-size="27.0929" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="1067" y="437">
<tspan x="1067" y="437">...</tspan>
</text>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="792.8" y1="273" x2="1173.14" y2="365.792"/>
<polygon style="fill: #000000" points="1180.43,367.57 1169.53,370.057 1173.14,365.792 1171.9,360.342 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="1180.43,367.57 1169.53,370.057 1173.14,365.792 1171.9,360.342 "/>
</g>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="792.8" y1="273" x2="794.057" y2="358.365"/>
<polygon style="fill: #000000" points="794.167,365.864 789.02,355.939 794.057,358.365 799.019,355.792 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="794.167,365.864 789.02,355.939 794.057,358.365 799.019,355.792 "/>
</g>
<text font-size="12.7998" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="720" y="220">
<tspan x="720" y="220"></tspan>
</text>
<g>
<rect style="fill: #ffffff" x="660" y="660" width="630" height="381"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="660" y="660" width="630" height="381"/>
</g>
<g>
<rect style="fill: #ffffff" x="686" y="789" width="580" height="227"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="686" y="789" width="580" height="227"/>
</g>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="685" y="692">
<tspan x="685" y="692">Node</tspan>
</text>
<g>
<rect style="fill: #ffffff" x="721.2" y="703" width="69.6" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="721.2" y="703" width="69.6" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="756" y="725.9">
<tspan x="756" y="725.9">kubelet</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="718.2" y="836.1" width="148" height="133"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="718.2" y="836.1" width="148" height="133"/>
</g>
<g>
<rect style="fill: #ffffff" x="758.55" y="906.1" width="89.3" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="758.55" y="906.1" width="89.3" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="803.2" y="929">
<tspan x="803.2" y="929">container</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="747.8" y="896.2" width="89.3" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="747.8" y="896.2" width="89.3" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="792.45" y="919.1">
<tspan x="792.45" y="919.1">container</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="737.4" y="886.3" width="89.3" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="737.4" y="886.3" width="89.3" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="782.05" y="909.2">
<tspan x="782.05" y="909.2">cAdvisor</tspan>
</text>
</g>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="792.2" y="902.6">
<tspan x="792.2" y="902.6"></tspan>
</text>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="740.2" y="862.6">
<tspan x="740.2" y="862.6">Pod</tspan>
</text>
<g>
<g>
<rect style="fill: #ffffff" x="1106.6" y="836.1" width="148" height="133"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="1106.6" y="836.1" width="148" height="133"/>
</g>
<g>
<rect style="fill: #ffffff" x="1146.95" y="906.1" width="89.3" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="1146.95" y="906.1" width="89.3" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="1191.6" y="929">
<tspan x="1191.6" y="929">container</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="1136.2" y="896.2" width="89.3" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="1136.2" y="896.2" width="89.3" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="1180.85" y="919.1">
<tspan x="1180.85" y="919.1">container</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="1125.8" y="886.3" width="89.3" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="1125.8" y="886.3" width="89.3" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="1170.45" y="909.2">
<tspan x="1170.45" y="909.2">container</tspan>
</text>
</g>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="1180.6" y="902.6">
<tspan x="1180.6" y="902.6"></tspan>
</text>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="1128.6" y="862.6">
<tspan x="1128.6" y="862.6">Pod</tspan>
</text>
</g>
<g>
<g>
<rect style="fill: #ffffff" x="900.9" y="836.1" width="148" height="133"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="900.9" y="836.1" width="148" height="133"/>
</g>
<g>
<rect style="fill: #ffffff" x="941.25" y="906.1" width="89.3" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="941.25" y="906.1" width="89.3" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="985.9" y="929">
<tspan x="985.9" y="929">container</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="930.5" y="896.2" width="89.3" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="930.5" y="896.2" width="89.3" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="975.15" y="919.1">
<tspan x="975.15" y="919.1">container</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="920.1" y="886.3" width="89.3" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="920.1" y="886.3" width="89.3" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="964.75" y="909.2">
<tspan x="964.75" y="909.2">container</tspan>
</text>
</g>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="974.9" y="902.6">
<tspan x="974.9" y="902.6"></tspan>
</text>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="922.9" y="862.6">
<tspan x="922.9" y="862.6">Pod</tspan>
</text>
</g>
<g>
<rect style="fill: #ffffff" x="947.748" y="696" width="57.1" height="38"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="947.748" y="696" width="57.1" height="38"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="976.298" y="718.9">
<tspan x="976.298" y="718.9">Proxy</tspan>
</text>
</g>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="756" y1="741" x2="780.332" y2="876.717"/>
<polygon style="fill: #000000" points="781.655,884.099 774.969,875.138 780.332,876.717 784.812,873.374 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="781.655,884.099 774.969,875.138 780.332,876.717 784.812,873.374 "/>
</g>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="976.298" y1="734" x2="975.033" y2="826.365"/>
<polygon style="fill: #000000" points="974.931,833.864 970.068,823.797 975.033,826.365 980.067,823.934 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="974.931,833.864 970.068,823.797 975.033,826.365 980.067,823.934 "/>
</g>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="990.573" y1="734" x2="1172.02" y2="831.492"/>
<polygon style="fill: #000000" points="1178.63,835.042 1167.45,834.713 1172.02,831.492 1172.19,825.904 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="1178.63,835.042 1167.45,834.713 1172.02,831.492 1172.19,825.904 "/>
</g>
<text font-size="12.8" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="1207" y="807.5">
<tspan x="1207" y="807.5">docker</tspan>
</text>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="791.753" y1="740.636" x2="966.266" y2="831.6"/>
<polygon style="fill: #000000" points="972.917,835.066 961.738,834.878 966.266,831.6 966.361,826.01 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="972.917,835.066 961.738,834.878 966.266,831.6 966.361,826.01 "/>
</g>
<text font-size="12.7998" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="976" y="902.5">
<tspan x="976" y="902.5">..</tspan>
</text>
<text font-size="27.0929" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="1065" y="905">
<tspan x="1065" y="905">...</tspan>
</text>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="790.8" y1="741" x2="1171.14" y2="833.792"/>
<polygon style="fill: #000000" points="1178.43,835.57 1167.53,838.057 1171.14,833.792 1169.9,828.342 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="1178.43,835.57 1167.53,838.057 1171.14,833.792 1169.9,828.342 "/>
</g>
<g>
<line style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x1="790.8" y1="741" x2="792.057" y2="826.365"/>
<polygon style="fill: #000000" points="792.167,833.864 787.02,823.939 792.057,826.365 797.019,823.792 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="792.167,833.864 787.02,823.939 792.057,826.365 797.019,823.792 "/>
</g>
<text font-size="12.7998" style="fill: #000000;text-anchor:start;font-family:sans-serif;font-style:normal;font-weight:normal" x="718" y="688">
<tspan x="718" y="688"></tspan>
</text>
<g>
<path style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" d="M 319.892 503.5 C 392.964,503.5 575.93,850.5 650.264,850.5"/>
<polygon style="fill: #000000" points="319.892,498.5 309.892,503.5 319.892,508.5 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="319.892,498.5 309.892,503.5 319.892,508.5 "/>
<polygon style="fill: #000000" points="657.764,850.5 647.764,855.5 650.264,850.5 647.764,845.5 "/>
<polygon style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="657.764,850.5 647.764,855.5 650.264,850.5 647.764,845.5 "/>
</g>
<g>
<rect style="fill: #ffffff" x="418.774" y="551" width="176.225" height="121"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="418.774" y="551" width="176.225" height="121"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="506.886" y="583.4">
<tspan x="506.886" y="583.4">Distributed</tspan>
<tspan x="506.886" y="599.4">Watchable</tspan>
<tspan x="506.886" y="615.4">Storage</tspan>
<tspan x="506.886" y="631.4"></tspan>
<tspan x="506.886" y="647.4">(implemented via etcd)</tspan>
</text>
</g>
</svg>


View File

@ -0,0 +1,46 @@
# Authentication Plugins
Kubernetes uses client certificates, tokens, or http basic auth to authenticate users for API calls.
Client certificate authentication is enabled by passing the `--client_ca_file=SOMEFILE`
option to apiserver. The referenced file must contain one or more certificate authorities
to use to validate client certificates presented to the apiserver. If a client certificate
is presented and verified, the common name of the subject is used as the user name for the
request.
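For example, a client might present a certificate like this (the file names are illustrative, and `$APISERVER` stands in for the apiserver address):

```
$ # The CN of alice.crt (e.g. CN=alice) becomes the user name for the request.
$ curl --cacert ca.crt --cert alice.crt --key alice.key \
    https://$APISERVER/api/v1/namespaces/default/pods
```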
Token authentication is enabled by passing the `--token_auth_file=SOMEFILE` option
to apiserver. Currently, tokens last indefinitely, and the token list cannot
be changed without restarting apiserver. We plan in the future for tokens to
be short-lived, and to be generated as needed rather than stored in a file.
The token file format is implemented in `plugin/pkg/auth/authenticator/token/tokenfile/...`
and is a csv file with 3 columns: token, user name, user uid.
When using token authentication from an http client the apiserver expects an `Authorization`
header with a value of `Bearer SOMETOKEN`.
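A hedged sketch of a token file and the corresponding request (the token, user, uid, and address are hypothetical):

```
$ cat known_tokens.csv
WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc,alice,0001
$ curl -k -H "Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc" \
    https://$APISERVER/api/v1/namespaces/default/pods
```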
Basic authentication is enabled by passing the `--basic_auth_file=SOMEFILE`
option to apiserver. Currently, the basic auth credentials last indefinitely,
and the password cannot be changed without restarting apiserver. Note that basic
authentication is currently supported for convenience while we finish making the
more secure modes described above easier to use.
The basic auth file format is implemented in `plugin/pkg/auth/authenticator/password/passwordfile/...`
and is a csv file with 3 columns: password, user name, user id.
When using basic authentication from an http client the apiserver expects an `Authorization` header
with a value of `Basic BASE64ENCODEDUSER:PASSWORD`.
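A hedged sketch of a basic auth file and request (the password, user, uid, and address are hypothetical):

```
$ cat basic_auth.csv
password123,alice,0001
$ echo -n 'alice:password123' | base64
YWxpY2U6cGFzc3dvcmQxMjM=
$ curl -k -H "Authorization: Basic YWxpY2U6cGFzc3dvcmQxMjM=" \
    https://$APISERVER/api/v1/namespaces/default/pods
```

(`curl -u alice:password123` constructs the same header.)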
## Plugin Development
We plan for the Kubernetes API server to issue tokens
after the user has been (re)authenticated by a *bedrock* authentication
provider external to Kubernetes. We plan to make it easy to develop modules
that interface between kubernetes and a bedrock authentication provider (e.g.
github.com, google.com, enterprise directory, kerberos, etc.)
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/authentication.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/authentication.md?pixel)]()

View File

@ -0,0 +1,109 @@
# Authorization Plugins
In Kubernetes, authorization happens as a separate step from authentication.
See the [authentication documentation](./authentication.md) for an
overview of authentication.
Authorization applies to all HTTP accesses on the main apiserver port. (The
readonly port is not currently subject to authorization, but is planned to be
removed soon.)
The authorization check for any request compares attributes of the context of
the request, (such as user, resource, and namespace) with access
policies. An API call must be allowed by some policy in order to proceed.
The following implementations are available, and are selected by flag:
- `--authorization_mode=AlwaysDeny`
- `--authorization_mode=AlwaysAllow`
- `--authorization_mode=ABAC`
`AlwaysDeny` blocks all requests (used in tests).
`AlwaysAllow` allows all requests; use if you don't need authorization.
`ABAC` allows for user-configured authorization policy. ABAC stands for Attribute-Based Access Control.
## ABAC Mode
### Request Attributes
A request has 4 attributes that can be considered for authorization:
- user (the user-string which a user was authenticated as).
- whether the request is readonly (GETs are readonly)
- what resource is being accessed
- applies only to the API endpoints, such as
`/api/v1/namespaces/default/pods`. For miscellaneous endpoints, like `/version`, the
resource is the empty string.
- the namespace of the object being accessed, or the empty string if the
endpoint does not support namespaced objects.
We anticipate adding more attributes to allow finer grained access control and
to assist in policy management.
### Policy File Format
For mode `ABAC`, also specify `--authorization_policy_file=SOME_FILENAME`.
The file format is [one JSON object per line](http://jsonlines.org/). There should be no enclosing list or map, just
one map per line.
Each line is a "policy object". A policy object is a map with the following properties:
- `user`, type string; the user-string from `--token_auth_file`
- `readonly`, type boolean, when true, means that the policy only applies to GET
operations.
- `resource`, type string; a resource from a URL, such as `pods`.
- `namespace`, type string; a namespace string.
An unset property is the same as a property set to the zero value for its type (e.g. empty string, 0, false).
However, unset should be preferred for readability.
In the future, policies may be expressed in a JSON format, and managed via a REST
interface.
### Authorization Algorithm
A request has attributes which correspond to the properties of a policy object.
When a request is received, the attributes are determined. Unknown attributes
are set to the zero value of their type (e.g. empty string, 0, false).
An unset property will match any value of the corresponding
attribute. An unset attribute will match any value of the corresponding property.
The tuple of attributes is checked for a match against every policy in the policy file.
If at least one line matches the request attributes, then the request is authorized (but may fail later validation).
To permit any user to do something, write a policy with the user property unset.
To permit an action regardless of namespace, leave the namespace property unset: a policy with an unset namespace applies in every namespace.
### Examples
1. Alice can do anything: `{"user":"alice"}`
2. Kubelet can read any pods: `{"user":"kubelet", "resource": "pods", "readonly": true}`
3. Kubelet can read and write events: `{"user":"kubelet", "resource": "events"}`
4. Bob can just read pods in namespace "projectCaribou": `{"user":"bob", "resource": "pods", "readonly": true, "ns": "projectCaribou"}`
[Complete file example](../pkg/auth/authorizer/abac/example_policy_file.jsonl)
## Plugin Development
Other implementations can be developed fairly easily.
The APIserver calls the Authorizer interface:
```go
type Authorizer interface {
Authorize(a Attributes) error
}
```
to determine whether or not to allow each API action.
An authorization plugin is a module that implements this interface.
Authorization plugin code goes in `pkg/auth/authorization/$MODULENAME`.
An authorization module can be completely implemented in go, or can call out
to a remote authorization service. Authorization modules can implement
their own caching to reduce the cost of repeated authorization calls with the
same or similar arguments. Developers should then consider the interaction between
caching and revocation of permissions.
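As a sketch of what a trivial plugin might look like (the package name and this local stand-in for the real `Attributes` interface are assumptions made to keep the example self-contained, not taken from the tree), an authorizer that only admits one fixed user could be written as:
```go
// Illustrative only: a minimal authorizer that allows a single hard-coded user.
package onlyalice

import "errors"

// Attributes is a stand-in for the apiserver's request-attribute interface;
// the real one also exposes resource, namespace, and readonly accessors.
type Attributes interface {
	GetUserName() string
}

// AliceOnly admits requests from one user and denies everyone else.
type AliceOnly struct{}

// Authorize returns nil to allow the request, or an error to deny it.
func (AliceOnly) Authorize(a Attributes) error {
	if a.GetUserName() == "alice" {
		return nil
	}
	return errors.New("request is not allowed by any policy")
}
```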
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/authorization.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/authorization.md?pixel)]()

View File

@ -0,0 +1,136 @@
# Availability
This document collects advice on reasoning about and provisioning for high-availability when using Kubernetes clusters.
## Failure modes
This is an incomplete list of things that could go wrong, and how to deal with them.
Root causes:
- VM(s) shutdown
- network partition within cluster, or between cluster and users.
- crashes in Kubernetes software
- data loss or unavailability of persistent storage (e.g. GCE PD or AWS EBS volume).
- operator error misconfigures kubernetes software or application software.
Specific scenarios:
- Apiserver VM shutdown or apiserver crashing
- Results
- unable to stop, update, or start new pods, services, replication controller
- existing pods and services should continue to work normally, unless they depend on the Kubernetes API
- Apiserver backing storage lost
- Results
- apiserver should fail to come up.
- kubelets will not be able to reach it but will continue to run the same pods and provide the same service proxying.
- manual recovery or recreation of apiserver state necessary before apiserver is restarted.
- Supporting services (node controller, replication controller manager, scheduler, etc) VM shutdown or crashes
- currently those are colocated with the apiserver, and their unavailability has similar consequences as apiserver
- in future, these will be replicated as well and may not be co-located
- they do not have own persistent state
- Node (thing that runs kubelet and kube-proxy and pods) shutdown
- Results
- pods on that Node stop running
- Kubelet software fault
- Results
- crashing kubelet cannot start new pods on the node
- kubelet might delete the pods or not
- node marked unhealthy
- replication controllers start new pods elsewhere
- Cluster operator error
- Results:
- loss of pods, services, etc
- loss of apiserver backing store
- users unable to read API
- etc
Mitigations:
- Action: Use the IaaS provider's automatic VM restarting feature for IaaS VMs.
- Mitigates: Apiserver VM shutdown or apiserver crashing
- Mitigates: Supporting services VM shutdown or crashes
- Action: use the IaaS provider's reliable storage (e.g. GCE PD or AWS EBS volume) for VMs with apiserver+etcd.
- Mitigates: Apiserver backing storage lost
- Action: Use Replicated APIserver feature (when complete: feature is planned but not implemented)
- Mitigates: Apiserver VM shutdown or apiserver crashing
- Will tolerate one or more simultaneous apiserver failures.
- Mitigates: Apiserver backing storage lost
- Each apiserver has independent storage. Etcd will recover from loss of one member. Risk of total data loss greatly reduced.
- Action: Snapshot apiserver PDs/EBS-volumes periodically
- Mitigates: Apiserver backing storage lost
- Mitigates: Some cases of operator error
- Mitigates: Some cases of kubernetes software fault
- Action: use replication controller and services in front of pods
- Mitigates: Node shutdown
- Mitigates: Kubelet software fault
- Action: applications (containers) designed to tolerate unexpected restarts
- Mitigates: Node shutdown
- Mitigates: Kubelet software fault
- Action: Multiple independent clusters (and avoid making risky changes to all clusters at once)
- Mitigates: Everything listed above.
## Choosing Multiple Kubernetes Clusters
You may want to set up multiple kubernetes clusters, both to
have clusters in different regions to be nearer to your users; and to tolerate failures and/or invasive maintenance.
### Scope of a single cluster
On IaaS providers such as Google Compute Engine or Amazon Web Services, a VM exists in a
[zone](https://cloud.google.com/compute/docs/zones) or [availability
zone](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html).
We suggest that all the VMs in a Kubernetes cluster should be in the same availability zone, because:
- compared to having a single global Kubernetes cluster, there are fewer single-points of failure
- compared to a cluster that spans availability zones, it is easier to reason about the availability properties of a
single-zone cluster.
- when the Kubernetes developers are designing the system (e.g. making assumptions about latency, bandwidth, or
correlated failures) they are assuming all the machines are in a single data center, or otherwise closely connected.
It is okay to have multiple clusters per availability zone, though on balance we think fewer is better.
Reasons to prefer fewer clusters are:
- improved bin packing of Pods in some cases with more nodes in one cluster.
- reduced operational overhead (though the advantage is diminished as ops tooling and processes mature).
- reduced costs for per-cluster fixed resource costs, e.g. apiserver VMs (but small as a percentage
of overall cluster cost for medium to large clusters).
Reasons to have multiple clusters include:
- strict security policies requiring isolation of one class of work from another (but, see Partitioning Clusters
below).
- test clusters to canary new Kubernetes releases or other cluster software.
### Selecting the right number of clusters
The selection of the number of kubernetes clusters may be a relatively static choice, only revisited occasionally.
By contrast, the number of nodes in a cluster and the number of pods in a service may change frequently according to
load and growth.
To pick the number of clusters, first, decide which regions you need to be in to have adequate latency to all your end users, for services that will run
on Kubernetes (if you use a Content Distribution Network, the latency requirements for the CDN-hosted content need not
be considered). Legal issues might influence this as well. For example, a company with a global customer base might decide to have clusters in US, EU, AP, and SA regions.
Call the number of regions to be in `R`.
Second, decide how many clusters should be able to be unavailable at the same time, while still being available. Call
the number that can be unavailable `U`. If you are not sure, then 1 is a fine choice.
If it is allowable for load-balancing to direct traffic to any region in the event of a cluster failure, then
you need `R + U` clusters. If it is not (e.g. you want to ensure low latency for all users in the event of a
cluster failure), then you need `R * (U + 1)` clusters (`U + 1` in each of `R` regions). For example, with `R = 3` and `U = 1` that is 4 clusters in the first case and 6 in the second. In any case, try to put each cluster in a different zone.
Finally, if any of your clusters would need more than the maximum recommended number of nodes for a Kubernetes cluster, then
you may need even more clusters. Our [roadmap](http://docs.k8s.io/roadmap.md)
calls for maximum 100 node clusters at v1.0 and maximum 1000 node clusters in the middle of 2015.
## Working with multiple clusters
When you have multiple clusters, you would typically create services with the same config in each cluster and put each of those
service instances behind a load balancer (AWS Elastic Load Balancer, GCE Forwarding Rule or HTTP Load Balancer), so that
failures of a single cluster are not visible to end users.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/availability.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/availability.md?pixel)]()

View File

@ -0,0 +1,84 @@
# Kubernetes CLI/Configuration Roadmap
See also issues with the following labels:
* [area/config-deployment](https://github.com/GoogleCloudPlatform/kubernetes/labels/area%2Fconfig-deployment)
* [component/CLI](https://github.com/GoogleCloudPlatform/kubernetes/labels/component%2FCLI)
* [component/client](https://github.com/GoogleCloudPlatform/kubernetes/labels/component%2Fclient)
1. Create services before other objects, or at least before objects that depend upon them. Namespace-relative DNS mitigates this some, but most users are still using service environment variables. [#1768](https://github.com/GoogleCloudPlatform/kubernetes/issues/1768)
1. Finish rolling update [#1353](https://github.com/GoogleCloudPlatform/kubernetes/issues/1353)
1. Friendly to auto-scaling [#2863](https://github.com/GoogleCloudPlatform/kubernetes/pull/2863#issuecomment-69701562)
1. Rollback (make rolling-update reversible, and complete an in-progress rolling update by taking 2 replication controller names rather than always taking a file)
1. Rollover (replace multiple replication controllers with one, such as to clean up an aborted partial rollout)
1. Write a ReplicationController generator to derive the new ReplicationController from an old one (e.g., `--image-version=newversion`, which would apply a name suffix, update a label value, and apply an image tag)
1. Use readiness [#620](https://github.com/GoogleCloudPlatform/kubernetes/issues/620)
1. Perhaps factor this in a way that it can be shared with [OpenShift's deployment controller](https://github.com/GoogleCloudPlatform/kubernetes/issues/1743)
1. Rolling update service as a plugin
1. Kind-based filtering on object streams -- only operate on the kinds of objects specified. This would make directory-based kubectl operations much more useful. Users should be able to instantiate the example applications using `kubectl create -f <example-dir> ...`
1. Improved pretty printing of endpoints, such as in the case that there are more than a few endpoints
1. Service address/port lookup command(s)
1. List supported resources
1. Swagger lookups [#3060](https://github.com/GoogleCloudPlatform/kubernetes/issues/3060)
1. --name, --name-suffix applied during creation and updates
1. --labels and opinionated label injection: --app=foo, --tier={fe,cache,be,db}, --uservice=redis, --env={dev,test,prod}, --stage={canary,final}, --track={hourly,daily,weekly}, --release=0.4.3c2. Exact ones TBD. We could allow arbitrary values -- the keys are important. The actual label keys would be (optionally?) namespaced with kubectl.kubernetes.io/, or perhaps the user's namespace.
1. --annotations and opinionated annotation injection: --description, --revision
1. Imperative updates. We'll want to optionally make these safe(r) by supporting preconditions based on the current value and resourceVersion.
1. annotation updates similar to label updates
1. other custom commands for common imperative updates
1. more user-friendly (but still generic) on-command-line json for patch
1. We also want to support the following flavors of more general updates:
1. whichever we don't support:
1. safe update: update the full resource, guarded by resourceVersion precondition (and perhaps selected value-based preconditions)
1. forced update: update the full resource, blowing away the previous Spec without preconditions; delete and re-create if necessary
1. diff/dryrun: Compare new config with current Spec [#6284](https://github.com/GoogleCloudPlatform/kubernetes/issues/6284)
1. submit/apply/reconcile/ensure/merge: Merge user-provided fields with current Spec. Keep track of user-provided fields using an annotation -- see [#1702](https://github.com/GoogleCloudPlatform/kubernetes/issues/1702). Delete all objects with deployment-specific labels.
1. --dry-run for all commands
1. Support full label selection syntax, including support for namespaces.
1. Wait on conditions [#1899](https://github.com/GoogleCloudPlatform/kubernetes/issues/1899)
1. Make kubectl scriptable: make output and exit code behavior consistent and useful for wrapping in workflows and piping back into kubectl and/or xargs (e.g., dump full URLs?, distinguish permanent and retry-able failure, identify objects that should be retried)
1. Here's [an example](http://techoverflow.net/blog/2013/10/22/docker-remove-all-images-and-containers/) where multiple objects on the command line and an option to dump object names only (`-q`) would be useful in combination. [#5906](https://github.com/GoogleCloudPlatform/kubernetes/issues/5906)
1. Easy generation of clean configuration files from existing objects (including containers -- podex) -- remove readonly fields, status
1. Export from one namespace, import into another is an important use case
1. Derive objects from other objects
1. pod clone
1. rc from pod
1. --labels-from (services from pods or rcs)
1. Kind discovery (i.e., operate on objects of all kinds) [#5278](https://github.com/GoogleCloudPlatform/kubernetes/issues/5278)
1. A fairly general-purpose way to specify fields on the command line during creation and update, not just from a config file
1. Extensible API-based generator framework (i.e. invoke generators via an API/URL rather than building them into kubectl), so that complex client libraries don't need to be rewritten in multiple languages, and so that the abstractions are available through all interfaces: API, CLI, UI, logs, ... [#5280](https://github.com/GoogleCloudPlatform/kubernetes/issues/5280)
1. Need schema registry, and some way to invoke generator (e.g., using a container)
1. Convert run command to API-based generator
1. Transformation framework
1. More intelligent defaulting of fields (e.g., [#2643](https://github.com/GoogleCloudPlatform/kubernetes/issues/2643))
1. Update preconditions based on the values of arbitrary object fields.
1. Deployment manager compatibility on GCP: [#3685](https://github.com/GoogleCloudPlatform/kubernetes/issues/3685)
1. Describe multiple objects, multiple kinds of objects [#5905](https://github.com/GoogleCloudPlatform/kubernetes/issues/5905)
1. Support yaml document separator [#5840](https://github.com/GoogleCloudPlatform/kubernetes/issues/5840)
TODO:
* watch
* attach [#1521](https://github.com/GoogleCloudPlatform/kubernetes/issues/1521)
* image/registry commands
* do any other server paths make sense? validate? generic curl functionality?
* template parameterization
* dynamic/runtime configuration
Server-side support:
1. Default selectors from labels [#1698](https://github.com/GoogleCloudPlatform/kubernetes/issues/1698#issuecomment-71048278)
1. Stop [#1535](https://github.com/GoogleCloudPlatform/kubernetes/issues/1535)
1. Deleted objects [#2789](https://github.com/GoogleCloudPlatform/kubernetes/issues/2789)
1. Clone [#170](https://github.com/GoogleCloudPlatform/kubernetes/issues/170)
1. Resize [#1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629)
1. Useful /operations API: wait for finalization/reification
1. List supported resources [#2057](https://github.com/GoogleCloudPlatform/kubernetes/issues/2057)
1. Reverse label lookup [#1348](https://github.com/GoogleCloudPlatform/kubernetes/issues/1348)
1. Field selection [#1362](https://github.com/GoogleCloudPlatform/kubernetes/issues/1362)
1. Field filtering [#1459](https://github.com/GoogleCloudPlatform/kubernetes/issues/1459)
1. Operate on uids
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/cli-roadmap.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/cli-roadmap.md?pixel)]()

View File

@ -0,0 +1,20 @@
## kubernetes API client libraries
### Supported
* [Go](https://github.com/GoogleCloudPlatform/kubernetes/tree/master/pkg/client)
### User Contributed
*Note: Libraries provided by outside parties are supported by their authors, not the core Kubernetes team*
* [Java](https://github.com/fabric8io/fabric8/tree/master/components/kubernetes-api)
* [Ruby1](https://github.com/Ch00k/kuber)
* [Ruby2](https://github.com/abonas/kubeclient)
* [PHP](https://github.com/devstub/kubernetes-api-php-client)
* [Node.js](https://github.com/tenxcloud/node-kubernetes-client)
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/client-libraries.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/client-libraries.md?pixel)]()

View File

@ -0,0 +1,80 @@
# Kubernetes Cluster Admin Guide
The cluster admin guide is for anyone creating or administering a Kubernetes cluster.
It assumes some familiarity with concepts in the [User Guide](user-guide.md).
## Planning a cluster
There are many different examples of how to set up a kubernetes cluster. Many of them are listed in this
[matrix](getting-started-guides/README.md). We call each of the combinations in this matrix a *distro*.
Before choosing a particular guide, here are some things to consider:
- Are you just looking to try out Kubernetes on your laptop, or build a high-availability many-node cluster? Both
models are supported, but some distros are better for one case or the other.
- Will you be using a hosted Kubernetes cluster, such as [GKE](https://cloud.google.com/container-engine), or setting
one up yourself?
- Will your cluster be on-premises, or in the cloud (IaaS)? Kubernetes does not directly support hybrid clusters. We
recommend setting up multiple clusters rather than spanning distant locations.
- Will you be running Kubernetes on "bare metal" or virtual machines? Kubernetes supports both, via different distros.
- Do you just want to run a cluster, or do you expect to do active development of kubernetes project code? If the
latter, it is better to pick a distro actively used by other developers. Some distros only use binary releases, but
offer a greater variety of choices.
- Not all distros are maintained as actively. Prefer ones which are listed as tested on a more recent version of
Kubernetes.
- If you are configuring kubernetes on-premises, you will need to consider what [networking
model](networking.md) fits best.
- If you are designing for very [high-availability](availability.md), you may want multiple clusters in multiple zones.
## Setting up a cluster
Pick one of the Getting Started Guides from the [matrix](getting-started-guides/README.md) and follow it.
If none of the Getting Started Guides fits, you may want to pull ideas from several of the guides.
One option for custom networking is *OpenVSwitch GRE/VxLAN networking* ([ovs-networking.md](ovs-networking.md)), which
uses OpenVSwitch to set up networking between pods across
Kubernetes nodes.
If you are modifying an existing guide which uses Salt, this document explains [how Salt is used in the Kubernetes
project](salt.md).
## Upgrading a cluster
[Upgrading a cluster](cluster_management.md).
## Managing nodes
[Managing nodes](node.md).
## Optional Cluster Services
* **DNS Integration with SkyDNS** ([dns.md](dns.md)):
Resolving a DNS name directly to a Kubernetes service.
* **Logging** with [Kibana](logging.md)
## Multi-tenant support
* **Namespaces** ([namespaces.md](namespaces.md)): Namespaces help different
projects, teams, or customers to share a kubernetes cluster.
* **Resource Quota** ([resource_quota_admin.md](resource_quota_admin.md))
## Security
* **Kubernetes Container Environment** ([container-environment.md](container-environment.md)):
Describes the environment for Kubelet managed containers on a Kubernetes
node.
* **Securing access to the API Server** [accessing the api](accessing_the_api.md)
* **Authentication** [authentication](authentication.md)
* **Authorization** [authorization](authorization.md)
* **Admission Controllers** [admission_controllers](admission_controllers.md)
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/cluster-admin-guide.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/cluster-admin-guide.md?pixel)]()

View File

@ -0,0 +1,65 @@
# Cluster Management
This doc is in progress.
## Upgrading a cluster
The `cluster/kube-push.sh` script will do a rudimentary update; it is a 1.0 roadmap item to have a robust live cluster update system.
## Upgrading to a different API version
There is a sequence of steps to upgrade to a new API version.
1. Turn on the new api version
2. Upgrade the cluster's storage to use the new version.
3. Upgrade all config files. Identify users of the old api version endpoints.
4. Update existing objects in the storage to the new version by running `cluster/update-storage-objects.sh`.
5. Turn off the old version.
### Turn on or off an API version for your cluster
Specific API versions can be turned on or off by passing the `--runtime-config=api/<version>` flag while bringing up the server. For example: to turn off the v1 API, pass `--runtime-config=api/v1=false`.
runtime-config also supports 2 special keys: `api/all` and `api/legacy` to control all and legacy APIs respectively. For example, for turning off all api versions except v1, pass `--runtime-config=api/all=false,api/v1=true`.
### Switching your cluster's storage API version
The `KUBE_API_VERSIONS` env var controls the API versions that are supported in the cluster. The first version in the list is used as the cluster's storage version. Hence, to set a specific version as the storage version, bring it to the front of the list of versions in the value of `KUBE_API_VERSIONS`.
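As a sketch (flag values are illustrative only; the real invocation comes from your distro's startup scripts), an apiserver that serves only v1 and v1beta3 and stores objects as v1 might be started like this:
```sh
# Illustrative only: v1 is listed first in KUBE_API_VERSIONS, so it becomes the storage version.
# (Other required apiserver flags are omitted here.)
KUBE_API_VERSIONS=v1,v1beta3 kube-apiserver \
  --runtime-config=api/all=false,api/v1=true,api/v1beta3=true
```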
### Switching your config files to a new API version
You can use the kube-version-change utility to convert config files between different API versions.
```
$ hack/build-go.sh cmd/kube-version-change
$ _output/local/go/bin/kube-version-change -i myPod.v1beta3.yaml -o myPod.v1.yaml
```
### Maintenance on a Node
If you need to reboot a node (such as for a kernel upgrade, libc upgrade, hardware repair, etc.), and the downtime is
brief, then when the Kubelet restarts, it will attempt to restart the pods scheduled to it. If the reboot takes longer,
then the node controller will terminate the pods that are bound to the unavailable node. If there is a corresponding
replication controller, then a new copy of the pod will be started on a different node. So, in the case where all
pods are replicated, upgrades can be done without special coordination.
If you want more control over the upgrading process, you may use the following workflow:
1. Mark the node to be rebooted as unschedulable:
`kubectl update nodes $NODENAME --patch='{"apiVersion": "v1", "spec": {"unschedulable": true}}'`.
This keeps new pods from landing on the node while you are trying to get them off.
1. Get the pods off the machine, via any of the following strategies:
1. wait for finite-duration pods to complete
1. delete pods with `kubectl delete pods $PODNAME`
1. for pods with a replication controller, the pod will eventually be replaced by a new pod which will be scheduled to a new node. additionally, if the pod is part of a service, then clients will automatically be redirected to the new pod.
1. for pods with no replication controller, you need to bring up a new copy of the pod, and assuming it is not part of a service, redirect clients to it.
1. Work on the node
1. Make the node schedulable again:
`kubectl update nodes $NODENAME --patch='{"apiVersion": "v1", "spec": {"unschedulable": false}}'`.
If you deleted the node's VM instance and created a new one, then a new schedulable node resource will
be created automatically when you create a new VM instance (if you're using a cloud provider that supports
node discovery; currently this is only GCE, not including CoreOS on GCE using kube-register). See [Node](node.md).
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/cluster_management.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/cluster_management.md?pixel)]()

View File

@ -0,0 +1,94 @@
# Kubernetes Container Environment
## Overview
This document describes the environment for Kubelet managed containers on a Kubernetes node (kNode).  In contrast to the Kubernetes cluster API, which provides an API for creating and managing containers, the Kubernetes container environment provides the container access to information about what else is going on in the cluster. 
This cluster information makes it possible to build applications that are *cluster aware*.  
Additionally, the Kubernetes container environment defines a series of hooks that are surfaced to optional hook handlers defined as part of individual containers.  Container hooks are somewhat analogous to operating system signals in a traditional process model.   However these hooks are designed to make it easier to build reliable, scalable cloud applications in the Kubernetes cluster.  Containers that participate in this cluster lifecycle become *cluster native*. 
Another important part of the container environment is the file system that is available to the container. In Kubernetes, the filesystem is a combination of an [image](./images.md) and one or more [volumes](./volumes.md).
The following sections describe both the cluster information provided to containers, as well as the hooks and life-cycle that allows containers to interact with the management system.
## Cluster Information
There are two types of information that are available within the container environment.  There is information about the container itself, and there is information about other objects in the system.
### Container Information
Currently, the only information about the container that is available to the container is the Pod name for the pod in which the container is running.  This ID is set as the hostname of the container, and is accessible through all calls to access the hostname within the container (e.g. the hostname command, or the [gethostname][1] function call in libc).  Additionally, user-defined environment variables from the pod definition, are also available to the container, as are any environment variables specified statically in the Docker image.
In the future, we anticipate expanding this information with richer information about the container.  Examples include available memory, number of restarts, and in general any state that you could get from the call to GET /pods on the API server.
### Cluster Information
Currently the list of all services that are running at the time when the container was created via the Kubernetes Cluster API are available to the container as environment variables.  The set of environment variables matches the syntax of Docker links.
For a service named **foo** that maps to a container port named **bar**, the following variables are defined:
```sh
FOO_SERVICE_HOST=<the host the service is running on>
FOO_SERVICE_PORT=<the port the service is running on>
```
Going forward, we expect that Services will have a dedicated IP address.  In that context, we will also surface services to the container via DNS.  Of course DNS is still not an enumerable protocol, so we will continue to provide environment variables so that containers can do discovery.
## Container Hooks
*NB: Container hooks are under active development; we anticipate adding additional hooks as the Kubernetes container management system evolves.*
Container hooks provide information to the container about events in its management lifecycle.  For example, immediately after a container is started, it receives a *PostStart* hook.  These hooks are broadcast *into* the container with information about the life-cycle of the container.  They are different from the events provided by Docker and other systems which are *output* from the container.  Output events provide a log of what has already happened.  Input hooks provide real-time notification about things that are happening, but no historical log.  
### Hook Details
There are currently two container hooks that are surfaced to containers, and two proposed hooks:
*PreStart* - **Proposed**
This hook is sent immediately before a container is created. It notifies that the container will be created immediately after the call completes. No parameters are passed. *Note:* some event handlers (namely `exec`) are incompatible with this event.
*PostStart*
This hook is sent immediately after a container is created.  It notifies the container that it has been created.  No parameters are passed to the handler.
*PostRestart* - **Proposed**
This hook is called before the PostStart handler, when a container has been restarted, rather than started for the first time.  No parameters are passed to the handler.
*PreStop*
This hook is called immediately before a container is terminated.  This event handler is blocking, and must complete before the call to delete the container is sent to the Docker daemon. The SIGTERM notification sent by Docker is also still sent.
A single parameter named reason is passed to the handler which contains the reason for termination.  Currently the valid values for reason are:
* ```Delete``` - indicating an API call to delete the pod containing this container.
* ```Health``` - indicating that a health check of the container failed.
* ```Dependency``` - indicating that a dependency for the container or the pod is missing, and thus, the container needs to be restarted.  Examples include, the pod infra container crashing, or persistent disk failing for a container that mounts PD.
Eventually, user specified reasons may be [added to the API](https://github.com/GoogleCloudPlatform/kubernetes/issues/137).
### Hook Handler Execution
When a management hook occurs, the management system calls into any registered hook handlers in the container for that hook. These hook handler calls are synchronous in the context of the pod containing the container. Note: this means that hook handler execution blocks any further management of the pod. If your hook handler blocks, no other management (including health checks) will occur until the hook handler completes. Blocking hook handlers do *not* affect management of other Pods. Typically we expect that users will make their hook handlers as lightweight as possible, but there are cases where long running commands make sense (e.g. saving state prior to container stop).
For hooks which have parameters, these parameters are passed to the event handler as a set of key/value pairs.  The details of this parameter passing is handler implementation dependent (see below).
### Hook delivery guarantees
Hook delivery is "at least one", which means that a hook may be called multiple times for any given event (e.g. "start" or "stop") and it is up to the hook implementer to be able to handle this
correctly.
We expect double delivery to be rare, but in some cases if the ```kubelet``` restarts in the middle of sending a hook, the hook may be resent after the kubelet comes back up.
Likewise, we only make a single delivery attempt. If (for example) an http hook receiver is down, and unable to take traffic, we do not make any attempts to resend.
### Hook Handler Implementations
Hook handlers are the way that hooks are surfaced to containers.  Containers can select the type of hook handler they would like to implement.  Kubernetes currently supports two different hook handler types:
* Exec - Executes a specific command (e.g. pre-stop.sh) inside the cgroup and namespaces of the container.  Resources consumed by the command are counted against the container.  Commands which print "ok" to standard out (stdout) are treated as healthy, any other output is treated as container failures (and will cause kubelet to forcibly restart the container).  Parameters are passed to the command as traditional linux command line flags (e.g. pre-stop.sh --reason=HEALTH)
* HTTP - Executes an HTTP request against a specific endpoint on the container.  HTTP error codes (5xx) and non-response/failure to connect are treated as container failures. Parameters are passed to the http endpoint as query args (e.g. http://some.server.com/some/path?reason=HEALTH)
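To make this concrete, here is a sketch of how a v1 pod might declare handlers for these hooks (the image, names, and endpoint below are invented; only the PostStart and PreStop hooks described above are assumed to be deliverable):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hooked-pod              # hypothetical name
spec:
  containers:
  - name: app
    image: example/app:1.0      # hypothetical image
    lifecycle:
      postStart:
        httpGet:                # HTTP handler: GET issued against the container
          path: /warmup
          port: 8080
      preStop:
        exec:                   # Exec handler: runs inside the container's namespaces
          command: ["/bin/sh", "-c", "/pre-stop.sh"]
```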
[1]: http://man7.org/linux/man-pages/man2/gethostname.2.html
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/container-environment.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/container-environment.md?pixel)]()

View File

@ -0,0 +1,95 @@
# Containers with Kubernetes
## Containers and commands
So far the Pods we've seen have all used the `image` field to indicate what process Kubernetes
should run in a container. In this case, Kubernetes runs the image's default command. If we want
to run a particular command or override the image's defaults, there are two additional fields that
we can use:
1. `Command`: Controls the actual command run by the image
2. `Args`: Controls the arguments passed to the command
### How docker handles command and arguments
Docker images have metadata associated with them that is used to store information about the image.
The image author may use this to define defaults for the command and arguments to run a container
when the user does not supply values. Docker calls the fields for commands and arguments
`Entrypoint` and `Cmd` respectively. The full details for this feature are too complicated to
describe here, mostly due to the fact that the docker API allows users to specify both of these
fields as either a string array or a string and there are subtle differences in how those cases are
handled. We encourage the curious to check out [docker's documentation]() for this feature.
Kubernetes allows you to override both the image's default command (docker `Entrypoint`) and args
(docker `Cmd`) with the `Command` and `Args` fields of `Container`. The rules are:
1. If you do not supply a `Command` or `Args` for a container, the defaults defined by the image
will be used
2. If you supply a `Command` but no `Args` for a container, only the supplied `Command` will be
used; the image's default arguments are ignored
3. If you supply only `Args`, the image's default command will be used with the arguments you
supply
4. If you supply a `Command` **and** `Args`, the image's defaults will be ignored and the values
you supply will be used
Here are examples for these rules in table format
| Image `Entrypoint` | Image `Cmd` | Container `Command` | Container `Args` | Command Run |
|--------------------|------------------|---------------------|--------------------|------------------|
| `[/ep-1]` | `[foo bar]` | &lt;not set&gt; | &lt;not set&gt; | `[ep-1 foo bar]` |
| `[/ep-1]` | `[foo bar]` | `[/ep-2]` | &lt;not set&gt; | `[ep-2]` |
| `[/ep-1]` | `[foo bar]` | &lt;not set&gt; | `[zoo boo]` | `[ep-1 zoo boo]` |
| `[/ep-1]` | `[foo bar]` | `[/ep-2]` | `[zoo boo]` | `[ep-2 zoo boo]` |
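As a sketch, the last row of the table could be written in a v1 pod spec roughly as follows (image and names are invented; note the lowercase `command`/`args` field names in the serialized form):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: override-example        # hypothetical name
spec:
  containers:
  - name: main
    image: example/ep-image     # hypothetical image with Entrypoint [/ep-1] and Cmd [foo bar]
    command: ["/ep-2"]          # overrides the image Entrypoint
    args: ["zoo", "boo"]        # overrides the image Cmd; the container runs: /ep-2 zoo boo
```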
## Capabilities
By default, Docker containers are "unprivileged" and cannot, for example, run a Docker daemon inside a Docker container. We can have fine grain control over the capabilities using cap-add and cap-drop.More details [here](https://docs.docker.com/reference/run/#runtime-privilege-linux-capabilities-and-lxc-configuration).
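For instance, at the Docker level this looks like the following (the image and capability choices are arbitrary examples):
```sh
# Grant CAP_NET_ADMIN (not in Docker's default set) and drop CAP_MKNOD for this container.
docker run --cap-add=NET_ADMIN --cap-drop=MKNOD busybox ip link set lo up
```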
The table below shows the relationship between Docker's capabilities and the corresponding [Linux capabilities](http://man7.org/linux/man-pages/man7/capabilities.7.html):
| Docker's capabilities | Linux capabilities |
| ---- | ---- |
| SETPCAP | CAP_SETPCAP |
| SYS_MODULE | CAP_SYS_MODULE |
| SYS_RAWIO | CAP_SYS_RAWIO |
| SYS_PACCT | CAP_SYS_PACCT |
| SYS_ADMIN | CAP_SYS_ADMIN |
| SYS_NICE | CAP_SYS_NICE |
| SYS_RESOURCE | CAP_SYS_RESOURCE |
| SYS_TIME | CAP_SYS_TIME |
| SYS_TTY_CONFIG | CAP_SYS_TTY_CONFIG |
| MKNOD | CAP_MKNOD |
| AUDIT_WRITE | CAP_AUDIT_WRITE |
| AUDIT_CONTROL | CAP_AUDIT_CONTROL |
| MAC_OVERRIDE | CAP_MAC_OVERRIDE |
| MAC_ADMIN | CAP_MAC_ADMIN |
| NET_ADMIN | CAP_NET_ADMIN |
| SYSLOG | CAP_SYSLOG |
| CHOWN | CAP_CHOWN |
| NET_RAW | CAP_NET_RAW |
| DAC_OVERRIDE | CAP_DAC_OVERRIDE |
| FOWNER | CAP_FOWNER |
| DAC_READ_SEARCH | CAP_DAC_READ_SEARCH |
| FSETID | CAP_FSETID |
| KILL | CAP_KILL |
| SETGID | CAP_SETGID |
| SETUID | CAP_SETUID |
| LINUX_IMMUTABLE | CAP_LINUX_IMMUTABLE |
| NET_BIND_SERVICE | CAP_NET_BIND_SERVICE |
| NET_BROADCAST | CAP_NET_BROADCAST |
| IPC_LOCK | CAP_IPC_LOCK |
| IPC_OWNER | CAP_IPC_OWNER |
| SYS_CHROOT | CAP_SYS_CHROOT |
| SYS_PTRACE | CAP_SYS_PTRACE |
| SYS_BOOT | CAP_SYS_BOOT |
| LEASE | CAP_LEASE |
| SETFCAP | CAP_SETFCAP |
| WAKE_ALARM | CAP_WAKE_ALARM |
| BLOCK_SUSPEND | CAP_BLOCK_SUSPEND |
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/containers.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/containers.md?pixel)]()

View File

@ -0,0 +1,23 @@
# Kubernetes Design Overview
Kubernetes is a system for managing containerized applications across multiple hosts, providing basic mechanisms for deployment, maintenance, and scaling of applications.
Kubernetes establishes robust declarative primitives for maintaining the desired state requested by the user. We see these primitives as the main value added by Kubernetes. Self-healing mechanisms, such as auto-restarting, re-scheduling, and replicating containers require active controllers, not just imperative orchestration.
Kubernetes is primarily targeted at applications composed of multiple containers, such as elastic, distributed micro-services. It is also designed to facilitate migration of non-containerized application stacks to Kubernetes. It therefore includes abstractions for grouping containers in both loosely coupled and tightly coupled formations, and provides ways for containers to find and communicate with each other in relatively familiar ways.
Kubernetes enables users to ask a cluster to run a set of containers. The system automatically chooses hosts to run those containers on. While Kubernetes's scheduler is currently very simple, we expect it to grow in sophistication over time. Scheduling is a policy-rich, topology-aware, workload-specific function that significantly impacts availability, performance, and capacity. The scheduler needs to take into account individual and collective resource requirements, quality of service requirements, hardware/software/policy constraints, affinity and anti-affinity specifications, data locality, inter-workload interference, deadlines, and so on. Workload-specific requirements will be exposed through the API as necessary.
Kubernetes is intended to run on a number of cloud providers, as well as on physical hosts.
A single Kubernetes cluster is not intended to span multiple availability zones. Instead, we recommend building a higher-level layer to replicate complete deployments of highly available applications across multiple zones (see [the availability doc](../availability.md) and [cluster federation proposal](../proposals/federation.md) for more details).
Finally, Kubernetes aspires to be an extensible, pluggable, building-block OSS platform and toolkit. Therefore, architecturally, we want Kubernetes to be built as a collection of pluggable components and layers, with the ability to use alternative schedulers, controllers, storage systems, and distribution mechanisms, and we're evolving its current code in that direction. Furthermore, we want others to be able to extend Kubernetes functionality, such as with higher-level PaaS functionality or multi-cluster layers, without modification of core Kubernetes source. Therefore, its API isn't just (or even necessarily mainly) targeted at end users, but at tool and extension developers. Its APIs are intended to serve as the foundation for an open ecosystem of tools, automation systems, and higher-level API layers. Consequently, there are no "internal" inter-component APIs. All APIs are visible and available, including the APIs used by the scheduler, the node controller, the replication-controller manager, Kubelet's API, etc. There's no glass to break -- in order to handle more complex use cases, one can just access the lower-level APIs in a fully transparent, composable manner.
For more about the Kubernetes architecture, see [architecture](architecture.md).
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/README.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/README.md?pixel)]()

View File

@ -0,0 +1,254 @@
# K8s Identity and Access Management Sketch
This document suggests a direction for identity and access management in the Kubernetes system.
## Background
High level goals are:
- Have a plan for how identity, authentication, and authorization will fit in to the API.
- Have a plan for partitioning resources within a cluster between independent organizational units.
- Ease integration with existing enterprise and hosted scenarios.
### Actors
Each of these can act as normal users or attackers.
- External Users: People who are accessing applications running on K8s (e.g. a web site served by webserver running in a container on K8s), but who do not have K8s API access.
- K8s Users : People who access the K8s API (e.g. create K8s API objects like Pods)
- K8s Project Admins: People who manage access for some K8s Users
- K8s Cluster Admins: People who control the machines, networks, or binaries that make up a K8s cluster.
- K8s Admin means K8s Cluster Admins and K8s Project Admins taken together.
### Threats
Both intentional attacks and accidental use of privilege are concerns.
For both cases it may be useful to think about these categories differently:
- Application Path - attack by sending network messages from the internet to the IP/port of any application running on K8s. May exploit weakness in application or misconfiguration of K8s.
- K8s API Path - attack by sending network messages to any K8s API endpoint.
- Insider Path - attack on K8s system components. Attacker may have privileged access to networks, machines or K8s software and data. Software errors in K8s system components and administrator error are some types of threat in this category.
This document is primarily concerned with K8s API paths, and secondarily with Internal paths. The Application path also needs to be secure, but is not the focus of this document.
### Assets to protect
External User assets:
- Personal information like private messages, or images uploaded by External Users
- web server logs
K8s User assets:
- External User assets of each K8s User
- things private to the K8s app, like:
- credentials for accessing other services (docker private repos, storage services, facebook, etc)
- SSL certificates for web servers
- proprietary data and code
K8s Cluster assets:
- Assets of each K8s User
- Machine Certificates or secrets.
- The value of K8s cluster computing resources (cpu, memory, etc).
This document is primarily about protecting K8s User assets and K8s cluster assets from other K8s Users and K8s Project and Cluster Admins.
### Usage environments
Cluster in Small organization:
- K8s Admins may be the same people as K8s Users.
- few K8s Admins.
- prefer ease of use to fine-grained access control/precise accounting, etc.
- Product requirement that it be easy for potential K8s Cluster Admin to try out setting up a simple cluster.
Cluster in Large organization:
- K8s Admins typically distinct people from K8s Users. May need to divide K8s Cluster Admin access by roles.
- K8s Users need to be protected from each other.
- Auditing of K8s User and K8s Admin actions important.
- flexible accurate usage accounting and resource controls important.
- Lots of automated access to APIs.
- Need to integrate with existing enterprise directory, authentication, accounting, auditing, and security policy infrastructure.
Org-run cluster:
- organization that runs K8s master components is same as the org that runs apps on K8s.
- Nodes may be on-premises VMs or physical machines; Cloud VMs; or a mix.
Hosted cluster:
- Offering K8s API as a service, or offering a Paas or Saas built on K8s
- May already offer web services, and need to integrate with existing customer account concept, and existing authentication, accounting, auditing, and security policy infrastructure.
- May want to leverage K8s User accounts and accounting to manage their User accounts (not a priority to support this use case.)
- Precise and accurate accounting of resources needed. Resource controls needed for hard limits (Users given limited slice of data) and soft limits (Users can grow up to some limit and then be expanded).
K8s ecosystem services:
- There may be companies that want to offer their existing services (Build, CI, A/B-test, release automation, etc) for use with K8s. There should be some story for this case.
Pods configs should be largely portable between Org-run and hosted configurations.
# Design
Related discussion:
- https://github.com/GoogleCloudPlatform/kubernetes/issues/442
- https://github.com/GoogleCloudPlatform/kubernetes/issues/443
This doc describes two security profiles:
- Simple profile: like single-user mode. Make it easy to evaluate K8s without lots of configuring accounts and policies. Protects from unauthorized users, but does not partition authorized users.
- Enterprise profile: Provide mechanisms needed for large numbers of users. Defense in depth. Should integrate with existing enterprise security infrastructure.
K8s distribution should include templates of config, and documentation, for simple and enterprise profiles. System should be flexible enough for knowledgeable users to create intermediate profiles, but K8s developers should only reason about those two Profiles, not a matrix.
Features in this doc are divided into "Initial Feature", and "Improvements". Initial features would be candidates for version 1.00.
## Identity
### userAccount
K8s will have a `userAccount` API object.
- `userAccount` has a UID which is immutable. This is used to associate users with objects and to record actions in audit logs.
- `userAccount` has a name which is a string and human readable and unique among userAccounts. It is used to refer to users in Policies, to ensure that the Policies are human readable. It can be changed only when there are no Policy objects or other objects which refer to that name. An email address is a suggested format for this field.
- `userAccount` is not related to the unix username of processes in Pods created by that userAccount.
- `userAccount` API objects can have labels
The system may associate one or more Authentication Methods with a
`userAccount` (but they are not formally part of the userAccount object.)
In a simple deployment, the authentication method for a
user might be an authentication token which is verified by a K8s server. In a
more complex deployment, the authentication might be delegated to
another system which is trusted by the K8s API to authenticate users, but where
the authentication details are unknown to K8s.
Initial Features:
- there is no superuser `userAccount`
- `userAccount` objects are statically populated in the K8s API store by reading a config file. Only a K8s Cluster Admin can do this.
- `userAccount` can have a default `namespace`. If API call does not specify a `namespace`, the default `namespace` for that caller is assumed.
- `userAccount` is global. A single human with access to multiple namespaces is recommended to only have one userAccount.
Improvements:
- Make `userAccount` part of a separate API group from core K8s objects like `pod`. Facilitates plugging in alternate Access Management.
Simple Profile:
- single `userAccount`, used by all K8s Users and Project Admins. One access token shared by all.
Enterprise Profile:
- every human user has own `userAccount`.
- `userAccount`s have labels that indicate both membership in groups, and ability to act in certain roles.
- each service using the API has own `userAccount` too. (e.g. `scheduler`, `repcontroller`)
- automated jobs to denormalize the ldap group info into the local system list of users into the K8s userAccount file.
### Unix accounts
A `userAccount` is not a Unix user account. The fact that a pod is started by a `userAccount` does not mean that the processes in that pod's containers run as a Unix user with a corresponding name or identity.
Initially:
- The unix accounts available in a container, and used by the processes running in a container are those that are provided by the combination of the base operating system and the Docker manifest.
- Kubernetes doesn't enforce any relation between `userAccount` and unix accounts.
Improvements:
- Kubelet allocates disjoint blocks of root-namespace uids for each container. This may provide some defense-in-depth against container escapes. (https://github.com/docker/docker/pull/4572)
- requires docker to integrate user namespace support, and deciding what getpwnam() does for these uids.
- any features that help users avoid use of privileged containers (https://github.com/GoogleCloudPlatform/kubernetes/issues/391)
### Namespaces
K8s will have a `namespace` API object. It is similar to a Google Compute Engine `project`. It provides a namespace for objects created by a group of people co-operating together, preventing name collisions with non-cooperating groups. It also serves as a reference point for authorization policies.
Namespaces are described in [namespace.md](namespaces.md).
In the Enterprise Profile:
- a `userAccount` may have permission to access several `namespace`s.
In the Simple Profile:
- There is a single `namespace` used by the single user.
Namespaces versus userAccount vs Labels:
- `userAccount`s are intended for audit logging (both name and UID should be logged), and to define who has access to `namespace`s.
- `labels` (see [docs/labels.md](/docs/labels.md)) should be used to distinguish pods, users, and other objects that cooperate towards a common goal but are different in some way, such as version, or responsibilities.
- `namespace`s prevent name collisions between uncoordinated groups of people, and provide a place to attach common policies for co-operating groups of people.
## Authentication
Goals for K8s authentication:
- Include a built-in authentication system with no configuration required to use in single-user mode, and little configuration required to add several user accounts, and no https proxy required.
- Allow for authentication to be handled by a system external to Kubernetes, to allow integration with existing enterprise authorization systems. The kubernetes namespace itself should avoid taking contributions of multiple authorization schemes. Instead, a trusted proxy in front of the apiserver can be used to authenticate users.
- For organizations whose security requirements only allow FIPS compliant implementations (e.g. apache) for authentication.
- So the proxy can terminate SSL, and isolate the CA-signed certificate from less trusted, higher-touch APIserver.
- For organizations that already have existing SaaS web services (e.g. storage, VMs) and want a common authentication portal.
- Avoid mixing authentication and authorization, so that authorization policies can be centrally managed, and to allow changes in authentication methods without affecting authorization code.
Initially:
- Tokens used to authenticate a user.
- Long lived tokens identify a particular `userAccount`.
- Administrator utility generates tokens at cluster setup.
- OAuth2.0 Bearer tokens protocol, http://tools.ietf.org/html/rfc6750
- No scopes for tokens. Authorization happens in the API server
- Tokens dynamically generated by apiserver to identify pods which are making API calls.
- Tokens checked in a module of the APIserver.
- Authentication in apiserver can be disabled by flag, to allow testing without authorization enabled, and to allow use of an authenticating proxy. In this mode, a query parameter or header added by the proxy will identify the caller.
Improvements:
- Refresh of tokens.
- SSH keys to access inside containers.
To be considered for subsequent versions:
- Fuller use of OAuth (http://tools.ietf.org/html/rfc6749)
- Scoped tokens.
- Tokens that are bound to the channel between the client and the api server
- http://www.ietf.org/proceedings/90/slides/slides-90-uta-0.pdf
- http://www.browserauth.net
## Authorization
K8s authorization should:
- Allow for a range of maturity levels, from single-user for those test driving the system, to integration with existing enterprise authorization systems.
- Allow for centralized management of users and policies. In some organizations, this will mean that the definition of users and access policies needs to reside on a system other than k8s and encompass other web services (such as a storage service).
- Allow processes running in K8s Pods to take on identity, and to allow narrow scoping of permissions for those identities in order to limit damage from software faults.
- Have Authorization Policies exposed as API objects so that a single config file can create or delete Pods, Controllers, Services, and the identities and policies for those Pods and Controllers.
- Be separate as much as practical from Authentication, to allow Authentication methods to change over time and space, without impacting Authorization policies.
K8s will implement a relatively simple
[Attribute-Based Access Control](http://en.wikipedia.org/wiki/Attribute_Based_Access_Control) model.
The model will be described in more detail in a forthcoming document. The model will
- Be less complex than XACML
- Be easily recognizable to those familiar with Amazon IAM Policies.
- Have a subset/aliases/defaults which allow it to be used in a way comfortable to those users more familiar with Role-Based Access Control.
Authorization policy is set by creating a set of Policy objects.
The API Server will be the Enforcement Point for Policy. For each API call that it receives, it will construct the Attributes needed to evaluate the policy (what user is making the call, what resource they are accessing, what they are trying to do to that resource, etc.) and pass those attributes to a Decision Point. The Decision Point code evaluates the Attributes against all the Policies and allows or denies the API call. The system will be modular enough that the Decision Point code can either be linked into the APIserver binary, or be another service that the apiserver calls for each Decision (with appropriate time-limited caching as needed for performance).
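As a rough illustration of that Enforcement Point / Decision Point split, here is a minimal, hypothetical sketch in Go. The attribute fields, the `Policy` shape, and the matching rules are inventions for the example and do not prescribe the forthcoming policy model.

```go
package authz

// Attributes describes one API call as seen by the Enforcement Point.
// The exact attribute set here is illustrative, not normative.
type Attributes struct {
	User      string // authenticated userAccount making the call
	Namespace string // namespace of the resource being accessed
	Resource  string // e.g. "pods", "services"
	Verb      string // e.g. "get", "create", "delete"
}

// Policy allows a user to perform a verb on a resource, optionally scoped
// to a single namespace (an empty Namespace means all namespaces).
type Policy struct {
	User      string
	Namespace string
	Resource  string
	Verb      string
}

// Authorize is a toy Decision Point: the call is allowed if any policy
// matches every attribute. A real implementation could equally be a
// separate service the apiserver calls for each decision.
func Authorize(a Attributes, policies []Policy) bool {
	for _, p := range policies {
		if p.User == a.User &&
			(p.Namespace == "" || p.Namespace == a.Namespace) &&
			p.Resource == a.Resource &&
			p.Verb == a.Verb {
			return true
		}
	}
	return false
}
```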
Some Policy objects may be applicable only to a single namespace; K8s Project Admins would be able to create those as needed. Other Policy objects may be applicable to all namespaces; a K8s Cluster Admin might create those in order to authorize a new type of controller to be used by all namespaces, or to make a K8s User into a K8s Project Admin.
## Accounting
The API should have a `quota` concept (see https://github.com/GoogleCloudPlatform/kubernetes/issues/442). A quota object relates a namespace (and optionally a label selector) to a maximum quantity of resources that may be used (see [resources.md](/docs/resources.md)).
Initially:
- a `quota` object is immutable.
- for hosted K8s systems that do billing, Project is the recommended level for billing accounts.
- Every object that consumes resources should have a `namespace` so that resource usage stats can be rolled up by `namespace`.
- K8s Cluster Admin sets quota objects by writing a config file.
Improvements:
- allow one namespace to charge the quota for one or more other namespaces. This would be controlled by a policy which allows changing a billing_namespace= label on an object.
- allow quota to be set by namespace owners for (namespace x label) combinations (e.g. let the "webserver" namespace use 100 cores, but, to prevent accidents, don't allow "webserver" namespace with "instance=test" to use more than 10 cores).
- tools to help write consistent quota config files based on number of nodes, historical namespace usages, QoS needs, etc.
- way for K8s Cluster Admin to incrementally adjust Quota objects.
Simple profile:
- a single `namespace` with infinite resource limits.
Enterprise profile:
- multiple namespaces each with their own limits.
Issues:
- need for locking or "eventual consistency" when multiple apiserver goroutines are accessing the object store and handling pod creations.
## Audit Logging
API actions can be logged.
Initial implementation:
- All API calls logged to nginx logs.
Improvements:
- API server does logging instead.
- Policies to drop logging for high rate trusted API calls, or by users performing audit or other sensitive functions.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/access.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/access.md?pixel)]()

View File

@ -0,0 +1,85 @@
# Kubernetes Proposal - Admission Control
**Related PR:**
| Topic | Link |
| ----- | ---- |
| Separate validation from RESTStorage | https://github.com/GoogleCloudPlatform/kubernetes/issues/2977 |
## Background
High level goals:
* Enable an easy-to-use mechanism to provide admission control to the cluster
* Enable a provider to support multiple admission control strategies or author their own
* Ensure any rejected request can propagate errors back to the caller explaining why the request failed
Authorization via policy is focused on answering whether a user is authorized to perform an action.
Admission Control is focused on whether the system will accept an authorized action.
Kubernetes may choose to dismiss an authorized action based on any number of admission control strategies.
This proposal documents the basic design, and describes how any number of admission control plug-ins could be injected.
Implementations of specific admission control strategies are handled in separate documents.
## kube-apiserver
The kube-apiserver takes the following OPTIONAL arguments to enable admission control
| Option | Behavior |
| ------ | -------- |
| admission_control | Comma-delimited, ordered list of admission control choices to invoke prior to modifying or deleting an object. |
| admission_control_config_file | File with admission control configuration parameters to boot-strap plug-in. |
An **AdmissionControl** plug-in is an implementation of the following interface:
```go
package admission

// runtime provides the generic Object type used below (import path per this repository's layout).
import "github.com/GoogleCloudPlatform/kubernetes/pkg/runtime"

// Attributes is an interface used by a plug-in to make an admission decision on an individual request.
type Attributes interface {
	GetNamespace() string
	GetKind() string
	GetOperation() string
	GetObject() runtime.Object
}

// Interface is an abstract, pluggable interface for Admission Control decisions.
type Interface interface {
	// Admit makes an admission decision based on the request attributes.
	// An error is returned if it denies the request.
	Admit(a Attributes) (err error)
}
```
A **plug-in** must be compiled with the binary, and is registered as an available option by providing a name and an implementation of admission.Interface.
```go
func init() {
	admission.RegisterPlugin("AlwaysDeny", func(client client.Interface, config io.Reader) (admission.Interface, error) { return NewAlwaysDeny(), nil })
}
```
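For illustration, a complete (if trivial) plug-in that rejects every request might look like the sketch below. The package name, error message, and import paths are assumptions for the example; only the `Admit` and registration signatures come from the proposal above.

```go
package alwaysdeny

import (
	"errors"
	"io"

	// Import paths here assume the repository layout of this release.
	"github.com/GoogleCloudPlatform/kubernetes/pkg/admission"
	"github.com/GoogleCloudPlatform/kubernetes/pkg/client"
)

// alwaysDeny rejects every request it is asked to admit.
type alwaysDeny struct{}

// Admit implements admission.Interface; returning an error denies the request.
func (alwaysDeny) Admit(a admission.Attributes) (err error) {
	return errors.New("this plug-in denies all requests")
}

// NewAlwaysDeny returns the plug-in instance registered below as "AlwaysDeny".
func NewAlwaysDeny() admission.Interface {
	return alwaysDeny{}
}

func init() {
	admission.RegisterPlugin("AlwaysDeny", func(c client.Interface, config io.Reader) (admission.Interface, error) {
		return NewAlwaysDeny(), nil
	})
}
```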
Invocation of admission control is handled by the **APIServer** and not individual **RESTStorage** implementations.
This design assumes that **Issue 297** is adopted, and as a consequence, the general framework of the APIServer request/response flow
will ensure the following:
1. Incoming request
2. Authenticate user
3. Authorize user
4. If operation=create|update, then validate(object)
5. If operation=create|update|delete, then admission.Admit(requestAttributes)
    a. invoke each admission.Interface object in sequence
6. Object is persisted
If at any step, there is an error, the request is canceled.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/admission_control.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/admission_control.md?pixel)]()

View File

@ -0,0 +1,138 @@
# Admission control plugin: LimitRanger
## Background
This document proposes a system for enforcing min/max limits per resource as part of admission control.
## Model Changes
A new resource, **LimitRange**, is introduced to enumerate min/max limits for a resource type scoped to a
Kubernetes namespace.
```go
const (
	// Limit that applies to all pods in a namespace
	LimitTypePod string = "Pod"
	// Limit that applies to all containers in a namespace
	LimitTypeContainer string = "Container"
)

// LimitRangeItem defines a min/max usage limit for any resource that matches on kind
type LimitRangeItem struct {
	// Type of resource that this limit applies to
	Type string `json:"type,omitempty"`
	// Max usage constraints on this kind by resource name
	Max ResourceList `json:"max,omitempty"`
	// Min usage constraints on this kind by resource name
	Min ResourceList `json:"min,omitempty"`
	// Default usage constraints on this kind by resource name
	Default ResourceList `json:"default,omitempty"`
}

// LimitRangeSpec defines a min/max usage limit for resources that match on kind
type LimitRangeSpec struct {
	// Limits is the list of LimitRangeItem objects that are enforced
	Limits []LimitRangeItem `json:"limits"`
}

// LimitRange sets resource usage limits for each kind of resource in a Namespace
type LimitRange struct {
	TypeMeta   `json:",inline"`
	ObjectMeta `json:"metadata,omitempty"`
	// Spec defines the limits enforced
	Spec LimitRangeSpec `json:"spec,omitempty"`
}

// LimitRangeList is a list of LimitRange items.
type LimitRangeList struct {
	TypeMeta `json:",inline"`
	ListMeta `json:"metadata,omitempty"`
	// Items is a list of LimitRange objects
	Items []LimitRange `json:"items"`
}
```
## AdmissionControl plugin: LimitRanger
The **LimitRanger** plug-in introspects all incoming admission requests.
It makes decisions by evaluating the incoming object against all defined **LimitRange** objects in the request context namespace.
The following min/max limits are imposed:
**Type: Container**
| ResourceName | Description |
| ------------ | ----------- |
| cpu | Min/Max amount of cpu per container |
| memory | Min/Max amount of memory per container |
**Type: Pod**
| ResourceName | Description |
| ------------ | ----------- |
| cpu | Min/Max amount of cpu per pod |
| memory | Min/Max amount of memory per pod |
If a resource specifies a default value, it may be applied to the incoming resource. For example, if a default
value is provided for container cpu, it is set on the incoming container if and only if the incoming container
does not specify a resource requirements limit field.
If a resource specifies a min value, it may be applied to the incoming resource. For example, if a min
value is provided for container cpu, it is set on the incoming container if and only if the incoming container does
not specify a resource requirements requests field.
If the incoming object would cause a violation of the enumerated constraints, the request is denied with a set of
messages explaining what constraints were the source of the denial.
If a constraint is not enumerated by a **LimitRange** it is not tracked.
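A minimal sketch of that evaluation, under simplifying assumptions: plain integer maps stand in for `ResourceList`, and the limit/request distinction is collapsed, so this is illustrative only and not the plug-in's actual code.

```go
package limitranger

import "fmt"

// quantities stands in for ResourceList in this sketch: resource name -> amount.
type quantities map[string]int64

// applyDefaultsAndValidate mirrors the LimitRanger behavior described above:
// if the incoming container sets no value for a resource, the LimitRangeItem
// default (when present) is applied; the resulting value must then fall
// within [min, max] for every constrained resource.
func applyDefaultsAndValidate(incoming, min, max, def quantities) error {
	for resource, d := range def {
		if _, set := incoming[resource]; !set {
			incoming[resource] = d
		}
	}
	for resource, lo := range min {
		if v, set := incoming[resource]; set && v < lo {
			return fmt.Errorf("%s usage %d is below the minimum %d", resource, v, lo)
		}
	}
	for resource, hi := range max {
		if v, set := incoming[resource]; set && v > hi {
			return fmt.Errorf("%s usage %d exceeds the maximum %d", resource, v, hi)
		}
	}
	return nil
}
```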
## kube-apiserver
The server is updated to be aware of **LimitRange** objects.
The constraints are only enforced if the kube-apiserver is started as follows:
```
$ kube-apiserver -admission_control=LimitRanger
```
## kubectl
kubectl is modified to support the **LimitRange** resource.
```kubectl describe``` provides a human-readable output of limits.
For example,
```shell
$ kubectl namespace myspace
$ kubectl create -f examples/limitrange/limit-range.json
$ kubectl get limits
NAME
limits
$ kubectl describe limits limits
Name:           limits
Type            Resource        Min     Max     Default
----            --------        ---     ---     ---
Pod             memory          1Mi     1Gi     -
Pod             cpu             250m    2       -
Container       memory          1Mi     1Gi     1Mi
Container       cpu             250m    250m    250m
```
## Future Enhancements: Define limits for a particular pod or container.
In the current proposal, the **LimitRangeItem** matches purely on **LimitRangeItem.Type**.
It is expected we will want to define limits for particular pods or containers by name/uid and label/field selector.
To make a **LimitRangeItem** more restrictive, we intend to add these additional restrictions at a future point in time.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/admission_control_limit_range.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/admission_control_limit_range.md?pixel)]()

View File

@ -0,0 +1,159 @@
# Admission control plugin: ResourceQuota
## Background
This document proposes a system for enforcing hard resource usage limits per namespace as part of admission control.
## Model Changes
A new resource, **ResourceQuota**, is introduced to enumerate hard resource limits in a Kubernetes namespace.
A new resource, **ResourceQuotaUsage**, is introduced to support atomic updates of a **ResourceQuota** status.
```go
// The following identify resource constants for Kubernetes object types
const (
	// Pods, number
	ResourcePods ResourceName = "pods"
	// Services, number
	ResourceServices ResourceName = "services"
	// ReplicationControllers, number
	ResourceReplicationControllers ResourceName = "replicationcontrollers"
	// ResourceQuotas, number
	ResourceQuotas ResourceName = "resourcequotas"
)

// ResourceQuotaSpec defines the desired hard limits to enforce for Quota
type ResourceQuotaSpec struct {
	// Hard is the set of desired hard limits for each named resource
	Hard ResourceList `json:"hard,omitempty"`
}

// ResourceQuotaStatus defines the enforced hard limits and observed use
type ResourceQuotaStatus struct {
	// Hard is the set of enforced hard limits for each named resource
	Hard ResourceList `json:"hard,omitempty"`
	// Used is the current observed total usage of the resource in the namespace
	Used ResourceList `json:"used,omitempty"`
}

// ResourceQuota sets aggregate quota restrictions enforced per namespace
type ResourceQuota struct {
	TypeMeta   `json:",inline"`
	ObjectMeta `json:"metadata,omitempty"`
	// Spec defines the desired quota
	Spec ResourceQuotaSpec `json:"spec,omitempty"`
	// Status defines the actual enforced quota and its current usage
	Status ResourceQuotaStatus `json:"status,omitempty"`
}

// ResourceQuotaUsage captures system observed quota status per namespace
// It is used to enforce atomic updates of a backing ResourceQuota.Status field in storage
type ResourceQuotaUsage struct {
	TypeMeta   `json:",inline"`
	ObjectMeta `json:"metadata,omitempty"`
	// Status defines the actual enforced quota and its current usage
	Status ResourceQuotaStatus `json:"status,omitempty"`
}

// ResourceQuotaList is a list of ResourceQuota items
type ResourceQuotaList struct {
	TypeMeta `json:",inline"`
	ListMeta `json:"metadata,omitempty"`
	// Items is a list of ResourceQuota objects
	Items []ResourceQuota `json:"items"`
}
```
## AdmissionControl plugin: ResourceQuota
The **ResourceQuota** plug-in introspects all incoming admission requests.
It makes decisions by evaluating the incoming object against all defined **ResourceQuota.Status.Hard** resource limits in the request
namespace. If acceptance of the resource would cause the total usage of a named resource to exceed its hard limit, the request is denied.
The following resource limits are imposed as part of core Kubernetes at the namespace level:
| ResourceName | Description |
| ------------ | ----------- |
| cpu | Total cpu usage |
| memory | Total memory usage |
| pods | Total number of pods |
| services | Total number of services |
| replicationcontrollers | Total number of replication controllers |
| resourcequotas | Total number of resource quotas |
Any resource that is not part of core Kubernetes must follow the resource naming convention prescribed by Kubernetes.
This means the resource must have a fully-qualified name (e.g. mycompany.org/shinynewresource).
If the incoming request does not cause the total usage to exceed any of the enumerated hard resource limits, the plug-in will post a
**ResourceQuotaUsage** document to the server to atomically update the observed usage based on the previously read
**ResourceQuota.ResourceVersion**. This keeps incremental usage atomically consistent, but does introduce a bottleneck (intentionally)
into the system.
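A minimal sketch of that check-and-update step, assuming plain integer maps in place of `ResourceList`; the helper and types are illustrative stand-ins, not the plug-in's actual code.

```go
package resourcequota

import "fmt"

// usage stands in for ResourceList in this sketch: resource name -> quantity.
type usage map[string]int64

// quotaStatus mirrors the relevant parts of ResourceQuota.Status plus the
// ResourceVersion previously read from the server.
type quotaStatus struct {
	ResourceVersion string
	Hard            usage
	Used            usage
}

// admit checks the increments an incoming object would add against the hard
// limits. If they fit, it returns the updated status that would be posted
// back as a ResourceQuotaUsage keyed on the previously read ResourceVersion;
// the server rejects the write if that version is stale, which is what keeps
// incremental usage atomically consistent.
func admit(status quotaStatus, increments usage) (quotaStatus, error) {
	if status.Used == nil {
		status.Used = usage{}
	}
	for resource, inc := range increments {
		hard, constrained := status.Hard[resource]
		if !constrained {
			continue // resources not enumerated by the quota are not tracked
		}
		if status.Used[resource]+inc > hard {
			return status, fmt.Errorf("limited to %d %s", hard, resource)
		}
	}
	for resource, inc := range increments {
		if _, constrained := status.Hard[resource]; constrained {
			status.Used[resource] += inc
		}
	}
	return status, nil
}
```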
To optimize system performance, it is encouraged that all resource quotas be tracked in the same **ResourceQuota** document. As a result,
it is encouraged to cap the total number of individual quotas tracked in the **Namespace** at 1 by explicitly
capping it in the **ResourceQuota** document.
## kube-apiserver
The server is updated to be aware of **ResourceQuota** objects.
The quota is only enforced if the kube-apiserver is started as follows:
```
$ kube-apiserver -admission_control=ResourceQuota
```
## kube-controller-manager
A new controller is defined that runs a sync loop to calculate quota usage across the namespace.
**ResourceQuota** usage is only calculated if a namespace has a **ResourceQuota** object.
If the observed usage is different than the recorded usage, the controller sends a **ResourceQuotaUsage** resource
to the server to atomically update.
The synchronization loop frequency will control how quickly DELETE actions are recorded in the system and usage is ticked down.
To optimize the synchronization loop, this controller will WATCH on Pod resources to track DELETE events, and in response, recalculate
usage. This is because a Pod deletion will have the most impact on observed cpu and memory usage in the system, and we anticipate
this being the resource most closely running at the prescribed quota limits.
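As a rough illustration of that synchronization loop (not the actual controller code), the sketch below recomputes usage for a namespace from its listed pods and reports whether a **ResourceQuotaUsage** update would be needed; the types are simplified stand-ins.

```go
package quotacontroller

// podUsage is a simplified stand-in for the usage derived from one pod
// (cpu, memory, and a "pods" count of 1, for example).
type podUsage map[string]int64

// needsUsageUpdate recomputes observed usage from the namespace's pods and
// reports whether it differs from the usage currently recorded on the
// ResourceQuota status; a real controller would then send a ResourceQuotaUsage
// to the server to atomically update the recorded values.
func needsUsageUpdate(recorded podUsage, pods []podUsage) (podUsage, bool) {
	observed := podUsage{}
	for _, p := range pods {
		for resource, amount := range p {
			observed[resource] += amount
		}
	}
	if len(observed) != len(recorded) {
		return observed, true
	}
	for resource, amount := range observed {
		if recorded[resource] != amount {
			return observed, true
		}
	}
	return observed, false
}
```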
## kubectl
kubectl is modified to support the **ResourceQuota** resource.
```kubectl describe``` provides a human-readable output of quota.
For example,
```
$ kubectl namespace myspace
$ kubectl create -f examples/resourcequota/resource-quota.json
$ kubectl get quota
NAME
quota
$ kubectl describe quota quota
Name:                   quota
Resource                Used    Hard
--------                ----    ----
cpu                     0m      20
memory                  0       1Gi
pods                    5       10
replicationcontrollers  5       20
resourcequotas          1       1
services                3       5
```
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/admission_control_resource_quota.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/admission_control_resource_quota.md?pixel)]()

View File

@ -0,0 +1,50 @@
# Kubernetes architecture
A running Kubernetes cluster contains node agents (kubelet) and master components (APIs, scheduler, etc), on top of a distributed storage solution. This diagram shows our desired eventual state, though we're still working on a few things, like making kubelet itself (all our components, really) run within containers, and making the scheduler 100% pluggable.
![Architecture Diagram](../architecture.png?raw=true "Architecture overview")
## The Kubernetes Node
When looking at the architecture of the system, we'll break it down to services that run on the worker node and services that compose the cluster-level control plane.
The Kubernetes node has the services necessary to run application containers and be managed from the master systems.
Each node runs Docker, of course. Docker takes care of the details of downloading images and running containers.
### Kubelet
The **Kubelet** manages [pods](../pods.md) and their containers, their images, their volumes, etc.
### Kube-Proxy
Each node also runs a simple network proxy and load balancer (see the [services FAQ](https://github.com/GoogleCloudPlatform/kubernetes/wiki/Services-FAQ) for more details). This reflects `services` (see [the services doc](../services.md) for more details) as defined in the Kubernetes API on each node and can do simple TCP and UDP stream forwarding (round robin) across a set of backends.
Service endpoints are currently found via [DNS](../dns.md) or through environment variables (both [Docker-links-compatible](https://docs.docker.com/userguide/dockerlinks/) and Kubernetes {FOO}_SERVICE_HOST and {FOO}_SERVICE_PORT variables are supported). These variables resolve to ports managed by the service proxy.
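For a concrete sense of the environment-variable form, a process in a container might locate a service named `foo` as sketched below; the service name and the presence of both variables are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

func main() {
	// For a service named "foo", the kubelet injects FOO_SERVICE_HOST and
	// FOO_SERVICE_PORT; these resolve to ports managed by the service proxy.
	host := os.Getenv("FOO_SERVICE_HOST")
	port := os.Getenv("FOO_SERVICE_PORT")
	if host == "" || port == "" {
		fmt.Println("service environment variables not set")
		return
	}
	fmt.Println("dialing", net.JoinHostPort(host, port))
}
```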
## The Kubernetes Control Plane
The Kubernetes control plane is split into a set of components. Currently they all run on a single _master_ node, but that is expected to change soon in order to support high-availability clusters. These components work together to provide a unified view of the cluster.
### etcd
All persistent master state is stored in an instance of `etcd`. This provides a great way to store configuration data reliably. With `watch` support, coordinating components can be notified very quickly of changes.
### Kubernetes API Server
The apiserver serves up the [Kubernetes API](../api.md). It is intended to be a CRUD-y server, with most/all business logic implemented in separate components or in plug-ins. It mainly processes REST operations, validates them, and updates the corresponding objects in `etcd` (and eventually other stores).
### Scheduler
The scheduler binds unscheduled pods to nodes via the `/binding` API. The scheduler is pluggable, and we expect to support multiple cluster schedulers and even user-provided schedulers in the future.
### Kubernetes Controller Manager Server
All other cluster-level functions are currently performed by the Controller Manager. For instance, `Endpoints` objects are created and updated by the endpoints controller, and nodes are discovered, managed, and monitored by the node controller. These could eventually be split into separate components to make them independently pluggable.
The [`replicationcontroller`](../replication-controller.md) is a mechanism that is layered on top of the simple [`pod`](../pods.md) API. We eventually plan to port it to a generic plug-in mechanism, once one is implemented.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/architecture.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/architecture.md?pixel)]()

View File

@ -0,0 +1,66 @@
# Clustering in Kubernetes
## Overview
The term "clustering" refers to the process of having all members of the kubernetes cluster find and trust each other. There are multiple different ways to achieve clustering with different security and usability profiles. This document attempts to lay out the user experiences for clustering that Kubernetes aims to address.
Once a cluster is established, the following is true:
1. **Master -> Node** The master needs to know which nodes can take work and what their current status is with respect to capacity.
    1. **Location** The master knows the name and location of all of the nodes in the cluster.
        * For the purposes of this doc, location and name should be enough information so that the master can open a TCP connection to the Node. Most probably we will make this either an IP address or a DNS name. It is going to be important to be consistent here (master must be able to reach kubelet on that DNS name) so that we can verify certificates appropriately.
    2. **Target AuthN** A way to securely talk to the kubelet on that node. Currently we call out to the kubelet over HTTP. This should be over HTTPS and the master should know what CA to trust for that node.
    3. **Caller AuthN/Z** This would be the master verifying itself (and permissions) when calling the node. Currently, this is only used to collect statistics as authorization isn't critical. This may change in the future though.
2. **Node -> Master** The nodes currently talk to the master to know which pods have been assigned to them and to publish events.
    1. **Location** The nodes must know where the master is.
    2. **Target AuthN** Since the master is assigning work to the nodes, it is critical that they verify whom they are talking to.
    3. **Caller AuthN/Z** The nodes publish events and so must be authenticated to the master. Ideally this authentication is specific to each node so that authorization can be narrowly scoped. The details of the work to run (including things like environment variables) might be considered sensitive and should be locked down also.
**Note:** While the description here refers to a singular Master, in the future we should enable multiple Masters operating in an HA mode. While the "Master" is currently the combination of the API Server, Scheduler and Controller Manager, we will restrict ourselves to thinking about the main API and policy engine -- the API Server.
## Current Implementation
A central authority (generally the master) is responsible for determining the set of machines which are members of the cluster. Calls to create and remove worker nodes in the cluster are restricted to this single authority, and any other requests to add or remove worker nodes are rejected. (1.i).
Communication from the master to nodes is currently over HTTP and is not secured or authenticated in any way. (1.ii, 1.iii).
The location of the master is communicated out of band to the nodes. For GCE, this is done via Salt. Other cluster instructions/scripts use other methods. (2.i)
Currently most communication from the node to the master is over HTTP. When it is done over HTTPS there is currently no verification of the cert of the master (2.ii).
Currently, the node/kubelet is authenticated to the master via a token shared across all nodes. This token is distributed out of band (using Salt for GCE) and is optional. If it is not present then the kubelet is unable to publish events to the master. (2.iii)
Our current mix of out of band communication doesn't meet all of our needs from a security point of view and is difficult to set up and configure.
## Proposed Solution
The proposed solution will provide a range of options for setting up and maintaining a secure Kubernetes cluster. We want to allow both for centrally controlled systems (leveraging pre-existing trust and configuration systems) and for more ad-hoc, automagic systems that are incredibly easy to set up.
The building blocks of an easier solution:
* **Move to TLS** We will move to using TLS for all intra-cluster communication. We will explicitly identify the trust chain (the set of trusted CAs) as opposed to trusting the system CAs. We will also use client certificates for all AuthN.
* [optional] **API driven CA** Optionally, we will run a CA in the master that will mint certificates for the nodes/kubelets. There will be pluggable policies that will automatically approve certificate requests here as appropriate.
* **CA approval policy** This is a pluggable policy object that can automatically approve CA signing requests. Stock policies will include `always-reject`, `queue` and `insecure-always-approve`. With `queue` there would be an API for evaluating and accepting/rejecting requests. Cloud providers could implement a policy here that verifies other out of band information and automatically approves/rejects based on other external factors.
* **Scoped Kubelet Accounts** These accounts are per-minion and (optionally) give a minion permission to register itself.
  * To start with, we'd have the kubelets generate a cert/account in the form of `kubelet:<host>`, and hard-code policy such that we give that particular account appropriate permissions. Over time, we can make the policy engine more generic.
* [optional] **Bootstrap API endpoint** This is a helper service hosted outside of the Kubernetes cluster that helps with initial discovery of the master.
### Static Clustering
In this sequence diagram there is out of band admin entity that is creating all certificates and distributing them. It is also making sure that the kubelets know where to find the master. This provides for a lot of control but is more difficult to set up as lots of information must be communicated outside of Kubernetes.
![Static Sequence Diagram](clustering/static.png)
### Dynamic Clustering
This diagram shows dynamic clustering using the bootstrap API endpoint. That API endpoint is used to both find the location of the master and communicate the root CA for the master.
This flow has the admin manually approving the kubelet signing requests. This is the `queue` policy defined above. This manual intervention could be replaced by code that can verify the signing requests via other means.
![Dynamic Sequence Diagram](clustering/dynamic.png)
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/clustering.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/clustering.md?pixel)]()

View File

@ -0,0 +1 @@
DroidSansMono.ttf

View File

@ -0,0 +1,12 @@
FROM debian:jessie
RUN apt-get update
RUN apt-get -qy install python-seqdiag make curl
WORKDIR /diagrams
RUN curl -sLo DroidSansMono.ttf https://googlefontdirectory.googlecode.com/hg/apache/droidsansmono/DroidSansMono.ttf
ADD . /diagrams
CMD bash -c 'make >/dev/stderr && tar cf - *.png'

View File

@ -0,0 +1,29 @@
FONT := DroidSansMono.ttf

PNGS := $(patsubst %.seqdiag,%.png,$(wildcard *.seqdiag))

.PHONY: all
all: $(PNGS)

.PHONY: watch
watch:
	fswatch *.seqdiag | xargs -n 1 sh -c "make || true"

$(FONT):
	curl -sLo $@ https://googlefontdirectory.googlecode.com/hg/apache/droidsansmono/$(FONT)

%.png: %.seqdiag $(FONT)
	seqdiag --no-transparency -a -f '$(FONT)' $<

# Build the stuff via a docker image
.PHONY: docker
docker:
	docker build -t clustering-seqdiag .
	docker run --rm clustering-seqdiag | tar xvf -

docker-clean:
	docker rmi clustering-seqdiag || true
	docker images -q --filter "dangling=true" | xargs docker rmi

fix-clock-skew:
	boot2docker ssh sudo date -u -D "%Y%m%d%H%M.%S" --set "$(shell date -u +%Y%m%d%H%M.%S)"

View File

@ -0,0 +1,31 @@
This directory contains diagrams for the clustering design doc.
This depends on the `seqdiag` [utility](http://blockdiag.com/en/seqdiag/index.html). Assuming you have a non-borked python install, this should be installable with
```bash
pip install seqdiag
```
Just call `make` to regenerate the diagrams.
## Building with Docker
If you are on a Mac or your pip install is messed up, you can easily build with docker.
```
make docker
```
The first run will be slow but things should be fast after that.
To clean up the docker containers that are created (and other cruft that is left around) you can run `make docker-clean`.
If you are using boot2docker and get warnings about clock skew (or if things aren't building for some reason) then you can fix that up with `make fix-clock-skew`.
## Automatically rebuild on file changes
If you have the fswatch utility installed, you can have it monitor the file system and automatically rebuild when files have changed. Just do a `make watch`.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/clustering/README.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/clustering/README.md?pixel)]()

Binary file not shown.

After

Width:  |  Height:  |  Size: 71 KiB

View File

@ -0,0 +1,24 @@
seqdiag {
  activation = none;
  user[label = "Admin User"];
  bootstrap[label = "Bootstrap API\nEndpoint"];
  master;
  kubelet[stacked];
  user -> bootstrap [label="createCluster", return="cluster ID"];
  user <-- bootstrap [label="returns\n- bootstrap-cluster-uri"];
  user ->> master [label="start\n- bootstrap-cluster-uri"];
  master => bootstrap [label="setMaster\n- master-location\n- master-ca"];
  user ->> kubelet [label="start\n- bootstrap-cluster-uri"];
  kubelet => bootstrap [label="get-master", return="returns\n- master-location\n- master-ca"];
  kubelet ->> master [label="signCert\n- unsigned-kubelet-cert", return="returns\n- kubelet-cert"];
  user => master [label="getSignRequests"];
  user => master [label="approveSignRequests"];
  kubelet <<-- master [label="returns\n- kubelet-cert"];
  kubelet => master [label="register\n- kubelet-location"];
}

Binary file not shown.

After

Width:  |  Height:  |  Size: 36 KiB

View File

@ -0,0 +1,16 @@
seqdiag {
  activation = none;
  admin[label = "Manual Admin"];
  ca[label = "Manual CA"];
  master;
  kubelet[stacked];
  admin => ca [label="create\n- master-cert"];
  admin ->> master [label="start\n- ca-root\n- master-cert"];
  admin => ca [label="create\n- kubelet-cert"];
  admin ->> kubelet [label="start\n- ca-root\n- kubelet-cert\n- master-location"];
  kubelet => master [label="register\n- kubelet-location"];
}

View File

@ -0,0 +1,149 @@
# Container Command Execution & Port Forwarding in Kubernetes
## Abstract
This describes an approach for providing support for:
- executing commands in containers, with stdin/stdout/stderr streams attached
- port forwarding to containers
## Background
There are several related issues/PRs:
- [Support attach](https://github.com/GoogleCloudPlatform/kubernetes/issues/1521)
- [Real container ssh](https://github.com/GoogleCloudPlatform/kubernetes/issues/1513)
- [Provide easy debug network access to services](https://github.com/GoogleCloudPlatform/kubernetes/issues/1863)
- [OpenShift container command execution proposal](https://github.com/openshift/origin/pull/576)
## Motivation
Users and administrators are accustomed to being able to access their systems
via SSH to run remote commands, get shell access, and do port forwarding.
Supporting SSH to containers in Kubernetes is a difficult task. You must
specify a "user" and a hostname to make an SSH connection, and `sshd` requires
real users (resolvable by NSS and PAM). Because a container belongs to a pod,
and the pod belongs to a namespace, you need to specify namespace/pod/container
to uniquely identify the target container. Unfortunately, a
namespace/pod/container is not a real user as far as SSH is concerned. Also,
most Linux systems limit user names to 32 characters, which is unlikely to be
large enough to contain namespace/pod/container. We could devise some scheme to
map each namespace/pod/container to a 32-character user name, adding entries to
`/etc/passwd` (or LDAP, etc.) and keeping those entries fully in sync all the
time. Alternatively, we could write custom NSS and PAM modules that allow the
host to resolve a namespace/pod/container to a user without needing to keep
files or LDAP in sync.
As an alternative to SSH, we are using a multiplexed streaming protocol that
runs on top of HTTP. There are no requirements about users being real users,
nor is there any limitation on user name length, as the protocol is under our
control. The only downside is that standard tooling that expects to use SSH
won't be able to work with this mechanism, unless adapters can be written.
## Constraints and Assumptions
- SSH support is not currently in scope
- CGroup confinement is ultimately desired, but implementing that support is not currently in scope
- SELinux confinement is ultimately desired, but implementing that support is not currently in scope
## Use Cases
- As a user of a Kubernetes cluster, I want to run arbitrary commands in a container, attaching my local stdin/stdout/stderr to the container
- As a user of a Kubernetes cluster, I want to be able to connect to local ports on my computer and have them forwarded to ports in the container
## Process Flow
### Remote Command Execution Flow
1. The client connects to the Kubernetes Master to initiate a remote command execution
request
2. The Master proxies the request to the Kubelet where the container lives
3. The Kubelet executes nsenter + the requested command and streams stdin/stdout/stderr back and forth between the client and the container
### Port Forwarding Flow
1. The client connects to the Kubernetes Master to initiate a port forwarding request
2. The Master proxies the request to the Kubelet where the container lives
3. The client listens on each specified local port, awaiting local connections
4. The client connects to one of the local listening ports
5. The client notifies the Kubelet of the new connection
6. The Kubelet executes nsenter + socat and streams data back and forth between the client and the port in the container
## Design Considerations
### Streaming Protocol
The current multiplexed streaming protocol used is SPDY. This is not the
long-term desire, however. As soon as there is viable support for HTTP/2 in Go,
we will switch to that.
### Master as First Level Proxy
Clients should not be allowed to communicate directly with the Kubelet for
security reasons. Therefore, the Master is currently the only suggested entry
point to be used for remote command execution and port forwarding. This is not
necessarily desirable, as it means that all remote command execution and port
forwarding traffic must travel through the Master, potentially impacting other
API requests.
In the future, it might make more sense to retrieve an authorization token from
the Master, and then use that token to initiate a remote command execution or
port forwarding request with a load balanced proxy service dedicated to this
functionality. This would keep the streaming traffic out of the Master.
### Kubelet as Backend Proxy
The kubelet is currently responsible for handling remote command execution and
port forwarding requests. Just like with the Master described above, this means
that all remote command execution and port forwarding streaming traffic must
travel through the Kubelet, which could result in a degraded ability to service
other requests.
In the future, it might make more sense to use a separate service on the node.
Alternatively, we could possibly inject a process into the container that only
listens for a single request, expose that process's listening port on the node,
and then issue a redirect to the client such that it would connect to the first
level proxy, which would then proxy directly to the injected process's exposed
port. This would minimize the amount of proxying that takes place.
### Scalability
There are at least 2 different ways to execute a command in a container:
`docker exec` and `nsenter`. While `docker exec` might seem like an easier and
more obvious choice, it has some drawbacks.
#### `docker exec`
We could expose `docker exec` (i.e. have Docker listen on an exposed TCP port
on the node), but this would require proxying from the edge and securing the
Docker API. `docker exec` calls go through the Docker daemon, meaning that all
stdin/stdout/stderr traffic is proxied through the Daemon, adding an extra hop.
Additionally, you can't isolate 1 malicious `docker exec` call from normal
usage, meaning an attacker could initiate a denial of service or other attack
and take down the Docker daemon, or the node itself.
We expect remote command execution and port forwarding requests to be long
running and/or high bandwidth operations, and routing all the streaming data
through the Docker daemon feels like a bottleneck we can avoid.
#### `nsenter`
The implementation currently uses `nsenter` to run commands in containers,
joining the appropriate container namespaces. `nsenter` runs directly on the
node and is not proxied through any single daemon process.
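For a rough sense of what that looks like on the node, the sketch below shells out to `nsenter` to join a container's namespaces by its init PID and run a command, wiring the standard streams straight through. The PID and command are illustrative; the real kubelet also multiplexes these streams back to the client over the proxy chain described above.

```go
package main

import (
	"os"
	"os/exec"
	"strconv"
)

// runInContainer joins the namespaces of the container's init process and
// executes cmd there, passing the caller's stdin/stdout/stderr straight
// through. The flag set is standard util-linux nsenter usage.
func runInContainer(containerPid int, cmd ...string) error {
	args := append([]string{"-t", strconv.Itoa(containerPid), "-m", "-u", "-i", "-n", "-p", "--"}, cmd...)
	c := exec.Command("nsenter", args...)
	c.Stdin = os.Stdin
	c.Stdout = os.Stdout
	c.Stderr = os.Stderr
	return c.Run()
}

func main() {
	// Hypothetical PID of the target container's init process.
	_ = runInContainer(12345, "sh", "-c", "hostname")
}
```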
### Security
Authentication and authorization haven't specifically been tested yet with this
functionality. We need to make sure that users are not allowed to execute
remote commands or do port forwarding to containers they aren't allowed to
access.
Additional work is required to ensure that multiple command execution or port forwarding connections from different clients are not able to see each other's data. This can most likely be achieved via SELinux labeling and unique process contexts.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/command_execution_port_forwarding.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/command_execution_port_forwarding.md?pixel)]()

View File

@ -0,0 +1,84 @@
# Kubernetes Event Compression
This document captures the design of event compression.
## Background
Kubernetes components can get into a state where they generate tons of events which are identical except for the timestamp. For example, when pulling a non-existing image, Kubelet will repeatedly generate ```image_not_existing``` and ```container_is_waiting``` events until upstream components correct the image. When this happens, the spam from the repeated events makes the entire event mechanism useless. It also appears to cause memory pressure in etcd (see [#3853](https://github.com/GoogleCloudPlatform/kubernetes/issues/3853)).
## Proposal
Each binary that generates events (for example, ```kubelet```) should keep track of previously generated events so that it can collapse recurring events into a single event instead of creating a new instance for each new event.
Event compression should be best effort (not guaranteed). Meaning, in the worst case, ```n``` identical (minus timestamp) events may still result in ```n``` event entries.
## Design
Instead of a single Timestamp, each event object [contains](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/api/types.go#L1111) the following fields:
* ```FirstTimestamp util.Time```
  * The date/time of the first occurrence of the event.
* ```LastTimestamp util.Time```
  * The date/time of the most recent occurrence of the event.
  * On first occurrence, this is equal to the FirstTimestamp.
* ```Count int```
  * The number of occurrences of this event between FirstTimestamp and LastTimestamp
  * On first occurrence, this is 1.
Each binary that generates events:
* Maintains a historical record of previously generated events:
  * Implemented with ["Least Recently Used Cache"](https://github.com/golang/groupcache/blob/master/lru/lru.go) in [```pkg/client/record/events_cache.go```](https://github.com/GoogleCloudPlatform/kubernetes/tree/master/pkg/client/record/events_cache.go).
  * The key in the cache is generated from the event object minus timestamps/count/transient fields; specifically, the following event fields are used to construct a unique key for an event (a minimal sketch of this keying follows this list):
    * ```event.Source.Component```
    * ```event.Source.Host```
    * ```event.InvolvedObject.Kind```
    * ```event.InvolvedObject.Namespace```
    * ```event.InvolvedObject.Name```
    * ```event.InvolvedObject.UID```
    * ```event.InvolvedObject.APIVersion```
    * ```event.Reason```
    * ```event.Message```
  * The LRU cache is capped at 4096 events. That means if a component (e.g. kubelet) runs for a long period of time and generates tons of unique events, the previously generated events cache will not grow unchecked in memory. Instead, after 4096 unique events are generated, the oldest events are evicted from the cache.
* When an event is generated, the previously generated events cache is checked (see [```pkg/client/record/event.go```](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/client/record/event.go)).
  * If the key for the new event matches the key for a previously generated event (meaning all of the above fields match between the new event and some previously generated event), then the event is considered to be a duplicate and the existing event entry is updated in etcd:
    * The new PUT (update) event API is called to update the existing event entry in etcd with the new last seen timestamp and count.
    * The event is also updated in the previously generated events cache with an incremented count, updated last seen timestamp, name, and new resource version (all required to issue a future event update).
  * If the key for the new event does not match the key for any previously generated event (meaning none of the above fields match between the new event and any previously generated events), then the event is considered to be new/unique and a new event entry is created in etcd:
    * The usual POST/create event API is called to create a new event entry in etcd.
    * An entry for the event is also added to the previously generated events cache.
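A minimal sketch of the keying and counting just described, using a plain map as a stand-in for the capped LRU cache; the names here are illustrative and not the actual `events_cache.go` code.

```go
package eventcache

import "strings"

// eventKey joins the identity fields listed above; timestamps and the count
// are deliberately left out so that recurring events map to the same key.
func eventKey(component, host, kind, namespace, name, uid, apiVersion, reason, message string) string {
	return strings.Join([]string{component, host, kind, namespace, name, uid, apiVersion, reason, message}, "/")
}

// cache is a toy stand-in for the LRU-backed events cache: it tracks how many
// times each key has been seen. A count greater than 1 means the caller should
// PUT an update (incremented count, new LastTimestamp) instead of POSTing a
// brand-new event entry.
type cache struct {
	counts map[string]int
}

func (c *cache) observe(key string) int {
	if c.counts == nil {
		c.counts = map[string]int{}
	}
	c.counts[key]++
	return c.counts[key]
}
```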
## Issues/Risks
* Compression is not guaranteed, because each component keeps track of event history in memory
* An application restart causes event history to be cleared, meaning event history is not preserved across application restarts and compression will not occur across component restarts.
* Because an LRU cache is used to keep track of previously generated events, if too many unique events are generated, old events will be evicted from the cache, so events will only be compressed until they age out of the events cache, at which point any new instance of the event will cause a new entry to be created in etcd.
## Example
Sample kubectl output
```
FIRSTSEEN LASTSEEN COUNT NAME KIND SUBOBJECT REASON SOURCE MESSAGE
Thu, 12 Feb 2015 01:13:02 +0000 Thu, 12 Feb 2015 01:13:02 +0000 1 kubernetes-minion-4.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-4.c.saad-dev-vms.internal} Starting kubelet.
Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-minion-1.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-1.c.saad-dev-vms.internal} Starting kubelet.
Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-minion-3.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-3.c.saad-dev-vms.internal} Starting kubelet.
Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-minion-2.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-2.c.saad-dev-vms.internal} Starting kubelet.
Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 monitoring-influx-grafana-controller-0133o Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods
Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 elasticsearch-logging-controller-fplln Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods
Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 kibana-logging-controller-gziey Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods
Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 skydns-ls6k1 Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods
Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 monitoring-heapster-controller-oh43e Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods
Thu, 12 Feb 2015 01:13:20 +0000 Thu, 12 Feb 2015 01:13:20 +0000 1 kibana-logging-controller-gziey BoundPod implicitly required container POD pulled {kubelet kubernetes-minion-4.c.saad-dev-vms.internal} Successfully pulled image "kubernetes/pause:latest"
Thu, 12 Feb 2015 01:13:20 +0000 Thu, 12 Feb 2015 01:13:20 +0000 1 kibana-logging-controller-gziey Pod scheduled {scheduler } Successfully assigned kibana-logging-controller-gziey to kubernetes-minion-4.c.saad-dev-vms.internal
```
This demonstrates what would have been 20 separate entries (indicating scheduling failure) collapsed/compressed down to 5 entries.
## Related Pull Requests/Issues
* Issue [#4073](https://github.com/GoogleCloudPlatform/kubernetes/issues/4073): Compress duplicate events
* PR [#4157](https://github.com/GoogleCloudPlatform/kubernetes/issues/4157): Add "Update Event" to Kubernetes API
* PR [#4206](https://github.com/GoogleCloudPlatform/kubernetes/issues/4206): Modify Event struct to allow compressing multiple recurring events in to a single event
* PR [#4306](https://github.com/GoogleCloudPlatform/kubernetes/issues/4306): Compress recurring events in to a single event to optimize etcd storage
* PR [#4444](https://github.com/GoogleCloudPlatform/kubernetes/pull/4444): Switch events history to use LRU cache instead of map
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/event_compression.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/event_compression.md?pixel)]()

View File

@ -0,0 +1,391 @@
# Variable expansion in pod command, args, and env
## Abstract
A proposal for the expansion of environment variables using a simple `$(var)` syntax.
## Motivation
It is extremely common for users to need to compose environment variables or pass arguments to
their commands using the values of environment variables. Kubernetes should provide a facility for
the 80% cases in order to decrease coupling and the use of workarounds.
## Goals
1. Define the syntax format
2. Define the scoping and ordering of substitutions
3. Define the behavior for unmatched variables
4. Define the behavior for unexpected/malformed input
## Constraints and Assumptions
* This design should describe the simplest possible syntax to accomplish the use-cases
* Expansion syntax will not support more complicated shell-like behaviors such as default values
(viz: `$(VARIABLE_NAME:"default")`), inline substitution, etc.
## Use Cases
1. As a user, I want to compose new environment variables for a container using a substitution
syntax to reference other variables in the container's environment and service environment
variables
1. As a user, I want to substitute environment variables into a container's command
1. As a user, I want to do the above without requiring the container's image to have a shell
1. As a user, I want to be able to specify a default value for a service variable which may
not exist
1. As a user, I want to see an event associated with the pod if an expansion fails (ie, references
variable names that cannot be expanded)
### Use Case: Composition of environment variables
Currently, containers are injected with docker-style environment variables for the services in
their pod's namespace. There are several variables for each service, but users routinely need
to compose URLs based on these variables because there is not a variable for the exact format
they need. Users should be able to build new environment variables with the exact format they need.
Eventually, it should also be possible to turn off the automatic injection of the docker-style
variables into pods and let the users consume the exact information they need via the downward API
and composition.
#### Expanding expanded variables
It should be possible to reference a variable which is itself the result of an expansion, if the
referenced variable is declared in the container's environment prior to the one referencing it.
Put another way -- a container's environment is expanded in order, and expanded variables are
available to subsequent expansions.
### Use Case: Variable expansion in command
Users frequently need to pass the values of environment variables to a container's command.
Currently, Kubernetes does not perform any expansion of variables. The workaround is to invoke a
shell in the container's command and have the shell perform the substitution, or to write a wrapper
script that sets up the environment and runs the command. This has a number of drawbacks:
1. Solutions that require a shell are unfriendly to images that do not contain a shell
2. Wrapper scripts make it harder to use images as base images
3. Wrapper scripts increase coupling to kubernetes
Users should be able to do the 80% case of variable expansion in command without writing a wrapper
script or adding a shell invocation to their containers' commands.
### Use Case: Images without shells
The current workaround for variable expansion in a container's command requires the container's
image to have a shell. This is unfriendly to images that do not contain a shell (`scratch` images,
for example). Users should be able to perform the other use-cases in this design without regard to
the content of their images.
### Use Case: See an event for incomplete expansions
It is possible that a container with incorrect variable values or command line may continue to run
for a long period of time, and that the end-user would have no visual or obvious warning of the
incorrect configuration. If the kubelet creates an event when an expansion references a variable
that cannot be expanded, it will help users quickly detect problems with expansions.
## Design Considerations
### What features should be supported?
In order to limit complexity, we want to provide the right amount of functionality so that the 80%
cases can be realized and nothing more. We felt that the essentials boiled down to:
1. Ability to perform direct expansion of variables in a string
2. Ability to specify default values via a prioritized mapping function but without support for
defaults as a syntax-level feature
### What should the syntax be?
The exact syntax for variable expansion has a large impact on how users perceive and relate to the
feature. We considered implementing a very restrictive subset of the shell `${var}` syntax. This
syntax is an attractive option on some level, because many people are familiar with it. However,
this syntax also has a large number of lesser known features such as the ability to provide
default values for unset variables, perform inline substitution, etc.
In the interest of preventing conflation of the expansion feature in Kubernetes with the shell
feature, we chose a different syntax similar to the one in Makefiles, `$(var)`. We also chose not
to support the bare `$var` format, since it is not required to implement the required use-cases.
Nested references, ie, variable expansion within variable names, are not supported.
#### How should unmatched references be treated?
Ideally, it should be extremely clear when a variable reference couldn't be expanded. We decided
the best experience for unmatched variable references would be to have the entire reference, syntax
included, show up in the output. As an example, if the reference `$(VARIABLE_NAME)` cannot be
expanded, then `$(VARIABLE_NAME)` should be present in the output.
#### Escaping the operator
Although the `$(var)` syntax does overlap with the `$(command)` form of command substitution
supported by many shells, because unexpanded variables are present verbatim in the output, we
expect this will not present a problem to many users. If there is a collision between a variable
name and command substitution syntax, the syntax can be escaped with the form `$$(VARIABLE_NAME)`,
which will evaluate to `$(VARIABLE_NAME)` whether `VARIABLE_NAME` can be expanded or not.
## Design
This design encompasses the variable expansion syntax and specification and the changes needed to
incorporate the expansion feature into the container's environment and command.
### Syntax and expansion mechanics
This section describes the expansion syntax, evaluation of variable values, and how unexpected or
malformed inputs are handled.
#### Syntax
The inputs to the expansion feature are:
1. A utf-8 string (the input string) which may contain variable references
2. A function (the mapping function) that maps the name of a variable to the variable's value, of
type `func(string) string`
Variable references in the input string are indicated exclusively with the syntax
`$(<variable-name>)`. The syntax tokens are:
- `$`: the operator
- `(`: the reference opener
- `)`: the reference closer
The operator has no meaning unless accompanied by the reference opener and closer tokens. The
operator can be escaped using `$$`. One literal `$` will be emitted for each `$$` in the input.
The reference opener and closer characters have no meaning when not part of a variable reference.
If a variable reference is malformed, viz: `$(VARIABLE_NAME` without a closing expression, the
operator and expression opening characters are treated as ordinary characters without special
meanings.
#### Scope and ordering of substitutions
The scope in which variable references are expanded is defined by the mapping function. Within the
mapping function, any arbitrary strategy may be used to determine the value of a variable name.
The most basic implementation of a mapping function is to use a `map[string]string` to lookup the
value of a variable.
In order to support default values for variables like service variables presented by the kubelet,
which may not be bound because the service that provides them does not yet exist, there should be a
mapping function that uses a list of `map[string]string` like:
```go
func MakeMappingFunc(maps ...map[string]string) func(string) string {
	return func(input string) string {
		for _, context := range maps {
			val, ok := context[input]
			if ok {
				return val
			}
		}
		return ""
	}
}

// elsewhere
containerEnv := map[string]string{
	"FOO":           "BAR",
	"ZOO":           "ZAB",
	"SERVICE2_HOST": "some-host",
}

serviceEnv := map[string]string{
	"SERVICE_HOST": "another-host",
	"SERVICE_PORT": "8083",
}

// single-map variation
mapping := MakeMappingFunc(containerEnv)

// default variables not found in serviceEnv
mappingWithDefaults := MakeMappingFunc(serviceEnv, containerEnv)
```
### Implementation changes
The necessary changes to implement this functionality are:
1. Add a new interface, `ObjectEventRecorder`, which is like the `EventRecorder` interface, but
scoped to a single object, and a function that returns an `ObjectEventRecorder` given an
`ObjectReference` and an `EventRecorder`
2. Introduce `third_party/golang/expansion` package that provides:
   1. An `Expand(string, func(string) string) string` function
   2. A `MappingFuncFor(ObjectEventRecorder, ...map[string]string) func(string) string` function
3. Make the kubelet expand environment correctly
4. Make the kubelet expand command correctly
#### Event Recording
In order to provide an event when an expansion references undefined variables, the mapping function
must be able to create an event. In order to facilitate this, we should create a new interface in
the `api/client/record` package which is similar to `EventRecorder`, but scoped to a single object:
```go
// ObjectEventRecorder knows how to record events about a single object.
type ObjectEventRecorder interface {
	// Event constructs an event from the given information and puts it in the queue for sending.
	// 'reason' is the reason this event is generated. 'reason' should be short and unique; it will
	// be used to automate handling of events, so imagine people writing switch statements to
	// handle them. You want to make that easy.
	// 'message' is intended to be human readable.
	//
	// The resulting event will be created in the same namespace as the reference object.
	Event(reason, message string)

	// Eventf is just like Event, but with Sprintf for the message field.
	Eventf(reason, messageFmt string, args ...interface{})

	// PastEventf is just like Eventf, but with an option to specify the event's 'timestamp' field.
	PastEventf(timestamp util.Time, reason, messageFmt string, args ...interface{})
}
```
There should also be a function that can construct an `ObjectEventRecorder` from a `runtime.Object`
and an `EventRecorder`:
```go
type objectRecorderImpl struct {
	object   runtime.Object
	recorder EventRecorder
}

func (r *objectRecorderImpl) Event(reason, message string) {
	r.recorder.Event(r.object, reason, message)
}

func ObjectEventRecorderFor(object runtime.Object, recorder EventRecorder) ObjectEventRecorder {
	return &objectRecorderImpl{object, recorder}
}
```
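The remaining interface methods could delegate in the same way; this sketch assumes `EventRecorder` exposes `Eventf` and `PastEventf` variants that take the object as their first argument:

```go
func (r *objectRecorderImpl) Eventf(reason, messageFmt string, args ...interface{}) {
	// Delegate to the object-taking variant, always passing the wrapped object.
	r.recorder.Eventf(r.object, reason, messageFmt, args...)
}

func (r *objectRecorderImpl) PastEventf(timestamp util.Time, reason, messageFmt string, args ...interface{}) {
	r.recorder.PastEventf(r.object, timestamp, reason, messageFmt, args...)
}
```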
#### Expansion package
The expansion package should provide two methods:
```go
// MappingFuncFor returns a mapping function for use with Expand that
// implements the expansion semantics defined in the expansion spec; it
// returns the input string wrapped in the expansion syntax if no mapping
// for the input is found. If no expansion is found for a key, an event
// is raised on the given recorder.
func MappingFuncFor(recorder record.ObjectEventRecorder, context ...map[string]string) func(string) string {
	// ...
}

// Expand replaces variable references in the input string according to
// the expansion spec using the given mapping function to resolve the
// values of variables.
func Expand(input string, mapping func(string) string) string {
	// ...
}
```
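A brief usage sketch combining the two functions; `pod`, `eventRecorder`, and the environment values are illustrative placeholders, not part of the spec:

```go
recorder := ObjectEventRecorderFor(pod, eventRecorder)

env := map[string]string{
	"GITSERVER_SERVICE_HOST": "10.0.0.11",
	"GITSERVER_SERVICE_PORT": "8080",
}
mapping := expansion.MappingFuncFor(recorder, env)

// Defined references are substituted; undefined ones are left in "$(...)" form
// and an event is recorded against the pod.
url := expansion.Expand("http://$(GITSERVER_SERVICE_HOST):$(GITSERVER_SERVICE_PORT)", mapping)
// url == "http://10.0.0.11:8080"
```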
#### Kubelet changes
The Kubelet should be made to correctly expand variable references in a container's environment,
command, and args. Changes will need to be made to:
1. The `makeEnvironmentVariables` function in the kubelet; this is used by
`GenerateRunContainerOptions`, which is used by both the docker and rkt container runtimes
2. The docker manager `setEntrypointAndCommand` func has to be changed to perform variable
expansion
3. The rkt runtime should be made to support expansion in command and args when support for it is
implemented
### Examples
#### Inputs and outputs
These examples are in the context of the mapping:
| Name | Value |
|-------------|------------|
| `VAR_A` | `"A"` |
| `VAR_B` | `"B"` |
| `VAR_C` | `"C"` |
| `VAR_REF` | `$(VAR_A)` |
| `VAR_EMPTY` | `""` |
No other variables are defined.
| Input | Result |
|--------------------------------|----------------------------|
| `"$(VAR_A)"` | `"A"` |
| `"___$(VAR_B)___"` | `"___B___"` |
| `"___$(VAR_C)"` | `"___C"` |
| `"$(VAR_A)-$(VAR_A)"` | `"A-A"` |
| `"$(VAR_A)-1"` | `"A-1"` |
| `"$(VAR_A)_$(VAR_B)_$(VAR_C)"` | `"A_B_C"` |
| `"$$(VAR_B)_$(VAR_A)"` | `"$(VAR_B)_A"` |
| `"$$(VAR_A)_$$(VAR_B)"` | `"$(VAR_A)_$(VAR_B)"` |
| `"f000-$$VAR_A"` | `"f000-$VAR_A"` |
| `"foo\\$(VAR_C)bar"` | `"foo\Cbar"` |
| `"foo\\\\$(VAR_C)bar"` | `"foo\\Cbar"` |
| `"foo\\\\\\\\$(VAR_A)bar"` | `"foo\\\\Abar"` |
| `"$(VAR_A$(VAR_B))"` | `"$(VAR_A$(VAR_B))"` |
| `"$(VAR_A$(VAR_B)"` | `"$(VAR_A$(VAR_B)"` |
| `"$(VAR_REF)"` | `"$(VAR_A)"` |
| `"%%$(VAR_REF)--$(VAR_REF)%%"` | `"%%$(VAR_A)--$(VAR_A)%%"` |
| `"foo$(VAR_EMPTY)bar"` | `"foobar"` |
| `"foo$(VAR_Awhoops!"` | `"foo$(VAR_Awhoops!"` |
| `"f00__(VAR_A)__"` | `"f00__(VAR_A)__"` |
| `"$?_boo_$!"` | `"$?_boo_$!"` |
| `"$VAR_A"` | `"$VAR_A"` |
| `"$(VAR_DNE)"` | `"$(VAR_DNE)"` |
| `"$$$$$$(BIG_MONEY)"` | `"$$$(BIG_MONEY)"` |
| `"$$$$$$(VAR_A)"` | `"$$$(VAR_A)"` |
| `"$$$$$$$(GOOD_ODDS)"` | `"$$$$(GOOD_ODDS)"` |
| `"$$$$$$$(VAR_A)"` | `"$$$A"` |
| `"$VAR_A)"` | `"$VAR_A)"` |
| `"${VAR_A}"` | `"${VAR_A}"` |
| `"$(VAR_B)_______$(A"` | `"B_______$(A"` |
| `"$(VAR_C)_______$("` | `"C_______$("` |
| `"$(VAR_A)foobarzab$"` | `"Afoobarzab$"` |
| `"foo-\\$(VAR_A"` | `"foo-\$(VAR_A"` |
| `"--$($($($($--"` | `"--$($($($($--"` |
| `"$($($($($--foo$("` | `"$($($($($--foo$("` |
| `"foo0--$($($($("` | `"foo0--$($($($("` |
| `"$(foo$$var)` | `$(foo$$var)` |
#### In a pod: building a URL
Notice the `$(var)` syntax.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: expansion-pod
spec:
  containers:
    - name: test-container
      image: gcr.io/google_containers/busybox
      command: [ "/bin/sh", "-c", "env" ]
      env:
        - name: PUBLIC_URL
          value: "http://$(GITSERVER_SERVICE_HOST):$(GITSERVER_SERVICE_PORT)"
  restartPolicy: Never
```
#### In a pod: building a URL using downward API
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: expansion-pod
spec:
  containers:
    - name: test-container
      image: gcr.io/google_containers/busybox
      command: [ "/bin/sh", "-c", "env" ]
      env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: "metadata.namespace"
        - name: PUBLIC_URL
          value: "http://gitserver.$(POD_NAMESPACE):$(SERVICE_PORT)"
  restartPolicy: Never
```
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/expansion.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/expansion.md?pixel)]()

View File

@ -0,0 +1,96 @@
# Identifiers and Names in Kubernetes
A summary of the goals and recommendations for identifiers in Kubernetes. Described in [GitHub issue #199](https://github.com/GoogleCloudPlatform/kubernetes/issues/199).
## Definitions
UID
: A non-empty, opaque, system-generated value guaranteed to be unique in time and space; intended to distinguish between historical occurrences of similar entities.
Name
: A non-empty string guaranteed to be unique within a given scope at a particular time; used in resource URLs; provided by clients at creation time and encouraged to be human friendly; intended to facilitate creation idempotence and space-uniqueness of singleton objects, distinguish distinct entities, and reference particular entities across operations.
[rfc1035](http://www.ietf.org/rfc/rfc1035.txt)/[rfc1123](http://www.ietf.org/rfc/rfc1123.txt) label (DNS_LABEL)
: An alphanumeric (a-z, and 0-9) string, with a maximum length of 63 characters, with the '-' character allowed anywhere except the first or last character, suitable for use as a hostname or segment in a domain name
[rfc1035](http://www.ietf.org/rfc/rfc1035.txt)/[rfc1123](http://www.ietf.org/rfc/rfc1123.txt) subdomain (DNS_SUBDOMAIN)
: One or more lowercase rfc1035/rfc1123 labels separated by '.' with a maximum length of 253 characters
[rfc4122](http://www.ietf.org/rfc/rfc4122.txt) universally unique identifier (UUID)
: A 128 bit generated value that is extremely unlikely to collide across time and space and requires no central coordination
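For illustration only (not the validation code Kubernetes actually uses), the DNS_LABEL and DNS_SUBDOMAIN formats above map to simple checks:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// dnsLabel: a-z and 0-9, '-' allowed anywhere except first or last, 1-63 characters.
var dnsLabel = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?$`)

// IsDNSLabel reports whether s is a valid DNS_LABEL as defined above.
func IsDNSLabel(s string) bool { return dnsLabel.MatchString(s) }

// IsDNSSubdomain reports whether s is one or more labels joined by '.',
// with a total length of at most 253 characters.
func IsDNSSubdomain(s string) bool {
	if len(s) == 0 || len(s) > 253 {
		return false
	}
	for _, label := range strings.Split(s, ".") {
		if !IsDNSLabel(label) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(IsDNSLabel("backend-x4eb1"))      // true
	fmt.Println(IsDNSSubdomain("guestbook.user")) // true
	fmt.Println(IsDNSLabel("-bad"))               // false
}
```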
## Objectives for names and UIDs
1. Uniquely identify (via a UID) an object across space and time
2. Uniquely name (via a name) an object across space
3. Provide human-friendly names in API operations and/or configuration files
4. Allow idempotent creation of API resources (#148) and enforcement of space-uniqueness of singleton objects
5. Allow DNS names to be automatically generated for some objects
## General design
1. When an object is created via an API, a Name string (a DNS_SUBDOMAIN) must be specified. Name must be non-empty and unique within the apiserver. This enables idempotent and space-unique creation operations. Parts of the system (e.g. replication controller) may join strings (e.g. a base name and a random suffix) to create a unique Name. For situations where generating a name is impractical, some or all objects may support a param to auto-generate a name. Generating random names will defeat idempotency.
* Examples: "guestbook.user", "backend-x4eb1"
2. When an object is created via an API, a Namespace string (a DNS_SUBDOMAIN? format TBD via #1114) may be specified. Depending on the API receiver, namespaces might be validated (e.g. apiserver might ensure that the namespace actually exists). If a namespace is not specified, one will be assigned by the API receiver. This assignment policy might vary across API receivers (e.g. apiserver might have a default, kubelet might generate something semi-random).
* Example: "api.k8s.example.com"
3. Upon acceptance of an object via an API, the object is assigned a UID (a UUID). UID must be non-empty and unique across space and time.
* Example: "01234567-89ab-cdef-0123-456789abcdef"
## Case study: Scheduling a pod
Pods can be placed onto a particular node in a number of ways. This case
study demonstrates how the above design can be applied to satisfy the
objectives.
### A pod scheduled by a user through the apiserver
1. A user submits a pod with Namespace="" and Name="guestbook" to the apiserver.
2. The apiserver validates the input.
1. A default Namespace is assigned.
2. The pod name must be space-unique within the Namespace.
3. Each container within the pod has a name which must be space-unique within the pod.
3. The pod is accepted.
1. A new UID is assigned.
4. The pod is bound to a node.
1. The kubelet on the node is passed the pod's UID, Namespace, and Name.
5. Kubelet validates the input.
6. Kubelet runs the pod.
1. Each container is started up with enough metadata to distinguish the pod from whence it came.
2. Each attempt to run a container is assigned a UID (a string) that is unique across time.
* This may correspond to Docker's container ID.
### A pod placed by a config file on the node
1. A config file is stored on the node, containing a pod with UID="", Namespace="", and Name="cadvisor".
2. Kubelet validates the input.
1. Since UID is not provided, kubelet generates one.
2. Since Namespace is not provided, kubelet generates one.
1. The generated namespace should be deterministic and cluster-unique for the source, such as a hash of the hostname and file path (see the sketch below).
* E.g. Namespace="file-f4231812554558a718a01ca942782d81"
3. Kubelet runs the pod.
1. Each container is started up with enough metadata to distinguish the pod from whence it came.
2. Each attempt to run a container is assigned a UID (a string) that is unique across time.
1. This may correspond to Docker's container ID.
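A minimal sketch of the deterministic namespace generation mentioned in step 2.2.1 above; the use of MD5 and the `file-` prefix match the example but are assumptions, not the kubelet's actual algorithm:

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// fileSourceNamespace derives a deterministic, source-unique namespace for a
// file-sourced pod from the node hostname and the config file path.
func fileSourceNamespace(hostname, path string) string {
	sum := md5.Sum([]byte(hostname + ":" + path))
	return fmt.Sprintf("file-%x", sum)
}

func main() {
	// The same hostname and path always yield the same namespace.
	fmt.Println(fileSourceNamespace("node-1", "/etc/kubernetes/manifests/cadvisor.yaml"))
}
```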
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/identifiers.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/identifiers.md?pixel)]()

View File

@ -0,0 +1,340 @@
# Namespaces
## Abstract
A Namespace is a mechanism to partition resources created by users into
a logically named group.
## Motivation
A single cluster should be able to satisfy the needs of multiple user communities.
Each user community wants to be able to work in isolation from other communities.
Each user community has its own:
1. resources (pods, services, replication controllers, etc.)
2. policies (who can or cannot perform actions in their community)
3. constraints (this community is allowed this much quota, etc.)
A cluster operator may create a Namespace for each unique user community.
The Namespace provides a unique scope for:
1. named resources (to avoid basic naming collisions)
2. delegated management authority to trusted users
3. ability to limit community resource consumption
## Use cases
1. As a cluster operator, I want to support multiple user communities on a single cluster.
2. As a cluster operator, I want to delegate authority to partitions of the cluster to trusted users
in those communities.
3. As a cluster operator, I want to limit the amount of resources each community can consume in order
to limit the impact to other communities using the cluster.
4. As a cluster user, I want to interact with resources that are pertinent to my user community in
isolation of what other user communities are doing on the cluster.
## Design
### Data Model
A *Namespace* defines a logically named group for multiple *Kind*s of resources.
```
type Namespace struct {
	TypeMeta   `json:",inline"`
	ObjectMeta `json:"metadata,omitempty"`

	Spec   NamespaceSpec   `json:"spec,omitempty"`
	Status NamespaceStatus `json:"status,omitempty"`
}
```
A *Namespace* name is a DNS compatible label.
A *Namespace* must exist prior to associating content with it.
A *Namespace* must not be deleted if there is content associated with it.
To associate a resource with a *Namespace* the following conditions must be satisfied:
1. The resource's *Kind* must be registered as having *RESTScopeNamespace* with the server
2. The resource's *TypeMeta.Namespace* field must have a value that references an existing *Namespace*
The *Name* of a resource associated with a *Namespace* is unique to that *Kind* in that *Namespace*.
It is intended to be used in resource URLs; provided by clients at creation time, and encouraged to be
human friendly; intended to facilitate idempotent creation, space-uniqueness of singleton objects,
distinguish distinct entities, and reference particular entities across operations.
### Authorization
A *Namespace* provides an authorization scope for accessing content associated with the *Namespace*.
See [Authorization plugins](../authorization.md)
### Limit Resource Consumption
A *Namespace* provides a scope to limit resource consumption.
A *LimitRange* defines min/max constraints on the amount of resources a single entity can consume in
a *Namespace*.
See [Admission control: Limit Range](admission_control_limit_range.md)
A *ResourceQuota* tracks aggregate usage of resources in the *Namespace* and allows cluster operators
to define *Hard* resource usage limits that a *Namespace* may consume.
See [Admission control: Resource Quota](admission_control_resource_quota.md)
### Finalizers
Upon creation of a *Namespace*, the creator may provide a list of *Finalizer* objects.
```
type FinalizerName string

// These are internal finalizers to Kubernetes, must be qualified name unless defined here
const (
	FinalizerKubernetes FinalizerName = "kubernetes"
)

// NamespaceSpec describes the attributes on a Namespace
type NamespaceSpec struct {
	// Finalizers is an opaque list of values that must be empty to permanently remove object from storage
	Finalizers []FinalizerName
}
```
A *FinalizerName* is a qualified name.
The API Server enforces that a *Namespace* can be deleted from storage only if its
*Namespace.Spec.Finalizers* list is empty.
A *finalize* operation is the only mechanism to modify the *Namespace.Spec.Finalizers* field post creation.
By default, each *Namespace* is created with *kubernetes* as an item in its initial
*Namespace.Spec.Finalizers* list.
### Phases
A *Namespace* may exist in the following phases.
```
type NamespacePhase string

const (
	NamespaceActive      NamespacePhase = "Active"
	NamespaceTerminating NamespacePhase = "Terminating"
)

type NamespaceStatus struct {
	...
	Phase NamespacePhase
}
```
A *Namespace* is in the **Active** phase if it does not have an *ObjectMeta.DeletionTimestamp*.
A *Namespace* is in the **Terminating** phase if it has an *ObjectMeta.DeletionTimestamp*.
**Active**
Upon creation, a *Namespace* goes in the *Active* phase. This means that content may be associated with
a namespace, and all normal interactions with the namespace are allowed to occur in the cluster.
If a DELETE request occurs for a *Namespace*, the *Namespace.ObjectMeta.DeletionTimestamp* is set
to the current server time. A *namespace controller* observes the change, and sets the *Namespace.Status.Phase*
to *Terminating*.
**Terminating**
A *namespace controller* watches for *Namespace* objects that have a *Namespace.ObjectMeta.DeletionTimestamp*
value set in order to know when to initiate graceful termination of the content associated with the
*Namespace* that is known to the cluster.
The *namespace controller* enumerates each known resource type in that namespace and deletes it one by one.
Admission control blocks creation of new resources in that namespace in order to prevent a race-condition
where the controller could believe all of a given resource type had been deleted from the namespace,
when in fact some other rogue client agent had created new objects. Using admission control in this
scenario allows each of the registry implementations for the individual objects to avoid having to take Namespace life-cycle into account.
Once all objects known to the *namespace controller* have been deleted, the *namespace controller*
executes a *finalize* operation on the namespace that removes the *kubernetes* value from
the *Namespace.Spec.Finalizers* list.
If the *namespace controller* sees a *Namespace* whose *ObjectMeta.DeletionTimestamp* is set, and
whose *Namespace.Spec.Finalizers* list is empty, it will signal the server to permanently remove
the *Namespace* from storage by sending a final DELETE action to the API server.
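The flow above can be summarized in a sketch of the controller's per-namespace sync; the helper functions (`listKnownResources`, `deleteAllOf`, `finalizeNamespace`, `deleteNamespace`) are placeholders rather than the actual controller API:

```go
// syncNamespace sketches the namespace controller's handling of a single
// Namespace, following the phases described above.
func syncNamespace(ns *api.Namespace) error {
	if ns.ObjectMeta.DeletionTimestamp == nil {
		return nil // Active: nothing to do
	}

	// Terminating: delete every known resource type in the namespace.
	for _, resourceType := range listKnownResources() {
		if err := deleteAllOf(resourceType, ns.Name); err != nil {
			return err // retried on the next sync
		}
	}

	// Remove the "kubernetes" finalizer via the finalize sub-resource.
	updated, err := finalizeNamespace(ns)
	if err != nil {
		return err
	}

	// Once all finalizers (including any owned by other agents) are gone,
	// permanently remove the namespace from storage.
	if len(updated.Spec.Finalizers) == 0 {
		return deleteNamespace(updated.Name)
	}
	return nil
}
```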
### REST API
To interact with the Namespace API:
| Action | HTTP Verb | Path | Description |
| ------ | --------- | ---- | ----------- |
| CREATE | POST | /api/{version}/namespaces | Create a namespace |
| LIST | GET | /api/{version}/namespaces | List all namespaces |
| UPDATE | PUT | /api/{version}/namespaces/{namespace} | Update namespace {namespace} |
| DELETE | DELETE | /api/{version}/namespaces/{namespace} | Delete namespace {namespace} |
| FINALIZE | POST | /api/{version}/namespaces/{namespace}/finalize | Finalize namespace {namespace} |
| WATCH | GET | /api/{version}/watch/namespaces | Watch all namespaces |
This specification reserves the name *finalize* as a sub-resource to namespace.
As a consequence, it is invalid to have a *resourceType* managed by a namespace whose kind is *finalize*.
To interact with content associated with a Namespace:
| Action | HTTP Verb | Path | Description |
| ---- | ---- | ---- | ---- |
| CREATE | POST | /api/{version}/namespaces/{namespace}/{resourceType}/ | Create instance of {resourceType} in namespace {namespace} |
| GET | GET | /api/{version}/namespaces/{namespace}/{resourceType}/{name} | Get instance of {resourceType} in namespace {namespace} with {name} |
| UPDATE | PUT | /api/{version}/namespaces/{namespace}/{resourceType}/{name} | Update instance of {resourceType} in namespace {namespace} with {name} |
| DELETE | DELETE | /api/{version}/namespaces/{namespace}/{resourceType}/{name} | Delete instance of {resourceType} in namespace {namespace} with {name} |
| LIST | GET | /api/{version}/namespaces/{namespace}/{resourceType} | List instances of {resourceType} in namespace {namespace} |
| WATCH | GET | /api/{version}/watch/namespaces/{namespace}/{resourceType} | Watch for changes to a {resourceType} in namespace {namespace} |
| WATCH | GET | /api/{version}/watch/{resourceType} | Watch for changes to a {resourceType} across all namespaces |
| LIST | GET | /api/{version}/list/{resourceType} | List instances of {resourceType} across all namespaces |
The API server verifies the *Namespace* on resource creation matches the *{namespace}* on the path.
If the end-user does not populate the *Namespace* of a resource, the API server will associate it with a *Namespace*
based on the *Namespace* context of the incoming request. If the *Namespace* of the resource being created or updated
does not match the *Namespace* on the request, then the API server will reject the request.
### Storage
A namespace provides a unique identifier space and therefore must be in the storage path of a resource.
In etcd, we want to continue to support efficient WATCH across namespaces.
Resources that persist content in etcd will have storage paths as follows:
/{k8s_storage_prefix}/{resourceType}/{resource.Namespace}/{resource.Name}
This enables consumers to WATCH /registry/{resourceType} for changes across namespace of a particular {resourceType}.
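A small sketch of the resulting key construction (the `registry` prefix is the conventional value; the helper is illustrative):

```go
import "path"

// etcdKey builds the storage path described above, e.g.
// etcdKey("registry", "pods", "development", "guestbook")
// returns "/registry/pods/development/guestbook", so a WATCH on
// "/registry/pods" observes pods across all namespaces.
func etcdKey(prefix, resourceType, namespace, name string) string {
	return path.Join("/", prefix, resourceType, namespace, name)
}
```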
### Kubelet
The kubelet will register pods it sources from a file or HTTP source under a namespace associated with the
*cluster-id*.
### Example: OpenShift Origin managing a Kubernetes Namespace
In this example, we demonstrate how the design allows for agents built on-top of
Kubernetes that manage their own set of resource types associated with a *Namespace*
to take part in Namespace termination.
OpenShift creates a Namespace in Kubernetes
```
{
  "apiVersion": "v1",
  "kind": "Namespace",
  "metadata": {
    "name": "development",
    "labels": {
      "name": "development"
    }
  },
  "spec": {
    "finalizers": ["openshift.com/origin", "kubernetes"]
  },
  "status": {
    "phase": "Active"
  }
}
```
OpenShift then goes and creates a set of resources (pods, services, etc) associated
with the "development" namespace. It also creates its own set of resources in its
own storage associated with the "development" namespace unknown to Kubernetes.
User deletes the Namespace in Kubernetes, and Namespace now has following state:
```
{
  "apiVersion": "v1",
  "kind": "Namespace",
  "metadata": {
    "name": "development",
    "deletionTimestamp": "...",
    "labels": {
      "name": "development"
    }
  },
  "spec": {
    "finalizers": ["openshift.com/origin", "kubernetes"]
  },
  "status": {
    "phase": "Terminating"
  }
}
```
The Kubernetes *namespace controller* observes the namespace has a *deletionTimestamp*
and begins to terminate all of the content in the namespace that it knows about. Upon
success, it executes a *finalize* action that modifies the *Namespace* by
removing *kubernetes* from the list of finalizers:
```
{
  "apiVersion": "v1",
  "kind": "Namespace",
  "metadata": {
    "name": "development",
    "deletionTimestamp": "...",
    "labels": {
      "name": "development"
    }
  },
  "spec": {
    "finalizers": ["openshift.com/origin"]
  },
  "status": {
    "phase": "Terminating"
  }
}
```
OpenShift Origin has its own *namespace controller* observing cluster state, and it observes that
the same namespace has a *deletionTimestamp* assigned to it. It too purges the resources it manages
that are associated with that namespace from its own storage.
Upon completion, it executes a *finalize* action and removes the reference to "openshift.com/origin"
from the list of finalizers.
This results in the following state:
```
{
  "apiVersion": "v1",
  "kind": "Namespace",
  "metadata": {
    "name": "development",
    "deletionTimestamp": "...",
    "labels": {
      "name": "development"
    }
  },
  "spec": {
    "finalizers": []
  },
  "status": {
    "phase": "Terminating"
  }
}
```
At this point, the Kubernetes *namespace controller* in its sync loop will see that the namespace
has a deletion timestamp and that its list of finalizers is empty. As a result, it knows all
content associated with that namespace has been purged. It performs a final DELETE action
to remove that Namespace from the storage.
At this point, all content associated with that Namespace, and the Namespace itself are gone.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/namespaces.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/namespaces.md?pixel)]()

View File

@ -0,0 +1,114 @@
# Networking
## Model and motivation
Kubernetes deviates from the default Docker networking model. The goal is for each pod to have an IP in a flat shared networking namespace that has full communication with other physical computers and containers across the network. IP-per-pod creates a clean, backward-compatible model where pods can be treated much like VMs or physical hosts from the perspectives of port allocation, networking, naming, service discovery, load balancing, application configuration, and migration.
OTOH, dynamic port allocation requires supporting both static ports (e.g., for externally accessible services) and dynamically allocated ports, requires partitioning centrally allocated and locally acquired dynamic ports, complicates scheduling (since ports are a scarce resource), is inconvenient for users, complicates application configuration, is plagued by port conflicts and reuse and exhaustion, requires non-standard approaches to naming (e.g., etcd rather than DNS), requires proxies and/or redirection for programs using standard naming/addressing mechanisms (e.g., web browsers), requires watching and cache invalidation for address/port changes for instances in addition to watching group membership changes, and obstructs container/pod migration (e.g., using CRIU). NAT introduces additional complexity by fragmenting the addressing space, which breaks self-registration mechanisms, among other problems.
With the IP-per-pod model, all user containers within a pod behave as if they are on the same host with regard to networking. They can all reach each other's ports on localhost. Ports are published to the host interface in the normal Docker way. All containers in all pods can talk to all other containers in all other pods by their 10-dot addresses.
In addition to avoiding the aforementioned problems with dynamic port allocation, this approach reduces friction for applications moving from the world of uncontainerized apps on physical or virtual hosts to containers within pods. People running application stacks together on the same host have already figured out how to make ports not conflict (e.g., by configuring them through environment variables) and have arranged for clients to find them.
The approach does reduce isolation between containers within a pod -- ports could conflict, and there couldn't be private ports across containers within a pod, but applications requiring their own port spaces could just run as separate pods and processes requiring private communication could run within the same container. Besides, the premise of pods is that containers within a pod share some resources (volumes, cpu, ram, etc.) and therefore expect and tolerate reduced isolation. Additionally, the user can control what containers belong to the same pod whereas, in general, they don't control what pods land together on a host.
When any container calls SIOCGIFADDR, it sees the IP that any peer container would see them coming from -- each pod has its own IP address that other pods can know. By making IP addresses and ports the same within and outside the containers and pods, we create a NAT-less, flat address space. "ip addr show" should work as expected. This would enable all existing naming/discovery mechanisms to work out of the box, including self-registration mechanisms and applications that distribute IP addresses. (We should test that with etcd and perhaps one other option, such as Eureka (used by Acme Air) or Consul.) We should be optimizing for inter-pod network communication. Within a pod, containers are more likely to use communication through volumes (e.g., tmpfs) or IPC.
This is different from the standard Docker model. In that mode, each container gets an IP in the 172-dot space and would only see that 172-dot address from SIOCGIFADDR. If these containers connect to another container the peer would see the connect coming from a different IP than the container itself knows. In short - you can never self-register anything from a container, because a container can not be reached on its private IP.
An alternative we considered was an additional layer of addressing: pod-centric IP per container. Each container would have its own local IP address, visible only within that pod. This would perhaps make it easier for containerized applications to move from physical/virtual hosts to pods, but would be more complex to implement (e.g., requiring a bridge per pod, split-horizon/VP DNS) and to reason about, due to the additional layer of address translation, and would break self-registration and IP distribution mechanisms.
## Current implementation
For the Google Compute Engine cluster configuration scripts, [advanced routing](https://developers.google.com/compute/docs/networking#routing) is set up so that each VM has an extra 256 IP addresses that get routed to it. This is in addition to the 'main' IP address assigned to the VM that is NAT-ed for Internet access. The networking bridge (called `cbr0` to differentiate it from `docker0`) is set up outside of Docker proper and only does NAT for egress network traffic that isn't aimed at the virtual network.
Ports mapped in from the 'main IP' (and hence the internet if the right firewall rules are set up) are proxied in user mode by Docker. In the future, this should be done with `iptables` by either the Kubelet or Docker: [Issue #15](https://github.com/GoogleCloudPlatform/kubernetes/issues/15).
We start Docker with:
DOCKER_OPTS="--bridge cbr0 --iptables=false"
We set up this bridge on each node with SaltStack, in [container_bridge.py](cluster/saltbase/salt/_states/container_bridge.py).
cbr0:
container_bridge.ensure:
- cidr: {{ grains['cbr-cidr'] }}
...
grains:
roles:
- kubernetes-pool
cbr-cidr: $MINION_IP_RANGE
We make these addresses routable in GCE:
gcloud compute routes add "${MINION_NAMES[$i]}" \
--project "${PROJECT}" \
--destination-range "${MINION_IP_RANGES[$i]}" \
--network "${NETWORK}" \
--next-hop-instance "${MINION_NAMES[$i]}" \
--next-hop-instance-zone "${ZONE}" &
The minion IP ranges are /24s in the 10-dot space.
GCE itself does not know anything about these IPs, though.
These are not externally routable, though, so containers that need to communicate with the outside world need to use host networking. An external IP that forwards to the VM will only forward to the VM's primary IP (which is assigned to no pod). So we use docker's -p flag to map published ports to the main interface. This has the side effect of disallowing two pods from exposing the same port. (More discussion on this in [Issue #390](https://github.com/GoogleCloudPlatform/kubernetes/issues/390).)
We create a container to use for the pod network namespace -- a single loopback device and a single veth device. All the user's containers get their network namespaces from this pod networking container.
Docker allocates IP addresses from a bridge we create on each node, using its “container” networking mode.
1. Create a normal (in the networking sense) container which uses a minimal image and runs a command that blocks forever. This is not a user-defined container, and gets a special well-known name.
- creates a new network namespace (netns) and loopback device
- creates a new pair of veth devices and binds them to the netns
- auto-assigns an IP from Docker's IP range
2. Create the user containers and specify the name of the pod infra container as their “POD” argument. Docker finds the PID of the command running in the pod infra container and attaches to the netns and ipcns of that PID.
### Other networking implementation examples
With the primary aim of providing IP-per-pod-model, other implementations exist to serve the purpose outside of GCE.
- [OpenVSwitch with GRE/VxLAN](../ovs-networking.md)
- [Flannel](https://github.com/coreos/flannel#flannel)
## Challenges and future work
### Docker API
Right now, docker inspect doesn't show the networking configuration of the containers, since they derive it from another container. That information should be exposed somehow.
### External IP assignment
We want to be able to assign IP addresses externally from Docker ([Docker issue #6743](https://github.com/dotcloud/docker/issues/6743)) so that we don't need to statically allocate fixed-size IP ranges to each node, so that IP addresses can be made stable across pod infra container restarts ([Docker issue #2801](https://github.com/dotcloud/docker/issues/2801)), and to facilitate pod migration. Right now, if the pod infra container dies, all the user containers must be stopped and restarted because the netns of the pod infra container will change on restart, and any subsequent user container restart will join that new netns, thereby not being able to see its peers. Additionally, a change in IP address would encounter DNS caching/TTL problems. External IP assignment would also simplify DNS support (see below).
### Naming, discovery, and load balancing
In addition to enabling self-registration with 3rd-party discovery mechanisms, we'd like to setup DDNS automatically ([Issue #146](https://github.com/GoogleCloudPlatform/kubernetes/issues/146)). hostname, $HOSTNAME, etc. should return a name for the pod ([Issue #298](https://github.com/GoogleCloudPlatform/kubernetes/issues/298)), and gethostbyname should be able to resolve names of other pods. Probably we need to set up a DNS resolver to do the latter ([Docker issue #2267](https://github.com/dotcloud/docker/issues/2267)), so that we don't need to keep /etc/hosts files up to date dynamically.
[Service](http://docs.k8s.io/services.md) endpoints are currently found through environment variables. Both [Docker-links-compatible](https://docs.docker.com/userguide/dockerlinks/) variables and kubernetes-specific variables ({NAME}_SERVICE_HOST and {NAME}_SERVICE_PORT) are supported, and resolve to ports opened by the service proxy. We don't actually use [the Docker ambassador pattern](https://docs.docker.com/articles/ambassador_pattern_linking/) to link containers because we don't require applications to identify all clients at configuration time, yet. While services today are managed by the service proxy, this is an implementation detail that applications should not rely on. Clients should instead use the [service IP](http://docs.k8s.io/services.md) (which the above environment variables will resolve to). However, a flat service namespace doesn't scale and environment variables don't permit dynamic updates, which complicates service deployment by imposing implicit ordering constraints. We intend to register each service's IP in DNS, and for that to become the preferred resolution protocol.
We'd also like to accommodate other load-balancing solutions (e.g., HAProxy), non-load-balanced services ([Issue #260](https://github.com/GoogleCloudPlatform/kubernetes/issues/260)), and other types of groups (worker pools, etc.). Providing the ability to Watch a label selector applied to pod addresses would enable efficient monitoring of group membership, which could be directly consumed or synced with a discovery mechanism. Event hooks ([Issue #140](https://github.com/GoogleCloudPlatform/kubernetes/issues/140)) for join/leave events would probably make this even easier.
### External routability
We want traffic between containers to use the pod IP addresses across nodes. Say we have Node A with a container IP space of 10.244.1.0/24 and Node B with a container IP space of 10.244.2.0/24. And we have Container A1 at 10.244.1.1 and Container B1 at 10.244.2.1. We want Container A1 to talk to Container B1 directly with no NAT. B1 should see the "source" in the IP packets of 10.244.1.1 -- not the "primary" host IP for Node A. That means that we want to turn off NAT for traffic between containers (and also between VMs and containers).
We'd also like to make pods directly routable from the external internet. However, we can't yet support the extra container IPs that we've provisioned talking to the internet directly. So, we don't map external IPs to the container IPs. Instead, we solve that problem by having traffic that isn't to the internal network (! 10.0.0.0/8) get NATed through the primary host IP address so that it can get 1:1 NATed by the GCE networking when talking to the internet. Similarly, incoming traffic from the internet has to get NATed/proxied through the host IP.
So we end up with 3 cases:
1. Container -> Container or Container <-> VM. These should use 10. addresses directly and there should be no NAT.
2. Container -> Internet. These have to get mapped to the primary host IP so that GCE knows how to egress that traffic. There are actually 2 layers of NAT here: Container IP -> Internal Host IP -> External Host IP. The first level happens in the guest with IP tables and the second happens as part of GCE networking. The first one (Container IP -> internal host IP) does dynamic port allocation while the second maps ports 1:1.
3. Internet -> Container. This also has to go through the primary host IP and also has 2 levels of NAT, ideally. However, the path currently is a proxy with (External Host IP -> Internal Host IP -> Docker) -> (Docker -> Container IP). Once [issue #15](https://github.com/GoogleCloudPlatform/kubernetes/issues/15) is closed, it should be External Host IP -> Internal Host IP -> Container IP. But to get that second arrow we have to set up the port forwarding iptables rules per mapped port.
Another approach could be to create a new host interface alias for each pod, if we had a way to route an external IP to it. This would eliminate the scheduling constraints resulting from using the host's IP address.
### IPv6
IPv6 would be a nice option, also, but we can't depend on it yet. Docker support is in progress: [Docker issue #2974](https://github.com/dotcloud/docker/issues/2974), [Docker issue #6923](https://github.com/dotcloud/docker/issues/6923), [Docker issue #6975](https://github.com/dotcloud/docker/issues/6975). Additionally, direct ipv6 assignment to instances doesn't appear to be supported by major cloud providers (e.g., AWS EC2, GCE) yet. We'd happily take pull requests from people running Kubernetes on bare metal, though. :-)
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/networking.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/networking.md?pixel)]()

View File

@ -0,0 +1,220 @@
# Persistent Storage
This document proposes a model for managing persistent, cluster-scoped storage for applications requiring long lived data.
### tl;dr
Two new API kinds:
A `PersistentVolume` (PV) is a storage resource provisioned by an administrator. It is analogous to a node.
A `PersistentVolumeClaim` (PVC) is a user's request for a persistent volume to use in a pod. It is analogous to a pod.
One new system component:
`PersistentVolumeClaimBinder` is a singleton running in master that watches all PersistentVolumeClaims in the system and binds them to the closest matching available PersistentVolume. The volume manager watches the API for newly created volumes to manage.
One new volume:
`PersistentVolumeClaimVolumeSource` references the user's PVC in the same namespace. This volume finds the bound PV and mounts that volume for the pod. A `PersistentVolumeClaimVolumeSource` is, essentially, a wrapper around another type of volume that is owned by someone else (the system).
Kubernetes makes no guarantees at runtime that the underlying storage exists or is available. High availability is left to the storage provider.
### Goals
* Allow administrators to describe available storage
* Allow pod authors to discover and request persistent volumes to use with pods
* Enforce security through access control lists and securing storage to the same namespace as the pod volume
* Enforce quotas through admission control
* Enforce scheduler rules by resource counting
* Ensure developers can rely on storage being available without being closely bound to a particular disk, server, network, or storage device.
#### Describe available storage
Cluster administrators use the API to manage *PersistentVolumes*. A custom store ```NewPersistentVolumeOrderedIndex``` will index volumes by access modes and sort by storage capacity. The ```PersistentVolumeClaimBinder``` watches for new claims for storage and binds them to an available volume by matching the volume's characteristics (AccessModes and storage size) to the user's request.
PVs are system objects and, thus, have no namespace.
Many means of dynamic provisioning will eventually be implemented for various storage types.
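An illustrative sketch of the matching performed by the binder, with simplified stand-in types (capacity as a byte count, access modes as strings); this is not the actual `NewPersistentVolumeOrderedIndex` implementation:

```go
// Simplified types for illustration; the real objects live in the API package.
type PersistentVolume struct {
	Name        string
	Capacity    int64    // bytes
	AccessModes []string // e.g. "ReadWriteOnce"
	ClaimRef    *string  // name of the bound claim; nil if available
}

type PersistentVolumeClaim struct {
	Name        string
	AccessModes []string
	Requested   int64 // minimum bytes requested
}

// matchVolume returns the smallest available volume that satisfies the claim's
// access modes and requested capacity, or nil if none fits.
func matchVolume(claim *PersistentVolumeClaim, volumes []*PersistentVolume) *PersistentVolume {
	var best *PersistentVolume
	for _, pv := range volumes {
		if pv.ClaimRef != nil || !hasModes(pv.AccessModes, claim.AccessModes) || pv.Capacity < claim.Requested {
			continue
		}
		if best == nil || pv.Capacity < best.Capacity {
			best = pv // prefer the closest-fitting volume
		}
	}
	return best
}

// hasModes reports whether every requested access mode is offered by the volume.
func hasModes(have, want []string) bool {
	set := map[string]bool{}
	for _, m := range have {
		set[m] = true
	}
	for _, m := range want {
		if !set[m] {
			return false
		}
	}
	return true
}
```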
##### PersistentVolume API
| Action | HTTP Verb | Path | Description |
| ---- | ---- | ---- | ---- |
| CREATE | POST | /api/{version}/persistentvolumes/ | Create instance of PersistentVolume |
| GET | GET | /api/{version}/persistentvolumes/{name} | Get instance of PersistentVolume with {name} |
| UPDATE | PUT | /api/{version}/persistentvolumes/{name} | Update instance of PersistentVolume with {name} |
| DELETE | DELETE | /api/{version}/persistentvolumes/{name} | Delete instance of PersistentVolume with {name} |
| LIST | GET | /api/{version}/persistentvolumes | List instances of PersistentVolume |
| WATCH | GET | /api/{version}/watch/persistentvolumes | Watch for changes to a PersistentVolume |
#### Request Storage
Kubernetes users request persistent storage for their pod by creating a ```PersistentVolumeClaim```. Their request for storage is described by their requirements for resources and mount capabilities.
Requests for volumes are bound to available volumes by the volume manager, if a suitable match is found. Requests for resources can go unfulfilled.
Users attach their claim to their pod using a new ```PersistentVolumeClaimVolumeSource``` volume source.
##### PersistentVolumeClaim API
| Action | HTTP Verb | Path | Description |
| ---- | ---- | ---- | ---- |
| CREATE | POST | /api/{version}/namespaces/{ns}/persistentvolumeclaims/ | Create instance of PersistentVolumeClaim in namespace {ns} |
| GET | GET | /api/{version}/namespaces/{ns}/persistentvolumeclaims/{name} | Get instance of PersistentVolumeClaim in namespace {ns} with {name} |
| UPDATE | PUT | /api/{version}/namespaces/{ns}/persistentvolumeclaims/{name} | Update instance of PersistentVolumeClaim in namespace {ns} with {name} |
| DELETE | DELETE | /api/{version}/namespaces/{ns}/persistentvolumeclaims/{name} | Delete instance of PersistentVolumeClaim in namespace {ns} with {name} |
| LIST | GET | /api/{version}/namespaces/{ns}/persistentvolumeclaims | List instances of PersistentVolumeClaim in namespace {ns} |
| WATCH | GET | /api/{version}/watch/namespaces/{ns}/persistentvolumeclaims | Watch for changes to PersistentVolumeClaim in namespace {ns} |
#### Scheduling constraints
Scheduling constraints are to be handled similarly to pod resource constraints. Pods will need to be annotated or decorated with the number of resources they require on a node. Similarly, a node will need to list how many it has used and how many are available.
TBD
#### Events
The implementation of persistent storage will not require events to communicate the state of a claim to the user. The CLI output for bound claims contains a reference to the backing persistent volume. This is always present in the API and CLI, so an event communicating the same information would be unnecessary.
Events that communicate the state of a mounted volume are left to the volume plugins.
### Example
#### Admin provisions storage
An administrator provisions storage by posting PVs to the API. Various ways to automate this task can be scripted. Dynamic provisioning is a future feature that can maintain levels of PVs.
```
POST:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv0001
spec:
  capacity:
    storage: 10
  persistentDisk:
    pdName: "abc123"
    fsType: "ext4"

--------------------------------------------------

kubectl get pv

NAME      LABELS   CAPACITY      ACCESSMODES   STATUS    CLAIM
pv0001    map[]    10737418240   RWO           Pending
```
#### Users request storage
A user requests storage by posting a PVC to the API. Their request contains the AccessModes they wish their volume to have and the minimum size needed.
The user must be within a namespace to create PVCs.
```
POST:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim-1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3

--------------------------------------------------

kubectl get pvc

NAME        LABELS   STATUS    VOLUME
myclaim-1   map[]    pending
```
#### Matching and binding
The ```PersistentVolumeClaimBinder``` attempts to find an available volume that most closely matches the user's request. If one exists, they are bound by putting a reference on the PV to the PVC. Requests can go unfulfilled if a suitable match is not found.
```
kubectl get pv

NAME      LABELS   CAPACITY      ACCESSMODES   STATUS   CLAIM
pv0001    map[]    10737418240   RWO           Bound    myclaim-1 / f4b3d283-c0ef-11e4-8be4-80e6500a981e

kubectl get pvc

NAME        LABELS   STATUS   VOLUME
myclaim-1   map[]    Bound    b16e91d6-c0ef-11e4-8be4-80e6500a981e
```
#### Claim usage
The claim holder can use their claim as a volume. The ```PersistentVolumeClaimVolumeSource``` knows to fetch the PV backing the claim and mount its volume for a pod.
The claim holder owns the claim and its data for as long as the claim exists. The pod using the claim can be deleted, but the claim remains in the user's namespace. It can be used again and again by many pods.
```
POST:

kind: Pod
apiVersion: v1
metadata:
  name: mypod
spec:
  containers:
    - image: nginx
      name: myfrontend
      volumeMounts:
        - mountPath: "/var/www/html"
          name: mypd
  volumes:
    - name: mypd
      source:
        persistentVolumeClaim:
          accessMode: ReadWriteOnce
          claimRef:
            name: myclaim-1
```
#### Releasing a claim and Recycling a volume
When a claim holder is finished with their data, they can delete their claim.
```
kubectl delete pvc myclaim-1
```
The ```PersistentVolumeClaimBinder``` will reconcile this by removing the claim reference from the PV and changing the PV's status to 'Released'.
Admins can script the recycling of released volumes. Future dynamic provisioners will understand how a volume should be recycled.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/persistent-storage.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/persistent-storage.md?pixel)]()

View File

@ -0,0 +1,61 @@
# Design Principles
Principles to follow when extending Kubernetes.
## API
See also the [API conventions](../api-conventions.md).
* All APIs should be declarative.
* API objects should be complementary and composable, not opaque wrappers.
* The control plane should be transparent -- there are no hidden internal APIs.
* The cost of API operations should be proportional to the number of objects intentionally operated upon. Therefore, common filtered lookups must be indexed. Beware of patterns of multiple API calls that would incur quadratic behavior.
* Object status must be 100% reconstructable by observation. Any history kept must be just an optimization and not required for correct operation.
* Cluster-wide invariants are difficult to enforce correctly. Try not to add them. If you must have them, don't enforce them atomically in master components; that approach is contention-prone and doesn't provide a recovery path in the case of a bug allowing the invariant to be violated. Instead, provide a series of checks to reduce the probability of a violation, and make every component involved able to recover from an invariant violation.
* Low-level APIs should be designed for control by higher-level systems. Higher-level APIs should be intent-oriented (think SLOs) rather than implementation-oriented (think control knobs).
## Control logic
* Functionality must be *level-based*, meaning the system must operate correctly given the desired state and the current/observed state, regardless of how many intermediate state updates may have been missed. Edge-triggered behavior must be just an optimization.
* Assume an open world: continually verify assumptions and gracefully adapt to external events and/or actors. Example: we allow users to kill pods under control of a replication controller; it just replaces them.
* Do not define comprehensive state machines for objects with behaviors associated with state transitions and/or "assumed" states that cannot be ascertained by observation.
* Don't assume that a component's decisions will not be overridden or rejected, nor that the component will always understand why. For example, etcd may reject writes. Kubelet may reject pods. The scheduler may not be able to schedule pods. Retry, but back off and/or make alternative decisions.
* Components should be self-healing. For example, if you must keep some state (e.g., a cache), the content needs to be periodically refreshed, so that if an item does get erroneously stored or a deletion event is missed, etc., it will soon be fixed, ideally on timescales that are shorter than what will attract attention from humans.
* Component behavior should degrade gracefully. Prioritize actions so that the most important activities can continue to function even when overloaded and/or in states of partial failure.
## Architecture
* Only the apiserver should communicate with etcd/store, and not other components (scheduler, kubelet, etc.).
* Compromising a single node shouldn't compromise the cluster.
* Components should continue to do what they were last told in the absence of new instructions (e.g., due to network partition or component outage).
* All components should keep all relevant state in memory all the time. The apiserver should write through to etcd/store, other components should write through to the apiserver, and they should watch for updates made by other clients.
* Watch is preferred over polling.
## Extensibility
TODO: pluggability
## Bootstrapping
* [Self-hosting](https://github.com/GoogleCloudPlatform/kubernetes/issues/246) of all components is a goal.
* Minimize the number of dependencies, particularly those required for steady-state operation.
* Stratify the dependencies that remain via principled layering.
* Break any circular dependencies by converting hard dependencies to soft dependencies.
* Also accept data from other components via another source, such as local files, which can then be manually populated at bootstrap time and continuously updated once those other components are available.
* State should be rediscoverable and/or reconstructable.
* Make it easy to run temporary, bootstrap instances of all components in order to create the runtime state needed to run the components in the steady state; use a lock (master election for distributed components, file lock for local components like Kubelet) to coordinate handoff. We call this technique "pivoting".
* Have a solution to restart dead components. For distributed components, replication works well. For local components such as Kubelet, a process manager or even a simple shell loop works.
## Availability
TODO
## General principles
* [Eric Raymond's 17 UNIX rules](https://en.wikipedia.org/wiki/Unix_philosophy#Eric_Raymond.E2.80.99s_17_Unix_Rules)
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/principles.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/principles.md?pixel)]()

View File

@ -0,0 +1,581 @@
## Abstract
A proposal for the distribution of secrets (passwords, keys, etc) to the Kubelet and to
containers inside Kubernetes using a custom volume type.
## Motivation
Secrets are needed in containers to access internal resources like the Kubernetes master or
external resources such as git repositories, databases, etc. Users may also want behaviors in the
kubelet that depend on secret data (credentials for image pull from a docker registry) associated
with pods.
Goals of this design:
1. Describe a secret resource
2. Define the various challenges attendant to managing secrets on the node
3. Define a mechanism for consuming secrets in containers without modification
## Constraints and Assumptions
* This design does not prescribe a method for storing secrets; storage of secrets should be
pluggable to accommodate different use-cases
* Encryption of secret data and node security are orthogonal concerns
* It is assumed that node and master are secure and that compromising their security could also
compromise secrets:
* If a node is compromised, the only secrets that could potentially be exposed should be the
secrets belonging to containers scheduled onto it
* If the master is compromised, all secrets in the cluster may be exposed
* Secret rotation is an orthogonal concern, but it should be facilitated by this proposal
* A user who can consume a secret in a container can know the value of the secret; secrets must
be provisioned judiciously
## Use Cases
1. As a user, I want to store secret artifacts for my applications and consume them securely in
containers, so that I can keep the configuration for my applications separate from the images
that use them:
1. As a cluster operator, I want to allow a pod to access the Kubernetes master using a custom
`.kubeconfig` file, so that I can securely reach the master
2. As a cluster operator, I want to allow a pod to access a Docker registry using credentials
from a `.dockercfg` file, so that containers can push images
3. As a cluster operator, I want to allow a pod to access a git repository using SSH keys,
so that I can push and fetch to and from the repository
2. As a user, I want to allow containers to consume supplemental information about services such
as username and password which should be kept secret, so that I can share secrets about a
service amongst the containers in my application securely
3. As a user, I want to associate a pod with a `ServiceAccount` that consumes a secret and have
the kubelet implement some reserved behaviors based on the types of secrets the service account
consumes:
1. Use credentials for a docker registry to pull the pod's docker image
2. Present kubernetes auth token to the pod or transparently decorate traffic between the pod
and master service
4. As a user, I want to be able to indicate that a secret expires and for that secret's value to
be rotated once it expires, so that the system can help me follow good practices
### Use-Case: Configuration artifacts
Many configuration files contain secrets intermixed with other configuration information. For
example, a user's application may contain a properties file that contains database credentials,
SaaS API tokens, etc. Users should be able to consume configuration artifacts in their containers
and be able to control the path on the container's filesystems where the artifact will be
presented.
### Use-Case: Metadata about services
Most pieces of information about how to use a service are secrets. For example, a service that
provides a MySQL database needs to provide the username, password, and database name to consumers
so that they can authenticate and use the correct database. Containers in pods consuming the MySQL
service would also consume the secrets associated with the MySQL service.
### Use-Case: Secrets associated with service accounts
[Service Accounts](http://docs.k8s.io/design/service_accounts.md) are proposed as a
mechanism to decouple capabilities and security contexts from individual human users. A
`ServiceAccount` contains references to some number of secrets. A `Pod` can specify that it is
associated with a `ServiceAccount`. Secrets should have a `Type` field to allow the Kubelet and
other system components to take action based on the secret's type.
#### Example: service account consumes auth token secret
As an example, the service account proposal discusses service accounts consuming secrets which
contain kubernetes auth tokens. When a Kubelet starts a pod associated with a service account
which consumes this type of secret, the Kubelet may take a number of actions:
1. Expose the secret in a `.kubernetes_auth` file in a well-known location in the container's
file system
2. Configure that node's `kube-proxy` to decorate HTTP requests from that pod to the
`kubernetes-master` service with the auth token, e.g. by adding a header to the request
(see the [LOAS Daemon](https://github.com/GoogleCloudPlatform/kubernetes/issues/2209) proposal)
#### Example: service account consumes docker registry credentials
Another example use case is where a pod is associated with a secret containing docker registry
credentials. The Kubelet could use these credentials for the docker pull to retrieve the image.
### Use-Case: Secret expiry and rotation
Rotation is considered a good practice for many types of secret data. It should be possible to
express that a secret has an expiry date; this would make it possible to implement a system
component that could regenerate expired secrets. As an example, consider a component that rotates
expired secrets. The rotator could periodically regenerate the values for expired secrets of
common types and update their expiry dates.
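A hedged sketch of such a rotator's main loop; `listExpiredSecrets`, `regenerate`, and `updateSecret` are hypothetical helpers, not an existing API:

```go
import "time"

// rotateExpiredSecrets sketches the rotator's main loop: every interval it
// regenerates the value of each expired secret and pushes the new value and
// expiry date back through the API.
func rotateExpiredSecrets(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			for _, secret := range listExpiredSecrets() {
				value, expiry := regenerate(secret)
				updateSecret(secret, value, expiry)
			}
		}
	}
}
```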
## Deferral: Consuming secrets as environment variables
Some images will expect to receive configuration items as environment variables instead of files.
We should consider what the best way to allow this is; there are a few different options:
1. Force the user to adapt files into environment variables. Users can store secrets that need to
be presented as environment variables in a format that is easy to consume from a shell:
$ cat /etc/secrets/my-secret.txt
export MY_SECRET_ENV=MY_SECRET_VALUE
The user could `source` the file at `/etc/secrets/my-secret` prior to executing the command for
the image, either inline in the command or in an init script.
2. Give secrets an attribute that allows users to express the intent that the platform should
generate the above syntax in the file used to present a secret. The user could consume these
files in the same manner as the above option.
3. Give secrets attributes that allow the user to express that the secret should be presented to
the container as an environment variable. The container's environment would contain the
desired values and the software in the container could use them without requiring changes to the
command or setup script.
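A minimal sketch of the second option above (illustrative only, not part of this proposal): a platform component could render a secret's `Data` map as a file of `export` statements that the container entrypoint sources. The key-to-variable-name mapping shown here is purely an assumption for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// renderAsEnvFile turns secret data into "export KEY=VALUE" lines that a
// container entrypoint could source. Mapping keys like "my-secret" to
// MY_SECRET is illustrative, not a proposed convention.
func renderAsEnvFile(data map[string][]byte) []byte {
	var buf bytes.Buffer
	for key, value := range data {
		name := strings.ToUpper(strings.NewReplacer("-", "_", ".", "_").Replace(key))
		fmt.Fprintf(&buf, "export %s=%q\n", name, string(value))
	}
	return buf.Bytes()
}

func main() {
	out := renderAsEnvFile(map[string][]byte{"my-secret": []byte("MY_SECRET_VALUE")})
	fmt.Print(string(out)) // prints: export MY_SECRET="MY_SECRET_VALUE"
}
```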
For our initial work, we will treat all secrets as files to narrow the problem space. There will
be a future proposal that handles exposing secrets as environment variables.
## Flow analysis of secret data with respect to the API server
There are two fundamentally different use-cases for access to secrets:
1. CRUD operations on secrets by their owners
2. Read-only access to the secrets needed for a particular node by the kubelet
### Use-Case: CRUD operations by owners
In use cases for CRUD operations, the user experience for secrets should be no different than for
other API resources.
#### Data store backing the REST API
The data store backing the REST API should be pluggable because different cluster operators will
have different preferences for the central store of secret data. Some possibilities for storage:
1. An etcd collection alongside the storage for other API resources
2. A collocated [HSM](http://en.wikipedia.org/wiki/Hardware_security_module)
3. An external datastore such as an external etcd, RDBMS, etc.
#### Size limit for secrets
There should be a size limit for secrets in order to:
1. Prevent DOS attacks against the API server
2. Allow kubelet implementations that prevent secret data from touching the node's filesystem
The size limit should satisfy the following conditions:
1. Large enough to store common artifact types (encryption keypairs, certificates, small
configuration files)
2. Small enough to avoid large impact on node resource consumption (storage, RAM for tmpfs, etc)
To begin discussion, we propose an initial value for this size limit of **1MB**.
#### Other limitations on secrets
Defining a policy for limitations on how a secret may be referenced by another API resource and how
constraints should be applied throughout the cluster is tricky due to the number of variables
involved:
1. Should there be a maximum number of secrets a pod can reference via a volume?
2. Should there be a maximum number of secrets a service account can reference?
3. Should there be a total maximum number of secrets a pod can reference via its own spec and its
associated service account?
4. Should there be a total size limit on the amount of secret data consumed by a pod?
5. How will cluster operators want to be able to configure these limits?
6. How will these limits impact API server validations?
7. How will these limits affect scheduling?
For now, we will not implement validations around these limits. Cluster operators will decide how
much node storage is allocated to secrets. It will be the operator's responsibility to ensure that
the allocated storage is sufficient for the workload scheduled onto a node.
For now, kubelets will only attach secrets to api-sourced pods, and not file- or http-sourced
ones. Doing so would:
- confuse the secrets admission controller in the case of mirror pods.
- create an apiserver-liveness dependency -- avoiding this dependency is a main reason to use non-api-source pods.
### Use-Case: Kubelet read of secrets for node
The use-case where the kubelet reads secrets has several additional requirements:
1. Kubelets should only be able to receive secret data which is required by pods scheduled onto
the kubelet's node
2. Kubelets should have read-only access to secret data
3. Secret data should not be transmitted over the wire insecurely
4. Kubelets must ensure pods do not have access to each other's secrets
#### Read of secret data by the Kubelet
The Kubelet should only be allowed to read secrets which are consumed by pods scheduled onto that
Kubelet's node and their associated service accounts. Authorization of the Kubelet to read this
data would be delegated to an authorization plugin and associated policy rule.
#### Secret data on the node: data at rest
Consideration must be given to whether secret data should be allowed to be at rest on the node:
1. If secret data is not allowed to be at rest, the size of secret data becomes another draw on
the node's RAM - should it affect scheduling?
2. If secret data is allowed to be at rest, should it be encrypted?
1. If so, how should this be done?
2. If not, what threats exist? What types of secret are appropriate to store this way?
For the sake of limiting complexity, we propose that initially secret data should not be allowed
to be at rest on a node; secret data should be stored on a node-level tmpfs filesystem. This
filesystem can be subdivided into directories for use by the kubelet and by the volume plugin.
#### Secret data on the node: resource consumption
The Kubelet will be responsible for creating the per-node tmpfs file system for secret storage.
It is hard to make a prescriptive declaration about how much storage is appropriate to reserve for
secrets because different installations will vary widely in available resources, desired pod to
node density, overcommit policy, and other operation dimensions. That being the case, we propose
for simplicity that the amount of secret storage be controlled by a new parameter to the kubelet
with a default value of **64MB**. It is the cluster operator's responsibility to handle choosing
the right storage size for their installation and configuring their Kubelets correctly.
Configuring each Kubelet is not the ideal story for operator experience; it is more intuitive that
the cluster-wide storage size be readable from a central configuration store like the one proposed
in [#1553](https://github.com/GoogleCloudPlatform/kubernetes/issues/1553). When such a store
exists, the Kubelet could be modified to read this configuration item from the store.
When the Kubelet is modified to advertise node resources (as proposed in
[#4441](https://github.com/GoogleCloudPlatform/kubernetes/issues/4441)), the capacity calculation
for available memory should factor in the potential size of the node-level tmpfs in order to avoid
memory overcommit on the node.
#### Secret data on the node: isolation
Every pod will have a [security context](http://docs.k8s.io/design/security_context.md).
Secret data on the node should be isolated according to the security context of the container. The
Kubelet volume plugin API will be changed so that a volume plugin receives the security context of
a volume along with the volume spec. This will allow volume plugins to implement setting the
security context of volumes they manage.
## Community work:
Several proposals / upstream patches are notable as background for this proposal:
1. [Docker vault proposal](https://github.com/docker/docker/issues/10310)
2. [Specification for image/container standardization based on volumes](https://github.com/docker/docker/issues/9277)
3. [Kubernetes service account proposal](http://docs.k8s.io/design/service_accounts.md)
4. [Secrets proposal for docker (1)](https://github.com/docker/docker/pull/6075)
5. [Secrets proposal for docker (2)](https://github.com/docker/docker/pull/6697)
## Proposed Design
We propose a new `Secret` resource which is mounted into containers with a new volume type. Secret
volumes will be handled by a volume plugin that does the actual work of fetching the secret and
storing it. Secrets contain multiple pieces of data that are presented as different files within
the secret volume (example: SSH key pair).
In order to remove the burden from the end user in specifying every file that a secret consists of,
it should be possible to mount all files provided by a secret with a single ```VolumeMount``` entry
in the container specification.
### Secret API Resource
A new resource for secrets will be added to the API:
```go
type Secret struct {
TypeMeta
ObjectMeta
// Data contains the secret data. Each key must be a valid DNS_SUBDOMAIN.
// The serialized form of the secret data is a base64 encoded string,
// representing the arbitrary (possibly non-string) data value here.
Data map[string][]byte `json:"data,omitempty"`
// Used to facilitate programmatic handling of secret data.
Type SecretType `json:"type,omitempty"`
}
type SecretType string
const (
SecretTypeOpaque SecretType = "Opaque" // Opaque (arbitrary data; default)
SecretTypeKubernetesAuthToken SecretType = "KubernetesAuth" // Kubernetes auth token
SecretTypeDockerRegistryAuth SecretType = "DockerRegistryAuth" // Docker registry auth
// FUTURE: other type values
)
const MaxSecretSize = 1 * 1024 * 1024
```
A Secret can declare a type in order to provide type information to system components that work
with secrets. The default type is `opaque`, which represents arbitrary user-owned data.
Secrets are validated against `MaxSecretSize`. The keys in the `Data` field must be valid DNS
subdomains.
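As an illustration only (the real code would reuse the shared validation helpers), the two checks could look roughly like the following, with a simplified regular expression standing in for the DNS_SUBDOMAIN rule:

```go
package validation

import (
	"fmt"
	"regexp"
)

const maxSecretSize = 1 * 1024 * 1024 // the 1MB limit proposed above

// dnsSubdomain is a simplified stand-in for the shared DNS_SUBDOMAIN check.
var dnsSubdomain = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// validateSecretData enforces the two rules described above: every key must
// be a valid DNS subdomain and the total payload must stay under the limit.
func validateSecretData(data map[string][]byte) error {
	totalSize := 0
	for key, value := range data {
		if len(key) > 253 || !dnsSubdomain.MatchString(key) {
			return fmt.Errorf("key %q is not a valid DNS subdomain", key)
		}
		totalSize += len(value)
	}
	if totalSize > maxSecretSize {
		return fmt.Errorf("secret data is larger than %d bytes", maxSecretSize)
	}
	return nil
}
```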
A new REST API and registry interface will be added to accompany the `Secret` resource. The
default implementation of the registry will store `Secret` information in etcd. Future registry
implementations could store the `TypeMeta` and `ObjectMeta` fields in etcd and store the secret
data in another data store entirely, or store the whole object in another data store.
#### Other validations related to secrets
Initially there will be no validations for the number of secrets a pod references, or the number of
secrets that can be associated with a service account. These may be added in the future as the
finer points of secrets and resource allocation are fleshed out.
### Secret Volume Source
A new `SecretSource` type of volume source will be added to the ```VolumeSource``` struct in the
API:
```go
type VolumeSource struct {
// Other fields omitted
// SecretSource represents a secret that should be presented in a volume
SecretSource *SecretSource `json:"secret"`
}
type SecretSource struct {
Target ObjectReference
}
```
Secret volume sources are validated to ensure that the specified object reference actually points
to an object of type `Secret`.
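A rough sketch of that check, using simplified stand-ins for the API types involved:

```go
package sketch

import "fmt"

// Minimal stand-ins for the API types referenced above.
type ObjectReference struct {
	Kind      string
	Namespace string
	Name      string
}

type SecretSource struct {
	Target ObjectReference
}

// validateSecretSource mirrors the validation described above: the referenced
// object must be of kind Secret.
func validateSecretSource(source *SecretSource) error {
	if source == nil || source.Target.Kind != "Secret" {
		return fmt.Errorf("secret volume source must reference an object of kind Secret")
	}
	return nil
}
```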
In the future, the `SecretSource` will be extended to allow:
1. Fine-grained control over which pieces of secret data are exposed in the volume
2. The paths and filenames for how secret data are exposed
### Secret Volume Plugin
A new Kubelet volume plugin will be added to handle volumes with a secret source. This plugin will
require access to the API server to retrieve secret data and therefore the volume `Host` interface
will have to change to expose a client interface:
```go
type Host interface {
// Other methods omitted
// GetKubeClient returns a client interface
GetKubeClient() client.Interface
}
```
The secret volume plugin will be responsible for:
1. Returning a `volume.Builder` implementation from `NewBuilder` that:
1. Retrieves the secret data for the volume from the API server
2. Places the secret data onto the container's filesystem
3. Sets the correct security attributes for the volume based on the pod's `SecurityContext`
2. Returning a `volume.Cleaner` implementation from `NewCleaner` that cleans the volume from the
container's filesystem
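To make the division of labor concrete, here is a hypothetical sketch of the builder's setup work. The `secretGetter` interface is an illustrative stand-in for the client exposed through `GetKubeClient`, not the actual client API:

```go
package sketch

import (
	"io/ioutil"
	"os"
	"path/filepath"
)

// secretGetter stands in for the client interface the volume Host exposes;
// the method shown is illustrative only.
type secretGetter interface {
	GetSecretData(namespace, name string) (map[string][]byte, error)
}

// secretVolumeBuilder sketches the Builder responsibilities listed above:
// fetch the secret and place each data key as a file in the per-pod volume
// directory (which lives on the node-level tmpfs).
type secretVolumeBuilder struct {
	namespace, secretName string
	volumeDir             string
	client                secretGetter
}

func (b *secretVolumeBuilder) SetUp() error {
	data, err := b.client.GetSecretData(b.namespace, b.secretName)
	if err != nil {
		return err
	}
	if err := os.MkdirAll(b.volumeDir, 0750); err != nil {
		return err
	}
	for key, value := range data {
		// Each key becomes a file in the volume, e.g. id-rsa and id-rsa.pub
		// in the ssh example later in this document.
		if err := ioutil.WriteFile(filepath.Join(b.volumeDir, key), value, 0440); err != nil {
			return err
		}
	}
	return nil
}
```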
### Kubelet: Node-level secret storage
The Kubelet must be modified to accept a new parameter for the secret storage size and to create
a tmpfs file system of that size to store secret data. Rough accounting of specific changes:
1. The Kubelet should have a new field added called `secretStorageSize`; units are megabytes
2. `NewMainKubelet` should accept a value for secret storage size
3. The Kubelet server should have a new flag added for secret storage size
4. The Kubelet's `setupDataDirs` method should be changed to create the secret storage
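As a sketch (assuming Linux and sufficient privileges, and not the actual Kubelet code), the storage setup could amount to mounting a size-bounded tmpfs under the Kubelet's root directory:

```go
package sketch

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// setupSecretStorage sketches how the setupDataDirs change could create the
// node-level tmpfs; secretStorageSize is in megabytes, matching the proposed
// Kubelet parameter. Linux-only and requires privileges.
func setupSecretStorage(rootDir string, secretStorageSize int) error {
	dir := filepath.Join(rootDir, "secrets")
	if err := os.MkdirAll(dir, 0700); err != nil {
		return err
	}
	// A RAM-backed mount keeps secret data off the node's disks, per the
	// "data at rest" discussion above.
	opts := fmt.Sprintf("size=%dm", secretStorageSize)
	return syscall.Mount("tmpfs", dir, "tmpfs", 0, opts)
}
```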
### Kubelet: New behaviors for secrets associated with service accounts
For use-cases where the Kubelet's behavior is affected by the secrets associated with a pod's
`ServiceAccount`, the Kubelet will need to be changed. For example, if secrets of type
`docker-reg-auth` affect how the pod's images are pulled, the Kubelet will need to be changed
to accommodate this. Subsequent proposals can address this on a type-by-type basis.
## Examples
For clarity, let's examine some detailed examples of some common use-cases in terms of the
suggested changes. All of these examples are assumed to be created in a namespace called
`example`.
### Use-Case: Pod with ssh keys
To create a pod that uses an ssh key stored as a secret, we first need to create a secret:
```json
{
"kind": "Secret",
"apiVersion": "v1",
"metadata": {
"name": "ssh-key-secret"
},
"data": {
"id-rsa": "dmFsdWUtMg0KDQo=",
"id-rsa.pub": "dmFsdWUtMQ0K"
}
}
```
**Note:** The serialized JSON and YAML values of secret data are encoded as
base64 strings. Newlines are not valid within these strings and must be
omitted.
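For reference, a small Go program that produces such a value; this reproduces the `id-rsa.pub` entry above:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

func main() {
	// StdEncoding produces a single line with no embedded newlines, as
	// required for the serialized secret data.
	raw := []byte("value-1\r\n")
	fmt.Println(base64.StdEncoding.EncodeToString(raw)) // dmFsdWUtMQ0K
}
```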
Now we can create a pod which references the secret with the ssh key and consumes it in a volume:
```json
{
"kind": "Pod",
"apiVersion": "v1",
"metadata": {
"name": "secret-test-pod",
"labels": {
"name": "secret-test"
}
},
"spec": {
"volumes": [
{
"name": "secret-volume",
"secret": {
"secretName": "ssh-key-secret"
}
}
],
"containers": [
{
"name": "ssh-test-container",
"image": "mySshImage",
"volumeMounts": [
{
"name": "secret-volume",
"readOnly": true,
"mountPath": "/etc/secret-volume"
}
]
}
]
}
}
```
When the container's command runs, the pieces of the key will be available in:
/etc/secret-volume/id-rsa.pub
/etc/secret-volume/id-rsa
The container is then free to use the secret data to establish an ssh connection.
### Use-Case: Pods with prod / test credentials
This example illustrates a pod which consumes a secret containing prod
credentials and another pod which consumes a secret with test environment
credentials.
The secrets:
```json
{
"apiVersion": "v1",
"kind": "List",
"items":
[{
"kind": "Secret",
"apiVersion": "v1",
"metadata": {
"name": "prod-db-secret"
},
"data": {
"password": "dmFsdWUtMg0KDQo=",
"username": "dmFsdWUtMQ0K"
}
},
{
"kind": "Secret",
"apiVersion": "v1",
"metadata": {
"name": "test-db-secret"
},
"data": {
"password": "dmFsdWUtMg0KDQo=",
"username": "dmFsdWUtMQ0K"
}
}]
}
```
The pods:
```json
{
"apiVersion": "v1",
"kind": "List",
"items":
[{
"kind": "Pod",
"apiVersion": "v1",
"metadata": {
"name": "prod-db-client-pod",
"labels": {
"name": "prod-db-client"
}
},
"spec": {
"volumes": [
{
"name": "secret-volume",
"secret": {
"secretName": "prod-db-secret"
}
}
],
"containers": [
{
"name": "db-client-container",
"image": "myClientImage",
"volumeMounts": [
{
"name": "secret-volume",
"readOnly": true,
"mountPath": "/etc/secret-volume"
}
]
}
]
}
},
{
"kind": "Pod",
"apiVersion": "v1",
"metadata": {
"name": "test-db-client-pod",
"labels": {
"name": "test-db-client"
}
},
"spec": {
"volumes": [
{
"name": "secret-volume",
"secret": {
"secretName": "test-db-secret"
}
}
],
"containers": [
{
"name": "db-client-container",
"image": "myClientImage",
"volumeMounts": [
{
"name": "secret-volume",
"readOnly": true,
"mountPath": "/etc/secret-volume"
}
]
}
]
}
}]
}
```
The specs for the two pods differ only in the value of the object referred to by the secret volume
source. Both containers will have the following files present on their filesystems:
/etc/secret-volume/username
/etc/secret-volume/password
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/secrets.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/secrets.md?pixel)]()
View File
@ -0,0 +1,123 @@
# Security in Kubernetes
Kubernetes should define a reasonable set of security best practices that allows processes to be isolated from each other, from the cluster infrastructure, and which preserves important boundaries between those who manage the cluster, and those who use the cluster.
While Kubernetes today is not primarily a multi-tenant system, the long term evolution of Kubernetes will increasingly rely on proper boundaries between users and administrators. The code running on the cluster must be appropriately isolated and secured to prevent malicious parties from affecting the entire cluster.
## High Level Goals
1. Ensure a clear isolation between the container and the underlying host it runs on
2. Limit the ability of the container to negatively impact the infrastructure or other containers
3. [Principle of Least Privilege](http://en.wikipedia.org/wiki/Principle_of_least_privilege) - ensure components are only authorized to perform the actions they need, and limit the scope of a compromise by limiting the capabilities of individual components
4. Reduce the number of systems that have to be hardened and secured by defining clear boundaries between components
5. Allow users of the system to be cleanly separated from administrators
6. Allow administrative functions to be delegated to users where necessary
7. Allow applications to be run on the cluster that have "secret" data (keys, certs, passwords) which is properly abstracted from "public" data.
## Use cases
### Roles:
We define "user" as a unique identity accessing the Kubernetes API server, which may be a human or an automated process. Human users fall into the following categories:
1. k8s admin - administers a kubernetes cluster and has access to the underlying components of the system
2. k8s project administrator - administrates the security of a small subset of the cluster
3. k8s developer - launches pods on a kubernetes cluster and consumes cluster resources
Automated process users fall into the following categories:
1. k8s container user - a user that processes running inside a container (on the cluster) can use to access other cluster resources independent of the human users attached to a project
2. k8s infrastructure user - the user that kubernetes infrastructure components use to perform cluster functions with clearly defined roles
### Description of roles:
* Developers:
* write pod specs.
* making some of their own images, and using some "community" docker images
* know which pods need to talk to which other pods
* decide which pods should share files with other pods, and which should not.
* reason about application level security, such as containing the effects of a local-file-read exploit in a webserver pod.
* do not often reason about operating system or organizational security.
* are not necessarily comfortable reasoning about the security properties of a system at the level of detail of Linux Capabilities, SELinux, AppArmor, etc.
* Project Admins:
* allocate identity and roles within a namespace
* reason about organizational security within a namespace
* don't give a developer permissions that are not needed for their role.
* protect files on shared storage from unnecessary cross-team access
* are less focused on application security
* Administrators:
* are less focused on application security. Focused on operating system security.
* protect the node from bad actors in containers, and properly-configured innocent containers from bad actors in other containers.
* comfortable reasoning about the security properties of a system at the level of detail of Linux Capabilities, SELinux, AppArmor, etc.
* decides who can use which Linux Capabilities, run privileged containers, use hostDir, etc.
* e.g. a team that manages Ceph or a mysql server might be trusted to have raw access to storage devices in some organizations, but teams that develop the applications at higher layers would not.
## Proposed Design
A pod runs in a *security context* under a *service account* that is defined by an administrator or project administrator, and the *secrets* a pod has access to are limited by that *service account*.
1. The API should authenticate and authorize user actions [authn and authz](http://docs.k8s.io/design/access.md)
2. All infrastructure components (kubelets, kube-proxies, controllers, scheduler) should have an infrastructure user that they can authenticate with and be authorized to perform only the functions they require against the API.
3. Most infrastructure components should use the API as a way of exchanging data and changing the system, and only the API should have access to the underlying data store (etcd)
4. When containers run on the cluster and need to talk to other containers or the API server, they should be identified and authorized clearly as an autonomous process via a [service account](http://docs.k8s.io/design/service_accounts.md)
1. If the user who started a long-lived process is removed from access to the cluster, the process should be able to continue without interruption
2. If a user who started processes is removed from the cluster, administrators may wish to terminate their processes in bulk
3. When containers run with a service account, the user that created / triggered the service account behavior must be associated with the container's action
5. When container processes run on the cluster, they should run in a [security context](http://docs.k8s.io/design/security_context.md) that isolates those processes via Linux user security, user namespaces, and permissions.
1. Administrators should be able to configure the cluster to automatically confine all container processes to a non-root, randomly assigned UID
2. Administrators should be able to ensure that container processes within the same namespace are all assigned the same unix user UID
3. Administrators should be able to limit which developers and project administrators have access to higher privilege actions
4. Project administrators should be able to run pods within a namespace under different security contexts, and developers must be able to specify which of the available security contexts they may use
5. Developers should be able to run their own images or images from the community and expect those images to run correctly
6. Developers may need to ensure their images work within higher security requirements specified by administrators
7. When available, Linux kernel user namespaces can be used to ensure 5.2 and 5.4 are met.
8. When application developers want to share filesystem data via distributed filesystems, the Unix user ids on those filesystems must be consistent across different container processes
6. Developers should be able to define [secrets](http://docs.k8s.io/design/secrets.md) that are automatically added to the containers when pods are run
1. Secrets are files injected into the container whose values should not be displayed within a pod. Examples:
1. An SSH private key for git cloning remote data
2. A client certificate for accessing a remote system
3. A private key and certificate for a web server
4. A .kubeconfig file with embedded cert / token data for accessing the Kubernetes master
5. A .dockercfg file for pulling images from a protected registry
2. Developers should be able to define the pod spec so that a secret lands in a specific location
3. Project administrators should be able to limit developers within a namespace from viewing or modifying secrets (anyone who can launch an arbitrary pod can view secrets)
4. Secrets are generally not copied from one namespace to another when a developer's application definitions are copied
### Related design discussion
* Authorization and authentication http://docs.k8s.io/design/access.md
* Secret distribution via files https://github.com/GoogleCloudPlatform/kubernetes/pull/2030
* Docker secrets https://github.com/docker/docker/pull/6697
* Docker vault https://github.com/docker/docker/issues/10310
* Service Accounts: http://docs.k8s.io/design/service_accounts.md
* Secret volumes https://github.com/GoogleCloudPlatform/kubernetes/4126
## Specific Design Points
### TODO: authorization, authentication
### Isolate the data store from the nodes and supporting infrastructure
Access to the central data store (etcd) in Kubernetes allows an attacker to run arbitrary containers on hosts, to gain access to any protected information stored in either volumes or in pods (such as access tokens or shared secrets provided as environment variables), to intercept and redirect traffic from running services by inserting middlemen, or to simply delete the entire history of the cluster.
As a general principle, access to the central data store should be restricted to the components that need full control over the system and which can apply appropriate authorization and authentication of change requests. In the future, etcd may offer granular access control, but that granularity will require an administrator to understand the schema of the data to properly apply security. An administrator must be able to properly secure Kubernetes at a policy level, rather than at an implementation level, and schema changes over time should not risk unintended security leaks.
Both the Kubelet and Kube Proxy need information related to their specific roles - for the Kubelet, the set of pods it should be running, and for the Proxy, the set of services and endpoints to load balance. The Kubelet also needs to provide information about running pods and historical termination data. The access pattern for both Kubelet and Proxy to load their configuration is an efficient "wait for changes" request over HTTP. It should be possible to limit the Kubelet and Proxy to only access the information they need to perform their roles and no more.
The controller manager for Replication Controllers and other future controllers act on behalf of a user via delegation to perform automated maintenance on Kubernetes resources. Their ability to access or modify resource state should be strictly limited to their intended duties and they should be prevented from accessing information not pertinent to their role. For example, a replication controller needs only to create a copy of a known pod configuration, to determine the running state of an existing pod, or to delete an existing pod that it created - it does not need to know the contents or current state of a pod, nor have access to any data in the pods attached volumes.
The Kubernetes pod scheduler is responsible for reading data from the pod to fit it onto a node in the cluster. At a minimum, it needs access to view the ID of a pod (to craft the binding), its current state, any resource information necessary to identify placement, and other data relevant to concerns like anti-affinity, zone or region preference, or custom logic. It does not need the ability to modify pods or see other resources, only to create bindings. It should not need the ability to delete bindings unless the scheduler takes control of relocating components on failed hosts (which could be implemented by a separate component that can delete bindings but not create them). The scheduler may need read access to user or project-container information to determine preferential location (underspecified at this time).
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/security.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/security.md?pixel)]()
View File
@ -0,0 +1,163 @@
# Security Contexts
## Abstract
A security context is a set of constraints that are applied to a container in order to achieve the following goals (from [security design](security.md)):
1. Ensure a clear isolation between container and the underlying host it runs on
2. Limit the ability of the container to negatively impact the infrastructure or other containers
## Background
The problem of securing containers in Kubernetes has come up [before](https://github.com/GoogleCloudPlatform/kubernetes/issues/398) and the potential problems with container security are [well known](http://opensource.com/business/14/7/docker-security-selinux). Although it is not possible to completely isolate Docker containers from their hosts, new features like [user namespaces](https://github.com/docker/libcontainer/pull/304) make it possible to greatly reduce the attack surface.
## Motivation
### Container isolation
In order to improve container isolation from host and other containers running on the host, containers should only be
granted the access they need to perform their work. To this end it should be possible to take advantage of Docker
features such as the ability to [add or remove capabilities](https://docs.docker.com/reference/run/#runtime-privilege-linux-capabilities-and-lxc-configuration) and [assign MCS labels](https://docs.docker.com/reference/run/#security-configuration)
to the container process.
Support for user namespaces has recently been [merged](https://github.com/docker/libcontainer/pull/304) into Docker's libcontainer project and should soon surface in Docker itself. It will make it possible to assign a range of unprivileged uids and gids from the host to each container, improving the isolation between host and container and between containers.
### External integration with shared storage
In order to support external integration with shared storage, processes running in a Kubernetes cluster
should be able to be uniquely identified by their Unix UID, such that a chain of ownership can be established.
Processes in pods will need to have consistent UID/GID/SELinux category labels in order to access shared disks.
## Constraints and Assumptions
* It is out of the scope of this document to prescribe a specific set
of constraints to isolate containers from their host. Different use cases need different
settings.
* The concept of a security context should not be tied to a particular security mechanism or platform
(i.e. SELinux, AppArmor)
* Applying a different security context to a scope (namespace or pod) requires a solution such as the one proposed for
[service accounts](./service_accounts.md).
## Use Cases
In order of increasing complexity, following are example use cases that would
be addressed with security contexts:
1. Kubernetes is used to run a single cloud application. In order to protect
nodes from containers:
* All containers run as a single non-root user
* Privileged containers are disabled
* All containers run with a particular MCS label
* Kernel capabilities like CHOWN and MKNOD are removed from containers
2. Just like case #1, except that I have more than one application running on
the Kubernetes cluster.
* Each application is run in its own namespace to avoid name collisions
* For each application a different uid and MCS label is used
3. Kubernetes is used as the base for a PAAS with
multiple projects, each project represented by a namespace.
* Each namespace is associated with a range of uids/gids on the node that
are mapped to uids/gids on containers using linux user namespaces.
* Certain pods in each namespace have special privileges to perform system
actions such as talking back to the server for deployment, run docker
builds, etc.
* External NFS storage is assigned to each namespace and permissions set
using the range of uids/gids assigned to that namespace.
## Proposed Design
### Overview
A *security context* consists of a set of constraints that determine how a container
is secured before getting created and run. A security context resides on the container and represents the runtime parameters that will
be used to create and run the container via container APIs. A *security context provider* is passed to the Kubelet so it can have a chance
to mutate Docker API calls in order to apply the security context.
It is recommended that this design be implemented in two phases:
1. Implement the security context provider extension point in the Kubelet
so that a default security context can be applied on container run and creation.
2. Implement a security context structure that is part of a service account. The
default context provider can then be used to apply a security context based
on the service account associated with the pod.
### Security Context Provider
The Kubelet will have an interface that points to a `SecurityContextProvider`. The `SecurityContextProvider` is invoked before creating and running a given container:
```go
type SecurityContextProvider interface {
// ModifyContainerConfig is called before the Docker createContainer call.
// The security context provider can make changes to the Config with which
// the container is created.
// An error is returned if it's not possible to secure the container as
// requested with a security context.
ModifyContainerConfig(pod *api.Pod, container *api.Container, config *docker.Config)
// ModifyHostConfig is called before the Docker runContainer call.
// The security context provider can make changes to the HostConfig, affecting
// security options, whether the container is privileged, volume binds, etc.
// An error is returned if it's not possible to secure the container as requested
// with a security context.
ModifyHostConfig(pod *api.Pod, container *api.Container, hostConfig *docker.HostConfig)
}
```
If the value of the SecurityContextProvider field on the Kubelet is nil, the kubelet will create and run the container as it does today.
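For illustration, a restrictive default provider might look roughly like the following. The `dockerConfig` and `dockerHostConfig` types are simplified stand-ins for the Docker client types in the interface above, and the method signatures are trimmed accordingly; this is a sketch, not the proposed implementation.

```go
package sketch

// Simplified stand-ins for docker.Config and docker.HostConfig.
type dockerConfig struct {
	User string
}

type dockerHostConfig struct {
	Privileged bool
	CapDrop    []string
}

// defaultProvider applies a restrictive baseline: run as a fixed non-root
// user, never privileged, and with a couple of capabilities dropped.
type defaultProvider struct {
	runAsUser string
}

func (p defaultProvider) ModifyContainerConfig(config *dockerConfig) {
	if config.User == "" {
		config.User = p.runAsUser
	}
}

func (p defaultProvider) ModifyHostConfig(hostConfig *dockerHostConfig) {
	hostConfig.Privileged = false
	hostConfig.CapDrop = append(hostConfig.CapDrop, "CHOWN", "MKNOD")
}
```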
### Security Context
A security context resides on the container and represents the runtime parameters that will
be used to create and run the container via container APIs. Following is an example of an initial implementation:
```go
type Container struct {
... other fields omitted ...
// Optional: SecurityContext defines the security options the pod should be run with
SecurityContext *SecurityContext
}
// SecurityContext holds security configuration that will be applied to a container. SecurityContext
// contains duplication of some existing fields from the Container resource. These duplicate fields
// will be populated based on the Container configuration if they are not set. Defining them on
// both the Container AND the SecurityContext will result in an error.
type SecurityContext struct {
// Capabilities are the capabilities to add/drop when running the container
Capabilities *Capabilities
// Run the container in privileged mode
Privileged *bool
// SELinuxOptions are the labels to be applied to the container
// and volumes
SELinuxOptions *SELinuxOptions
// RunAsUser is the UID to run the entrypoint of the container process.
RunAsUser *int64
}
// SELinuxOptions are the labels to be applied to the container.
type SELinuxOptions struct {
// SELinux user label
User string
// SELinux role label
Role string
// SELinux type label
Type string
// SELinux level label.
Level string
}
```
### Admission
It is up to an admission plugin to determine if the security context is acceptable or not. At the
time of writing, the admission control plugin for security contexts will only allow a context that
has defined capabilities or privileged. Contexts that attempt to define a UID or SELinux options
will be denied by default. In the future the admission plugin will base this decision upon
configurable policies that reside within the [service account](https://github.com/GoogleCloudPlatform/kubernetes/pull/2297).
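A minimal sketch of that initial admission check, using simplified stand-ins for the types defined earlier:

```go
package sketch

import "fmt"

// Simplified stand-ins for the SecurityContext fields checked by the initial
// admission plugin described above.
type SELinuxOptions struct{ User, Role, Type, Level string }

type SecurityContext struct {
	Privileged     *bool
	RunAsUser      *int64
	SELinuxOptions *SELinuxOptions
}

// admitSecurityContext reflects the initial policy: capability and privileged
// settings are allowed, while UID and SELinux settings are rejected.
func admitSecurityContext(sc *SecurityContext) error {
	if sc == nil {
		return nil
	}
	if sc.RunAsUser != nil {
		return fmt.Errorf("security context may not set RunAsUser")
	}
	if sc.SELinuxOptions != nil {
		return fmt.Errorf("security context may not set SELinuxOptions")
	}
	return nil
}
```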
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/security_context.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/security_context.md?pixel)]()
View File
@ -0,0 +1,170 @@
# Service Accounts
## Motivation
Processes in Pods may need to call the Kubernetes API. For example:
- scheduler
- replication controller
- minion controller
- a map-reduce type framework which has a controller that then tries to make a dynamically determined number of workers and watch them
- continuous build and push system
- monitoring system
They also may interact with services other than the Kubernetes API, such as:
- an image repository, such as docker -- both when the images are pulled to start the containers, and for writing
images in the case of pods that generate images.
- accessing other cloud services, such as blob storage, in the context of a large, integrated cloud offering (hosted
or private).
- accessing files in an NFS volume attached to the pod
## Design Overview
A service account binds together several things:
- a *name*, understood by users, and perhaps by peripheral systems, for an identity
- a *principal* that can be authenticated and [authorized](../authorization.md)
- a [security context](./security_context.md), which defines the Linux Capabilities, User IDs, Groups IDs, and other
capabilities and controls on interaction with the file system and OS.
- a set of [secrets](./secrets.md), which a container may use to
access various networked resources.
## Design Discussion
A new object Kind is added:
```go
type ServiceAccount struct {
TypeMeta `json:",inline" yaml:",inline"`
ObjectMeta `json:"metadata,omitempty" yaml:"metadata,omitempty"`
username string
securityContext ObjectReference // (reference to a securityContext object)
secrets []ObjectReference // (references to secret objects)
}
```
The name ServiceAccount is chosen because it is widely used already (e.g. by Kerberos and LDAP)
to refer to this type of account. Note that it has no relation to kubernetes Service objects.
The ServiceAccount object does not include any information that could not be defined separately:
- username can be defined however users are defined.
- securityContext and secrets are only referenced and are created using the REST API.
The purpose of the serviceAccount object is twofold:
- to bind usernames to securityContexts and secrets, so that the username can be used to refer succinctly
in contexts where explicitly naming securityContexts and secrets would be inconvenient
- to provide an interface to simplify allocation of new securityContexts and secrets.
These features are explained later.
### Names
From the standpoint of the Kubernetes API, a `user` is any principal which can authenticate to the kubernetes API.
This includes a human running `kubectl` on her desktop and a container in a Pod on a Node making API calls.
There is already a notion of a username in kubernetes, which is populated into a request context after authentication.
However, there is no API object representing a user. While this may evolve, it is expected that in mature installations,
the canonical storage of user identifiers will be handled by a system external to kubernetes.
Kubernetes does not dictate how to divide up the space of user identifier strings. User names can be
simple Unix-style short usernames, (e.g. `alice`), or may be qualified to allow for federated identity (
`alice@example.com` vs `alice@example.org`.) Naming convention may distinguish service accounts from user
accounts (e.g. `alice@example.com` vs `build-service-account-a3b7f0@foo-namespace.service-accounts.example.com`),
but Kubernetes does not require this.
Kubernetes also does not require that there be a distinction between human and Pod users. It will be possible
to setup a cluster where Alice the human talks to the kubernetes API as username `alice` and starts pods that
also talk to the API as user `alice` and write files to NFS as user `alice`. But, this is not recommended.
Instead, it is recommended that Pods and Humans have distinct identities, and reference implementations will
make this distinction.
The distinction is useful for a number of reasons:
- the requirements for humans and automated processes are different:
- Humans need a wide range of capabilities to do their daily activities. Automated processes often have more narrowly-defined activities.
- Humans may better tolerate the exceptional conditions created by expiration of a token. Remembering to handle
this in a program is more annoying. So, either long-lasting credentials or automated rotation of credentials is
needed.
- A Human typically keeps credentials on a machine that is not part of the cluster and so not subject to automatic
management. A VM with a role/service-account can have its credentials automatically managed.
- the identity of a Pod cannot in general be mapped to a single human.
- If policy allows, it may be created by one human, and then updated by another, and another, until its behavior cannot be attributed to a single human.
**TODO**: consider getting rid of separate serviceAccount object and just rolling its parts into the SecurityContext or
Pod Object.
The `secrets` field is a list of references to /secret objects that a process started as that service account should
have access to in order to be able to assert that role.
The secrets are not inline with the serviceAccount object. This way, most or all users can have permission to `GET /serviceAccounts` so they can remind themselves
what serviceAccounts are available for use.
Nothing will prevent creation of a serviceAccount with two secrets of type `SecretTypeKubernetesAuth`, or secrets of two
different types. Kubelet and client libraries will have some behavior, TBD, to handle the case of multiple secrets of a
given type (pick first or provide all and try each in order, etc).
When a serviceAccount and a matching secret exist, then a `User.Info` for the serviceAccount and a `BearerToken` from the secret
are added to the map of tokens used by the authentication process in the apiserver, and similarly for other types. (We
might have some types that do not do anything on apiserver but just get pushed to the kubelet.)
### Pods
The `PodSpec` is extended to have a `Pods.Spec.ServiceAccountUsername` field. If this is unset, then a
default value is chosen. If it is set, then the corresponding value of `Pods.Spec.SecurityContext` is set by the
Service Account Finalizer (see below).
TBD: how policy limits which users can make pods with which service accounts.
### Authorization
Kubernetes API Authorization Policies refer to users. Pods created with a `Pods.Spec.ServiceAccountUsername` typically
get a `Secret` which allows them to authenticate to the Kubernetes APIserver as a particular user. So any
policy that is desired can be applied to them.
A higher level workflow is needed to coordinate creation of serviceAccounts, secrets and relevant policy objects.
Users are free to extend kubernetes to put this business logic wherever is convenient for them, though the
Service Account Finalizer is one place where this can happen (see below).
### Kubelet
The kubelet will treat as "not ready to run" (needing a finalizer to act on it) any Pod which has an empty
SecurityContext.
The kubelet will set a default, restrictive, security context for any pods created from non-Apiserver config
sources (http, file).
Kubelet watches apiserver for secrets which are needed by pods bound to it.
**TODO**: how to only let kubelet see secrets it needs to know.
### The service account finalizer
There are several ways to use Pods with SecurityContexts and Secrets.
One way is to explicitly specify the securityContext and all secrets of a Pod when the pod is initially created,
like this:
**TODO**: example of pod with explicit refs.
Another way is with the *Service Account Finalizer*, a plugin process which is optional, and which handles
business logic around service accounts.
The Service Account Finalizer watches Pods, Namespaces, and ServiceAccount definitions.
First, if it finds pods which have a `Pod.Spec.ServiceAccountUsername` but no `Pod.Spec.SecurityContext` set,
then it copies in the referenced securityContext and secrets references for the corresponding `serviceAccount`.
Second, if ServiceAccount definitions change, it may take some actions.
**TODO**: decide what actions it takes when a serviceAccount definition changes. Does it stop pods, or just
allow someone to list ones that are out of spec? In general, people may want to customize this?
Third, if a new namespace is created, it may create a new serviceAccount for that namespace. This may include
a new username (e.g. `NAMESPACE-default-service-account@serviceaccounts.$CLUSTERID.kubernetes.io`), a new
securityContext, a newly generated secret to authenticate that serviceAccount to the Kubernetes API, and default
policies for that service account.
**TODO**: more concrete example. What are typical default permissions for default service account (e.g. readonly access
to services in the same namespace and read-write access to events in that namespace?)
Finally, it may provide an interface to automate creation of new serviceAccounts. In that case, the user may want
to GET serviceAccounts to see what has been created.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/service_accounts.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/service_accounts.md?pixel)]()
View File
@ -0,0 +1,97 @@
## Simple rolling update
This is a lightweight design document for simple rolling update in ```kubectl```.
Complete execution flow can be found [here](#execution-details).
### Lightweight rollout
Assume that we have a current replication controller named ```foo``` and it is running image ```image:v1```
```kubectl rolling-update rc foo [foo-v2] --image=myimage:v2```
If the user doesn't specify a name for the 'next' controller, then the 'next' controller is renamed to
the name of the original controller.
Obviously there is a race here, where if you kill the client between delete foo, and creating the new version of 'foo' you might be surprised about what is there, but I think that's ok.
See [Recovery](#recovery) below
If the user does specify a name for the 'next' controller, then the 'next' controller is retained with its existing name,
and the old 'foo' controller is deleted. For the purposes of the rollout, we add a unique-ifying label ```kubernetes.io/deployment``` to both the ```foo``` and ```foo-next``` controllers.
The value of that label is the hash of the complete JSON representation of the ```foo-next``` or ```foo``` controller. The name of this label can be overridden by the user with the ```--deployment-label-key``` flag.
#### Recovery
If a rollout fails or is terminated in the middle, it is important that the user be able to resume the roll out.
To facilitate recovery in the case of a crash of the updating process itself, we add the following annotations to each replicaController in the ```kubernetes.io/``` annotation namespace:
* ```desired-replicas``` The desired number of replicas for this controller (either N or zero)
* ```update-partner``` A pointer to the replication controller resource that is the other half of this update (syntax ```<name>```; the namespace is assumed to be identical to the namespace of this replication controller.)
Recovery is achieved by issuing the same command again:
```
kubectl rolling-update rc foo [foo-v2] --image=myimage:v2
```
Whenever the rolling update command executes, the kubectl client looks for replication controllers called ```foo``` and ```foo-next```, if they exist, an attempt is
made to roll ```foo``` to ```foo-next```. If ```foo-next``` does not exist, then it is created, and the rollout is a new rollout. If ```foo``` doesn't exist, then
it is assumed that the rollout is nearly completed, and ```foo-next``` is renamed to ```foo```. Details of the execution flow are given below.
### Aborting a rollout
Abort is assumed to want to reverse a rollout in progress.
```kubectl rolling-update rc foo [foo-v2] --rollback```
This is really just semantic sugar for:
```kubectl rolling-update rc foo-v2 foo```
With the added detail that it moves the ```desired-replicas``` annotation from ```foo-v2``` to ```foo```
### Execution Details
For the purposes of this example, assume that we are rolling from ```foo``` to ```foo-next``` where the only change is an image update from `v1` to `v2`
If the user doesn't specify a ```foo-next``` name, then it is discovered from the ```update-partner``` annotation on ```foo```. If that annotation doesn't exist,
then ```foo-next``` is synthesized using the pattern ```<controller-name>-<hash-of-next-controller-JSON>```
#### Initialization
* If ```foo``` and ```foo-next``` do not exist:
* Exit, and indicate an error to the user, that the specified controller doesn't exist.
* If ```foo``` exists, but ```foo-next``` does not:
* Create ```foo-next```, populate it with the ```v2``` image, and set ```desired-replicas``` to ```foo.Spec.Replicas```
* Goto Rollout
* If ```foo-next``` exists, but ```foo``` does not:
* Assume that we are in the rename phase.
* Goto Rename
* If both ```foo``` and ```foo-next``` exist:
* Assume that we are in a partial rollout
* If ```foo-next``` is missing the ```desired-replicas``` annotation
* Populate the ```desired-replicas``` annotation to ```foo-next``` using the current size of ```foo```
* Goto Rollout
#### Rollout
* While size of ```foo-next``` < ```desired-replicas``` annotation on ```foo-next```
  * increase size of ```foo-next```
  * if size of ```foo``` > 0
    * decrease size of ```foo```
* Goto Rename
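A rough sketch of this loop in Go; `sizeOf` and `scaleTo` are hypothetical helpers supplied by the caller, not actual kubectl code:

```go
package sketch

// rollout sketches the Rollout steps above: grow foo-next toward the
// desired-replicas annotation while shrinking foo one replica at a time.
func rollout(sizeOf func(name string) int, scaleTo func(name string, replicas int) error,
	foo, fooNext string, desiredReplicas int) error {
	for sizeOf(fooNext) < desiredReplicas {
		if err := scaleTo(fooNext, sizeOf(fooNext)+1); err != nil {
			return err
		}
		if sizeOf(foo) > 0 {
			if err := scaleTo(foo, sizeOf(foo)-1); err != nil {
				return err
			}
		}
	}
	return nil // continue with the Rename phase
}
```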
#### Rename
* delete ```foo```
* create ```foo``` that is identical to ```foo-next```
* delete ```foo-next```
#### Abort
* If ```foo-next``` doesn't exist
* Exit and indicate to the user that they may want to simply do a new rollout with the old version
* If ```foo``` doesn't exist
* Exit and indicate not found to the user
* Otherwise, ```foo-next``` and ```foo``` both exist
* Set ```desired-replicas``` annotation on ```foo``` to match the annotation on ```foo-next```
* Goto Rollout with ```foo``` and ```foo-next``` trading places.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/simple-rolling-update.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/design/simple-rolling-update.md?pixel)]()
View File
@ -0,0 +1,27 @@
# Developing Kubernetes
Docs in this directory relate to developing Kubernetes.
* **On Collaborative Development** ([collab.md](collab.md)): info on pull requests and code reviews.
* **Development Guide** ([development.md](development.md)): Setting up your development environment and running tests.
* **Hunting flaky tests** ([flaky-tests.md](flaky-tests.md)): We have a goal of 99.9% flake free tests.
Here's how to run your tests many times.
* **GitHub Issues** ([issues.md](issues.md)): How incoming issues are reviewed and prioritized.
* **Logging Conventions** ([logging.md](logging.md)): Glog levels.
* **Pull Request Process** ([pull-requests.md](pull-requests.md)): When and why pull requests are closed.
* **Releasing Kubernetes** ([releasing.md](releasing.md)): How to create a Kubernetes release (as in version)
and how the version information gets embedded into the built binaries.
* **Profiling Kubernetes** ([profiling.md](profiling.md)): How to plug in go pprof profiler to Kubernetes.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/README.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/devel/README.md?pixel)]()
View File
@ -0,0 +1,348 @@
# So you want to change the API?
The Kubernetes API has two major components - the internal structures and
the versioned APIs. The versioned APIs are intended to be stable, while the
internal structures are implemented to best reflect the needs of the Kubernetes
code itself.
What this means for API changes is that you have to be somewhat thoughtful in
how you approach changes, and that you have to touch a number of pieces to make
a complete change. This document aims to guide you through the process, though
not all API changes will need all of these steps.
## Operational overview
It is important to have a high level understanding of the API system used in
Kubernetes in order to navigate the rest of this document.
As mentioned above, the internal representation of an API object is decoupled
from any one API version. This provides a lot of freedom to evolve the code,
but it requires robust infrastructure to convert between representations. There
are multiple steps in processing an API operation - even something as simple as
a GET involves a great deal of machinery.
The conversion process is logically a "star" with the internal form at the
center. Every versioned API can be converted to the internal form (and
vice-versa), but versioned APIs do not convert to other versioned APIs directly.
This sounds like a heavy process, but in reality we do not intend to keep more
than a small number of versions alive at once. While all of the Kubernetes code
operates on the internal structures, they are always converted to a versioned
form before being written to storage (disk or etcd) or being sent over a wire.
Clients should consume and operate on the versioned APIs exclusively.
To demonstrate the general process, here is a (hypothetical) example:
1. A user POSTs a `Pod` object to `/api/v7beta1/...`
2. The JSON is unmarshalled into a `v7beta1.Pod` structure
3. Default values are applied to the `v7beta1.Pod`
4. The `v7beta1.Pod` is converted to an `api.Pod` structure
5. The `api.Pod` is validated, and any errors are returned to the user
6. The `api.Pod` is converted to a `v6.Pod` (because v6 is the latest stable
version)
7. The `v6.Pod` is marshalled into JSON and written to etcd
Now that we have the `Pod` object stored, a user can GET that object in any
supported api version. For example:
1. A user GETs the `Pod` from `/api/v5/...`
2. The JSON is read from etcd and unmarshalled into a `v6.Pod` structure
3. Default values are applied to the `v6.Pod`
4. The `v6.Pod` is converted to an `api.Pod` structure
5. The `api.Pod` is converted to a `v5.Pod` structure
6. The `v5.Pod` is marshalled into JSON and sent to the user
The implication of this process is that API changes must be done carefully and
backward-compatibly.
## On compatibility
Before talking about how to make API changes, it is worthwhile to clarify what
we mean by API compatibility. An API change is considered backward-compatible
if it:
* adds new functionality that is not required for correct behavior
* does not change existing semantics
* does not change existing defaults
Put another way:
1. Any API call (e.g. a structure POSTed to a REST endpoint) that worked before
your change must work the same after your change.
2. Any API call that uses your change must not cause problems (e.g. crash or
degrade behavior) when issued against servers that do not include your change.
3. It must be possible to round-trip your change (convert to different API
versions and back) with no loss of information.
If your change does not meet these criteria, it is not considered strictly
compatible. There are times when this might be OK, but mostly we want changes
that meet this definition. If you think you need to break compatibility, you
should talk to the Kubernetes team first.
Let's consider some examples. In a hypothetical API (assume we're at version
v6), the `Frobber` struct looks something like this:
```go
// API v6.
type Frobber struct {
Height int `json:"height"`
Param string `json:"param"`
}
```
You want to add a new `Width` field. It is generally safe to add new fields
without changing the API version, so you can simply change it to:
```go
// Still API v6.
type Frobber struct {
Height int `json:"height"`
Width int `json:"width"`
Param string `json:"param"`
}
```
The onus is on you to define a sane default value for `Width` such that rule #1
above is true - API calls and stored objects that used to work must continue to
work.
For your next change you want to allow multiple `Param` values. You can not
simply change `Param string` to `Params []string` (without creating a whole new
API version) - that fails rules #1 and #2. You can instead do something like:
```go
// Still API v6, but kind of clumsy.
type Frobber struct {
Height int `json:"height"`
Width int `json:"width"`
Param string `json:"param"` // the first param
ExtraParams []string `json:"params"` // additional params
}
```
Now you can satisfy the rules: API calls that provide the old style `Param`
will still work, while servers that don't understand `ExtraParams` can ignore
it. This is somewhat unsatisfying as an API, but it is strictly compatible.
Part of the reason for versioning APIs and for using internal structs that are
distinct from any one version is to handle growth like this. The internal
representation can be implemented as:
```go
// Internal, soon to be v7beta1.
type Frobber struct {
Height int
Width int
Params []string
}
```
The code that converts to/from versioned APIs can decode this into the somewhat
uglier (but compatible!) structures. Eventually, a new API version, let's call
it v7beta1, will be forked and it can use the clean internal structure.
We've seen how to satisfy rules #1 and #2. Rule #3 means that you can not
extend one versioned API without also extending the others. For example, an
API call might POST an object in API v7beta1 format, which uses the cleaner
`Params` field, but the API server might store that object in trusty old v6
form (since v7beta1 is "beta"). When the user reads the object back in the
v7beta1 API it would be unacceptable to have lost all but `Params[0]`. This
means that, even though it is ugly, a compatible change must be made to the v6
API.
As another interesting example, enumerated values provide a unique challenge.
Adding a new value to an enumerated set is *not* a compatible change. Clients
which assume they know how to handle all possible values of a given field will
not be able to handle the new values. However, removing a value from an
enumerated set *can* be a compatible change, if handled properly (treat the
removed value as deprecated but allowed).
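To make the hazard concrete, here is a small, hypothetical sketch (the
`FrobberState` type and its values are invented for illustration) of a client
that believes it knows every value in an enumerated set:
```go
package main

import "fmt"

// Hypothetical enumerated field for the Frobber example above.
type FrobberState string

const (
	StateRunning FrobberState = "Running"
	StateStopped FrobberState = "Stopped"
)

// An old client that believes it knows every possible value.
func describe(s FrobberState) (string, error) {
	switch s {
	case StateRunning:
		return "frobber is running", nil
	case StateStopped:
		return "frobber is stopped", nil
	default:
		// Any value added to the set later lands here and breaks the client.
		return "", fmt.Errorf("unknown FrobberState %q", s)
	}
}

func main() {
	if _, err := describe(FrobberState("Paused")); err != nil {
		fmt.Println(err) // unknown FrobberState "Paused"
	}
}
```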
## Changing versioned APIs
For most changes, you will probably find it easiest to change the versioned
APIs first. This forces you to think about how to make your change in a
compatible way. Rather than doing each step in every version, it's usually
easier to do each versioned API one at a time, or to do all of one version
before starting "all the rest".
### Edit types.go
The struct definitions for each API are in `pkg/api/<version>/types.go`. Edit
those files to reflect the change you want to make. Note that all non-inline
fields in versioned APIs must have description tags - these are used to generate
documentation.
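As a sketch, using the hypothetical `Frobber` from above, versioned fields with
description tags might look like this (the wording of the descriptions is
invented for illustration):
```go
// Still the hypothetical API v6 Frobber, now with a description tag on every
// field so documentation can be generated from the type definition.
type Frobber struct {
	Height int    `json:"height" description:"height of the frobber"`
	Width  int    `json:"width" description:"width of the frobber; defaults to 10"`
	Param  string `json:"param" description:"the first param"`
}
```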
### Edit defaults.go
If your change includes new fields for which you will need default values, you
need to add cases to `pkg/api/<version>/defaults.go`. Of course, since you
have added code, you have to add a test: `pkg/api/<version>/defaults_test.go`.
Do use pointers to scalars when you need to distinguish between an unset value
and an automatic zero value. For example,
`PodSpec.TerminationGracePeriodSeconds` is defined as `*int64` in the Go type
definition. A zero value means 0 seconds, and a nil value asks the system to
pick a default.
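A minimal, self-contained sketch of that pattern (the `TimeoutSeconds` field
and the default values are hypothetical, and the real defaulting plumbing lives
in `defaults.go`):
```go
package main

import "fmt"

// Hypothetical Frobber spec: TimeoutSeconds uses a pointer so that "unset"
// (nil) can be told apart from an explicit zero value.
type Frobber struct {
	Height         int
	Width          int
	TimeoutSeconds *int64
}

func defaultFrobber(f *Frobber) {
	if f.Width == 0 {
		f.Width = 10 // sane default so existing callers keep working (rule #1)
	}
	if f.TimeoutSeconds == nil {
		d := int64(30) // only default when the caller said nothing at all
		f.TimeoutSeconds = &d
	}
}

func main() {
	zero := int64(0)
	a := Frobber{Height: 5}                        // nothing set: gets Width=10, Timeout=30
	b := Frobber{Height: 5, TimeoutSeconds: &zero} // explicit 0 is preserved
	defaultFrobber(&a)
	defaultFrobber(&b)
	fmt.Println(a.Width, *a.TimeoutSeconds, *b.TimeoutSeconds) // 10 30 0
}
```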
Don't forget to run the tests!
### Edit conversion.go
Given that you have not yet changed the internal structs, this might feel
premature, and that's because it is. You don't yet have anything to convert to
or from. We will revisit this in the "internal" section. If you're doing this
all in a different order (i.e. you started with the internal structs), then you
should jump to that topic below. In the very rare case that you are making an
incompatible change you might or might not want to do this now, but you will
have to do more later. The files you want are
`pkg/api/<version>/conversion.go` and `pkg/api/<version>/conversion_test.go`.
## Changing the internal structures
Now it is time to change the internal structs so your versioned changes can be
used.
### Edit types.go
Similar to the versioned APIs, the definitions for the internal structs are in
`pkg/api/types.go`. Edit those files to reflect the change you want to make.
Keep in mind that the internal structs must be able to express *all* of the
versioned APIs.
## Edit validation.go
Most changes made to the internal structs need some form of input validation.
Validation is currently done on internal objects in
`pkg/api/validation/validation.go`. This validation is one of the first
opportunities we have to make a great user experience - good error messages and
thorough validation help ensure that users are giving you what you expect and,
when they don't, that they know why and how to fix it. Think hard about the
contents of `string` fields, the bounds of `int` fields, and whether each field
is required or optional.
Of course, code needs tests - `pkg/api/validation/validation_test.go`.
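As a rough, hypothetical sketch of the kind of checks and messages involved
(the bounds and field names are invented; the real code uses the helpers in
that package):
```go
package main

import "fmt"

// Hypothetical validation for the Frobber example: collect every problem, with
// messages that tell the user exactly which field is wrong and why.
type Frobber struct {
	Height int
	Width  int
	Param  string
}

func validateFrobber(f *Frobber) []error {
	var errs []error
	if f.Param == "" {
		errs = append(errs, fmt.Errorf("param: required value must not be empty"))
	}
	if f.Height < 0 || f.Height > 1000 {
		errs = append(errs, fmt.Errorf("height: must be between 0 and 1000, got %d", f.Height))
	}
	if f.Width < 0 {
		errs = append(errs, fmt.Errorf("width: must be non-negative, got %d", f.Width))
	}
	return errs
}

func main() {
	for _, err := range validateFrobber(&Frobber{Height: -1}) {
		fmt.Println(err)
	}
}
```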
## Edit version conversions
At this point you have both the versioned API changes and the internal
structure changes done. If there are any notable differences - field names,
types, structural change in particular - you must add some logic to convert
versioned APIs to and from the internal representation. If you see errors from
the `serialization_test`, it may indicate the need for explicit conversions.
The performance of conversions very heavily influences the performance of the
apiserver. Thus, we auto-generate conversion functions that are much more
efficient than the generic ones (which are based on reflection and thus are
highly inefficient).
The conversion code resides with each versioned API. There are two files:
- `pkg/api/<version>/conversion.go` containing manually written conversion
functions
- `pkg/api/<version>/conversion_generated.go` containing auto-generated
conversion functions
Since the auto-generated conversion functions call the manually written ones,
the manually written functions must follow a defined naming convention: a
function converting type X in pkg a to type Y in pkg b should be named
`convert_a_X_To_b_Y`.
Also note that you can (and for efficiency reasons should) use auto-generated
conversion functions when writing your conversion functions.
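Here is a hedged sketch of a manually written conversion for the `Frobber`
example above, following that naming convention; the real functions live in
separate packages and also receive conversion machinery arguments, which are
omitted here:
```go
package main

import "fmt"

// Stand-ins for the internal (pkg/api) and versioned (pkg/api/v6) Frobber from
// the examples above; in real code they live in separate packages.
type internalFrobber struct {
	Height, Width int
	Params        []string
}

type v6Frobber struct {
	Height, Width int
	Param         string   // the first param
	ExtraParams   []string // additional params
}

// Manually written conversion named per the convention convert_a_X_To_b_Y.
func convert_api_Frobber_To_v6_Frobber(in *internalFrobber, out *v6Frobber) error {
	out.Height = in.Height
	out.Width = in.Width
	out.Param = ""
	out.ExtraParams = nil
	if len(in.Params) > 0 {
		out.Param = in.Params[0]
		out.ExtraParams = append([]string{}, in.Params[1:]...)
	}
	return nil
}

func main() {
	in := internalFrobber{Height: 1, Width: 2, Params: []string{"a", "b", "c"}}
	var out v6Frobber
	_ = convert_api_Frobber_To_v6_Frobber(&in, &out)
	fmt.Printf("%+v\n", out) // {Height:1 Width:2 Param:a ExtraParams:[b c]}
}
```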
Once all the necessary manually written conversions are added, you need to
regenerate auto-generated ones. To regenerate them:
- run
```
$ hack/update-generated-conversions.sh
```
If running the above script is impossible due to compile errors, the easiest
workaround is to comment out the code causing the errors and let the script
regenerate it. If the auto-generated conversion methods are not used by the
manually written ones, it's fine to just remove the whole file and let the
generator create it from scratch.
Unsurprisingly, adding manually written conversions also requires you to add tests to
`pkg/api/<version>/conversion_test.go`.
## Update the fuzzer
Part of our testing regimen for APIs is to "fuzz" (fill with random values) API
objects and then convert them to and from the different API versions. This is
a great way of exposing places where you lost information or made bad
assumptions. If you have added any fields which need very careful formatting
(the test does not run validation) or if you have made assumptions such as
"this slice will always have at least 1 element", you may get an error or even
a panic from the `serialization_test`. If so, look at the diff it produces (or
the backtrace in case of a panic) and figure out what you forgot. Encode that
into the fuzzer's custom fuzz functions. Hint: if you added defaults for a field,
that field will need to have a custom fuzz function that ensures that the field is
fuzzed to a non-empty value.
The fuzzer can be found in `pkg/api/testing/fuzzer.go`.
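A sketch of what such a custom fuzz function might do for the hypothetical
`Frobber` (the field choices are invented; the real fuzzer wires functions like
this into `pkg/api/testing/fuzzer.go`):
```go
package main

import (
	"fmt"
	"math/rand"
)

// Hypothetical Frobber with a defaulted Width and a Params slice that the
// serialization test assumes is non-empty.
type Frobber struct {
	Height int
	Width  int
	Params []string
}

// Custom fuzz logic: after random values are filled in, force defaulted fields
// to a non-empty value so the round-trip comparison does not confuse "unset"
// with "defaulted".
func fuzzFrobber(f *Frobber, r *rand.Rand) {
	if f.Width == 0 {
		f.Width = r.Intn(100) + 1
	}
	if len(f.Params) == 0 {
		f.Params = []string{fmt.Sprintf("param-%d", r.Intn(1000))}
	}
}

func main() {
	r := rand.New(rand.NewSource(42))
	f := Frobber{}
	fuzzFrobber(&f, r)
	fmt.Printf("%+v\n", f)
}
```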
## Update the semantic comparisons
VERY VERY rarely is this needed, but when it hits, it hurts. In some rare
cases we end up with objects (e.g. resource quantities) that have morally
equivalent values with different bitwise representations (e.g. the value 10 with
a base-2 formatter is the same quantity as the value 10 with a base-10
formatter). The only way
Go knows how to do deep-equality is through field-by-field bitwise comparisons.
This is a problem for us.
The first thing you should do is try not to do that. If you really can't avoid
this, I'd like to introduce you to our semantic DeepEqual routine. It supports
custom overrides for specific types - you can find that in `pkg/api/helpers.go`.
There's one other time when you might have to touch this: unexported fields.
You see, while Go's `reflect` package is allowed to touch unexported fields, us
mere mortals are not - this includes semantic DeepEqual. Fortunately, most of
our API objects are "dumb structs" all the way down - all fields are exported
(start with a capital letter) and there are no unexported fields. But sometimes
you want to include an object in our API that does have unexported fields
somewhere in it (for example, `time.Time` has unexported fields). If this hits
you, you may have to touch the semantic DeepEqual customization functions.
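To illustrate the problem with a toy type (this is not the real resource
quantity type, just a stand-in):
```go
package main

import (
	"fmt"
	"reflect"
)

// Toy stand-in for something like a resource quantity: the same amount can be
// stored with different formatting hints, so bitwise equality is too strict.
type quantity struct {
	milliValue int64
	format     string // "decimal" or "binary" formatting hint only
}

// Semantic equality ignores the formatting hint and compares only the value.
func semanticEqual(a, b quantity) bool {
	return a.milliValue == b.milliValue
}

func main() {
	a := quantity{milliValue: 1024000, format: "decimal"}
	b := quantity{milliValue: 1024000, format: "binary"}
	fmt.Println(reflect.DeepEqual(a, b)) // false: field-by-field bitwise comparison
	fmt.Println(semanticEqual(a, b))     // true: morally the same quantity
}
```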
## Implement your change
Now you have the API all changed - go implement whatever it is that you're
doing!
## Write end-to-end tests
This is, sadly, still sort of painful. Talk to us and we'll try to help you
figure out the best way to make sure your cool feature keeps working forever.
## Examples and docs
At last, your change is done, all unit tests pass, e2e passes, you're done,
right? Actually, no. You just changed the API. If you are touching an
existing facet of the API, you have to try *really* hard to make sure that
*all* the examples and docs are updated. There's no easy way to do this, due
in part to JSON and YAML silently dropping unknown fields. You're clever -
you'll figure it out. Put `grep` or `ack` to good use.
If you added functionality, you should consider documenting it and/or writing
an example to illustrate your change.
Make sure you update the swagger API spec by running:
```shell
$ hack/update-swagger-spec.sh
```
The API spec changes should be in a commit separate from your other changes.
## Incompatible API changes
If your change is going to be backward incompatible or might be a breaking change for API
consumers, please send an announcement to `kubernetes-dev@googlegroups.com` before
the change gets in. If you are unsure, ask. Also make sure that the change gets documented in
`CHANGELOG.md` for the next release.
## Adding new REST objects
TODO(smarterclayton): write this.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/api_changes.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/devel/api_changes.md?pixel)]()

View File

@ -0,0 +1,13 @@
Coding style advice for contributors
- Bash
- https://google-styleguide.googlecode.com/svn/trunk/shell.xml
- Go
- https://github.com/golang/go/wiki/CodeReviewComments
- https://gist.github.com/lavalamp/4bd23295a9f32706a48f
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/coding-conventions.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/devel/coding-conventions.md?pixel)]()

View File

@ -0,0 +1,46 @@
# On Collaborative Development
Kubernetes is open source, but many of the people working on it do so as their day job. In order to avoid forcing people to be "at work" effectively 24/7, we want to establish some semi-formal protocols around development. Hopefully these rules make things go more smoothly. If you find that this is not the case, please complain loudly.
## Patches welcome
First and foremost: as a potential contributor, your changes and ideas are welcome at any hour of the day or night, weekdays, weekends, and holidays. Please do not ever hesitate to ask a question or send a PR.
## Code reviews
All changes must be code reviewed. For non-maintainers this is obvious, since you can't commit anyway. But even for maintainers, we want all changes to get at least one review, preferably (for non-trivial changes, obligatorily) from someone who knows the areas the change touches. For non-trivial changes we may want two reviewers. The primary reviewer will make this decision and nominate a second reviewer, if needed. Except for trivial changes, PRs should not be committed until relevant parties (e.g. owners of the subsystem affected by the PR) have had a reasonable chance to look at the PR in their local business hours.
Most PRs will find reviewers organically. If a maintainer intends to be the primary reviewer of a PR they should set themselves as the assignee on GitHub and say so in a reply to the PR. Only the primary reviewer of a change should actually do the merge, except in rare cases (e.g. they are unavailable in a reasonable timeframe).
If a PR has gone 2 work days without an owner emerging, please poke the PR thread and ask for a reviewer to be assigned.
Except for rare cases, such as trivial changes (e.g. typos, comments) or emergencies (e.g. broken builds), maintainers should not merge their own changes.
Expect reviewers to request that you avoid [common go style mistakes](https://github.com/golang/go/wiki/CodeReviewComments) in your PRs.
## Assigned reviews
Maintainers can assign reviews to other maintainers, when appropriate. The assignee becomes the shepherd for that PR and is responsible for merging the PR once they are satisfied with it or else closing it. The assignee might request reviews from non-maintainers.
## Merge hours
Maintainers will do merges of appropriately reviewed-and-approved changes during their local "business hours" (typically 7:00 am Monday to 5:00 pm (17:00h) Friday). PRs that arrive over the weekend or on holidays will only be merged if there is a very good reason for it and if the code review requirements have been met. Concretely this means that nobody should merge changes immediately before going to bed for the night.
There may be discussion and even approvals granted outside of the above hours, but merges will generally be deferred.
If a PR is considered complex or controversial, the merge of that PR should be delayed to give all interested parties in all timezones the opportunity to provide feedback. Concretely, this means that such PRs should be held for 24 hours before merging. Of course "complex" and "controversial" are left to the judgment of the people involved, but we trust that part of being a committer is the judgment required to evaluate such things honestly, and not be motivated by your desire (or your cube-mate's desire) to get your code merged. Also see "Holds" below: any reviewer can issue a "hold" to indicate that the PR is in fact complicated or complex and deserves further review.
PRs that are incorrectly judged to be merge-able may be reverted and subjected to re-review if subsequent reviewers believe that they are in fact controversial or complex.
## Holds
Any maintainer or core contributor who wants to review a PR but does not have time immediately may put a hold on a PR simply by saying so on the PR discussion and offering an ETA measured in single-digit days at most. Any PR that has a hold shall not be merged until the person who requested the hold acks the review, withdraws their hold, or is overruled by a preponderance of maintainers.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/collab.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/devel/collab.md?pixel)]()

View File

@ -0,0 +1,341 @@
## Getting started with Vagrant
Running kubernetes with Vagrant (and VirtualBox) is an easy way to run/test/develop on your local machine (Linux, Mac OS X).
### Prerequisites
1. Install latest version >= 1.6.2 of vagrant from http://www.vagrantup.com/downloads.html
2. Install one of:
1. The latest version of Virtual Box from https://www.virtualbox.org/wiki/Downloads
2. [VMWare Fusion](https://www.vmware.com/products/fusion/) version 5 or greater as well as the appropriate [Vagrant VMWare Fusion provider](https://www.vagrantup.com/vmware)
3. [VMWare Workstation](https://www.vmware.com/products/workstation/) version 9 or greater as well as the [Vagrant VMWare Workstation provider](https://www.vagrantup.com/vmware)
4. [Parallels Desktop](https://www.parallels.com/products/desktop/) version 9 or greater as well as the [Vagrant Parallels provider](https://parallels.github.io/vagrant-parallels/)
3. Get or build a [binary release](/docs/getting-started-guides/binary_release.md)
### Setup
By default, the Vagrant setup will create a single kubernetes-master and a single kubernetes-minion. Each VM will take 1 GB of memory, so make sure you have at least 2 GB to 4 GB of free memory (plus appropriate free disk space). To start your local cluster, open a shell and run:
```sh
cd kubernetes
export KUBERNETES_PROVIDER=vagrant
./cluster/kube-up.sh
```
The `KUBERNETES_PROVIDER` environment variable tells all of the various cluster management scripts which variant to use. If you forget to set this, the assumption is you are running on Google Compute Engine.
If you installed more than one Vagrant provider, Kubernetes will usually pick the appropriate one. However, you can override which one Kubernetes will use by setting the [`VAGRANT_DEFAULT_PROVIDER`](https://docs.vagrantup.com/v2/providers/default.html) environment variable:
```sh
export VAGRANT_DEFAULT_PROVIDER=parallels
export KUBERNETES_PROVIDER=vagrant
./cluster/kube-up.sh
```
Vagrant will provision each machine in the cluster with all the necessary components to run Kubernetes. The initial setup can take a few minutes to complete on each machine.
By default, each VM in the cluster is running Fedora, and all of the Kubernetes services are installed into systemd.
To access the master or any minion:
```sh
vagrant ssh master
vagrant ssh minion-1
```
If you are running more than one minion, you can access the others by:
```sh
vagrant ssh minion-2
vagrant ssh minion-3
```
To view the service status and/or logs on the kubernetes-master:
```sh
vagrant ssh master
[vagrant@kubernetes-master ~] $ sudo systemctl status kube-apiserver
[vagrant@kubernetes-master ~] $ sudo journalctl -r -u kube-apiserver
[vagrant@kubernetes-master ~] $ sudo systemctl status kube-controller-manager
[vagrant@kubernetes-master ~] $ sudo journalctl -r -u kube-controller-manager
[vagrant@kubernetes-master ~] $ sudo systemctl status etcd
[vagrant@kubernetes-master ~] $ sudo systemctl status nginx
```
To view the services on any of the kubernetes-minion(s):
```sh
vagrant ssh minion-1
[vagrant@kubernetes-minion-1] $ sudo systemctl status docker
[vagrant@kubernetes-minion-1] $ sudo journalctl -r -u docker
[vagrant@kubernetes-minion-1] $ sudo systemctl status kubelet
[vagrant@kubernetes-minion-1] $ sudo journalctl -r -u kubelet
```
### Interacting with your Kubernetes cluster with Vagrant.
With your Kubernetes cluster up, you can manage the nodes in your cluster with the regular Vagrant commands.
To push updates to new Kubernetes code after making source changes:
```sh
./cluster/kube-push.sh
```
To stop and then restart the cluster:
```sh
vagrant halt
./cluster/kube-up.sh
```
To destroy the cluster:
```sh
vagrant destroy
```
Once your Vagrant machines are up and provisioned, the first thing to do is to check that you can use the `kubectl.sh` script.
You may need to build the binaries first; you can do this with ```make```
```sh
$ ./cluster/kubectl.sh get minions
NAME LABELS
10.245.1.4 <none>
10.245.1.5 <none>
10.245.1.3 <none>
```
### Interacting with your Kubernetes cluster with the `kube-*` scripts.
Alternatively to using the vagrant commands, you can also use the `cluster/kube-*.sh` scripts to interact with the vagrant based provider just like any other hosting platform for kubernetes.
All of these commands assume you have set `KUBERNETES_PROVIDER` appropriately:
```sh
export KUBERNETES_PROVIDER=vagrant
```
Bring up a vagrant cluster
```sh
./cluster/kube-up.sh
```
Destroy the vagrant cluster
```sh
./cluster/kube-down.sh
```
Update the vagrant cluster after you make changes (only works when building your own releases locally):
```sh
./cluster/kube-push.sh
```
Interact with the cluster
```sh
./cluster/kubectl.sh
```
### Authenticating with your master
When using the vagrant provider in Kubernetes, the `cluster/kubectl.sh` script will cache your credentials in a `~/.kubernetes_vagrant_auth` file so you will not be prompted for them in the future.
```sh
cat ~/.kubernetes_vagrant_auth
{ "User": "vagrant",
"Password": "vagrant"
"CAFile": "/home/k8s_user/.kubernetes.vagrant.ca.crt",
"CertFile": "/home/k8s_user/.kubecfg.vagrant.crt",
"KeyFile": "/home/k8s_user/.kubecfg.vagrant.key"
}
```
You should now be set to use the `cluster/kubectl.sh` script. For example try to list the minions that you have started with:
```sh
./cluster/kubectl.sh get minions
```
### Running containers
Your cluster is running, you can list the minions in your cluster:
```sh
$ ./cluster/kubectl.sh get minions
NAME LABELS
10.245.2.4 <none>
10.245.2.3 <none>
10.245.2.2 <none>
```
Now start running some containers!
You can now use any of the cluster/kube-*.sh commands to interact with your VMs.
Before starting a container, there will be no pods, services, or replication controllers.
```
$ cluster/kubectl.sh get pods
NAME IMAGE(S) HOST LABELS STATUS
$ cluster/kubectl.sh get services
NAME LABELS SELECTOR IP PORT
$ cluster/kubectl.sh get replicationcontrollers
NAME IMAGE(S) SELECTOR REPLICAS
```
Start a container running nginx with a replication controller and three replicas
```
$ cluster/kubectl.sh run my-nginx --image=nginx --replicas=3 --port=80
```
When listing the pods, you will see that three containers have been started and are in Waiting state:
```
$ cluster/kubectl.sh get pods
NAME IMAGE(S) HOST LABELS STATUS
781191ff-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.4/10.245.2.4 name=myNginx Waiting
7813c8bd-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.2/10.245.2.2 name=myNginx Waiting
78140853-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.3/10.245.2.3 name=myNginx Waiting
```
You need to wait for the provisioning to complete; you can monitor the minions by doing:
```sh
$ sudo salt '*minion-1' cmd.run 'docker images'
kubernetes-minion-1:
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
<none> <none> 96864a7d2df3 26 hours ago 204.4 MB
kubernetes/pause latest 6c4579af347b 8 weeks ago 239.8 kB
```
Once the docker image for nginx has been downloaded, the container will start and you can list it:
```sh
$ sudo salt '*minion-1' cmd.run 'docker ps'
kubernetes-minion-1:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
dbe79bf6e25b nginx:latest "nginx" 21 seconds ago Up 19 seconds k8s--mynginx.8c5b8a3a--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1.etcd--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1--fcfa837f
fa0e29c94501 kubernetes/pause:latest "/pause" 8 minutes ago Up 8 minutes 0.0.0.0:8080->80/tcp k8s--net.a90e7ce4--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1.etcd--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1--baf5b21b
```
Going back to listing the pods, services and replicationcontrollers, you now have:
```
$ cluster/kubectl.sh get pods
NAME IMAGE(S) HOST LABELS STATUS
781191ff-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.4/10.245.2.4 name=myNginx Running
7813c8bd-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.2/10.245.2.2 name=myNginx Running
78140853-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.3/10.245.2.3 name=myNginx Running
$ cluster/kubectl.sh get services
NAME LABELS SELECTOR IP PORT
$ cluster/kubectl.sh get replicationcontrollers
NAME IMAGE(S) SELECTOR REPLICAS
myNginx nginx name=my-nginx 3
```
We did not start any services, hence there are none listed. But we see three replicas displayed properly.
Check the [guestbook](/examples/guestbook/README.md) application to learn how to create a service.
You can already play with scaling the replicas with:
```sh
$ ./cluster/kubectl.sh scale rc my-nginx --replicas=2
$ ./cluster/kubectl.sh get pods
NAME IMAGE(S) HOST LABELS STATUS
7813c8bd-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.2/10.245.2.2 name=myNginx Running
78140853-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.3/10.245.2.3 name=myNginx Running
```
Congratulations!
### Testing
The following will run all of the end-to-end testing scenarios assuming you set your environment in `cluster/kube-env.sh`:
```sh
NUM_MINIONS=3 hack/e2e-test.sh
```
### Troubleshooting
#### I keep downloading the same (large) box all the time!
By default the Vagrantfile will download the box from S3. You can change this (and cache the box locally) by providing a name and an alternate URL when calling `kube-up.sh`
```sh
export KUBERNETES_BOX_NAME=choose_your_own_name_for_your_kuber_box
export KUBERNETES_BOX_URL=path_of_your_kuber_box
export KUBERNETES_PROVIDER=vagrant
./cluster/kube-up.sh
```
#### I just created the cluster, but I am getting authorization errors!
You probably have an incorrect ~/.kubernetes_vagrant_auth file for the cluster you are attempting to contact.
```sh
rm ~/.kubernetes_vagrant_auth
```
After using kubectl.sh make sure that the correct credentials are set:
```sh
cat ~/.kubernetes_vagrant_auth
{
"User": "vagrant",
"Password": "vagrant"
}
```
#### I just created the cluster, but I do not see my container running!
If this is your first time creating the cluster, the kubelet on each minion schedules a number of docker pull requests to fetch prerequisite images. This can take some time and as a result may delay your initial pod getting provisioned.
#### I changed Kubernetes code, but it's not running!
Are you sure there was no build error? After running `$ vagrant provision`, scroll up and ensure that each Salt state was completed successfully on each box in the cluster.
It is very likely that any build error you see is due to an error in your source files!
#### I have brought Vagrant up but the minions won't validate!
Are you sure you built a release first? Did you install `net-tools`? For more clues, login to one of the minions (`vagrant ssh minion-1`) and inspect the salt minion log (`sudo cat /var/log/salt/minion`).
#### I want to change the number of minions!
You can control the number of minions that are instantiated via the environment variable `NUM_MINIONS` on your host machine. If you plan to work with replicas, we strongly encourage you to work with enough minions to satisfy your largest intended replica size. If you do not plan to work with replicas, you can save some system resources by running with a single minion. You do this by setting `NUM_MINIONS` to 1 like so:
```sh
export NUM_MINIONS=1
```
#### I want my VMs to have more memory!
You can control the memory allotted to virtual machines with the `KUBERNETES_MEMORY` environment variable.
Just set it to the number of megabytes you would like the machines to have. For example:
```sh
export KUBERNETES_MEMORY=2048
```
If you need more granular control, you can set the amount of memory for the master and minions independently. For example:
```sh
export KUBERNETES_MASTER_MEMORY=1536
export KUBERNETES_MINION_MEMORY=2048
```
#### I ran vagrant suspend and nothing works!
```vagrant suspend``` seems to mess up the network. It's not supported at this time.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/developer-guides/vagrant.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/devel/developer-guides/vagrant.md?pixel)]()

View File

@ -0,0 +1,275 @@
# Development Guide
# Releases and Official Builds
Official releases are built in Docker containers. Details are [here](../../build/README.md). You can do simple builds and development with just a local Docker installation. If you want to build Go locally outside of Docker, please continue below.
## Go development environment
Kubernetes is written in the [Go](http://golang.org) programming language. If you haven't set up a Go development environment, please follow [these instructions](http://golang.org/doc/code.html) to install the Go tools and set up a GOPATH. Ensure your version of Go is at least 1.3.
## Clone kubernetes into GOPATH
We highly recommend putting the Kubernetes code into your GOPATH. For example, the following commands will download the Kubernetes code under the current user's GOPATH (assuming there is only one directory in GOPATH):
```
$ echo $GOPATH
/home/user/goproj
$ mkdir -p $GOPATH/src/github.com/GoogleCloudPlatform/
$ cd $GOPATH/src/github.com/GoogleCloudPlatform/
$ git clone https://github.com/GoogleCloudPlatform/kubernetes.git
```
The commands above will not work if there is more than one directory in ``$GOPATH``.
If you plan to do development, read about the
[Kubernetes Github Flow](https://docs.google.com/presentation/d/1HVxKSnvlc2WJJq8b9KCYtact5ZRrzDzkWgKEfm0QO_o/pub?start=false&loop=false&delayms=3000),
and then clone your own fork of Kubernetes as described there.
## godep and dependency management
Kubernetes uses [godep](https://github.com/tools/godep) to manage dependencies. It is not strictly required for building Kubernetes but it is required when managing dependencies under the Godeps/ tree, and is required by a number of the build and test scripts. Please make sure that ``godep`` is installed and in your ``$PATH``.
### Installing godep
There are many ways to build and host Go binaries. Here is an easy way to get utilities like ```godep``` installed:
1) Ensure that [mercurial](http://mercurial.selenic.com/wiki/Download) is installed on your system. (some of godep's dependencies use the mercurial
source control system). Use ```apt-get install mercurial``` or ```yum install mercurial``` on Linux, or [brew.sh](http://brew.sh) on OS X, or download
directly from mercurial.
2) Create a new GOPATH for your tools and install godep:
```
export GOPATH=$HOME/go-tools
mkdir -p $GOPATH
go get github.com/tools/godep
```
3) Add $GOPATH/bin to your path. Typically you'd add this to your ~/.profile:
```
export GOPATH=$HOME/go-tools
export PATH=$PATH:$GOPATH/bin
```
### Using godep
Here's a quick walkthrough of one way to use godep to add or update a Kubernetes dependency in Godeps/_workspace. For more details, please see the instructions in [godep's documentation](https://github.com/tools/godep).
1) Devote a directory to this endeavor:
```
export KPATH=$HOME/code/kubernetes
mkdir -p $KPATH/src/github.com/GoogleCloudPlatform/kubernetes
cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes
git clone https://path/to/your/fork .
# Or copy your existing local repo here. IMPORTANT: making a symlink doesn't work.
```
2) Set up your GOPATH.
```
# Option A: this will let your builds see packages that exist elsewhere on your system.
export GOPATH=$KPATH:$GOPATH
# Option B: This will *not* let your local builds see packages that exist elsewhere on your system.
export GOPATH=$KPATH
# Option B is recommended if you're going to mess with the dependencies.
```
3) Populate your new GOPATH.
```
cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes
godep restore
```
4) Next, you can either add a new dependency or update an existing one.
```
# To add a new dependency, do:
cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes
go get path/to/dependency
# Change code in Kubernetes to use the dependency.
godep save ./...
# To update an existing dependency, do:
cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes
go get -u path/to/dependency
# Change code in Kubernetes accordingly if necessary.
godep update path/to/dependency
```
5) Before sending your PR, it's a good idea to sanity check that your Godeps.json file is ok by re-restoring: ```godep restore```
It is sometimes expedient to manually fix the `Godeps/Godeps.json` file to minimize the changes.
Please send dependency updates in separate commits within your PR, for easier reviewing.
## Hooks
Before committing any changes, please link/copy these hooks into your .git
directory. This will keep you from accidentally committing non-gofmt'd go code.
```
cd kubernetes/.git/hooks/
ln -s ../../hooks/pre-commit .
```
## Unit tests
```
cd kubernetes
hack/test-go.sh
```
Alternatively, you could also run:
```
cd kubernetes
godep go test ./...
```
If you only want to run unit tests in one package, you could run ``godep go test`` under the package directory. For example, the following commands will run all unit tests in package kubelet:
```
$ cd kubernetes # step into kubernetes' directory.
$ cd pkg/kubelet
$ godep go test
# some output from unit tests
PASS
ok github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet 0.317s
```
## Coverage
Currently, collecting coverage is only supported for the Go unit tests.
To run all unit tests and generate an HTML coverage report, run the following:
```
cd kubernetes
KUBE_COVER=y hack/test-go.sh
```
At the end of the run, an HTML report will be generated with the path printed to stdout.
To run tests and collect coverage in only one package, pass its relative path under the `kubernetes` directory as an argument, for example:
```
cd kubernetes
KUBE_COVER=y hack/test-go.sh pkg/kubectl
```
Multiple arguments can be passed, in which case the coverage results will be combined for all tests run.
Coverage results for the project can also be viewed on [Coveralls](https://coveralls.io/r/GoogleCloudPlatform/kubernetes), and are continuously updated as commits are merged. Additionally, all pull requests which spawn a Travis build will report unit test coverage results to Coveralls.
## Integration tests
You need [etcd](https://github.com/coreos/etcd/releases/tag/v2.0.0) in your path; please make sure it is installed and in your ``$PATH``.
```
cd kubernetes
hack/test-integration.sh
```
## End-to-End tests
You can run an end-to-end test which will bring up a master and two minions, perform some tests, and then tear everything down. Make sure you have followed the getting started steps for your chosen cloud platform (which might involve changing the `KUBERNETES_PROVIDER` environment variable to something other than "gce").
```
cd kubernetes
hack/e2e-test.sh
```
Pressing control-C should result in an orderly shutdown but if something goes wrong and you still have some VMs running you can force a cleanup with this command:
```
go run hack/e2e.go --down
```
### Flag options
See the flag definitions in `hack/e2e.go` for more options, such as reusing an existing cluster. Here is an overview:
```sh
# Build binaries for testing
go run hack/e2e.go --build
# Create a fresh cluster. Deletes a cluster first, if it exists
go run hack/e2e.go --up
# Create a fresh cluster at a specific release version.
go run hack/e2e.go --up --version=0.7.0
# Test if a cluster is up.
go run hack/e2e.go --isup
# Push code to an existing cluster
go run hack/e2e.go --push
# Push to an existing cluster, or bring up a cluster if it's down.
go run hack/e2e.go --pushup
# Run all tests
go run hack/e2e.go --test
# Run tests matching the regex "Pods.*env"
go run hack/e2e.go -v -test --test_args="--ginkgo.focus=Pods.*env"
# Alternately, if you have the e2e cluster up and no desire to see the event stream, you can run ginkgo-e2e.sh directly:
hack/ginkgo-e2e.sh --ginkgo.focus=Pods.*env
```
### Combining flags
```sh
# Flags can be combined, and their actions will take place in this order:
# -build, -push|-up|-pushup, -test|-tests=..., -down
# e.g.:
go run hack/e2e.go -build -pushup -test -down
# -v (verbose) can be added if you want streaming output instead of only
# seeing the output of failed commands.
# -ctl can be used to quickly call kubectl against your e2e cluster. Useful for
# cleaning up after a failed test or viewing logs. Use -v to avoid suppressing
# kubectl output.
go run hack/e2e.go -v -ctl='get events'
go run hack/e2e.go -v -ctl='delete pod foobar'
```
## Conformance testing
End-to-end testing, as described above, is for [development
distributions](../../docs/devel/writing-a-getting-started-guide.md). A conformance test is used on
a [versioned distro](../../docs/devel/writing-a-getting-started-guide.md).
The conformance test runs a subset of the e2e-tests against a manually-created cluster. It does not
require support for up/push/down and other operations. To run a conformance test, you need to know the
IP of the master for your cluster and the authorization arguments to use. The conformance test is
intended to run against a cluster at a specific binary release of Kubernetes.
See [conformance-test.sh](../../hack/conformance-test.sh).
## Testing out flaky tests
[Instructions here](flaky-tests.md)
## Keeping your development fork in sync
One time after cloning your forked repo:
```
git remote add upstream https://github.com/GoogleCloudPlatform/kubernetes.git
```
Then each time you want to sync to upstream:
```
git fetch upstream
git rebase upstream/master
```
If you have write access to the main repository, you should modify your git configuration so that
you can't accidentally push to upstream:
```
git remote set-url --push upstream no_push
```
## Regenerating the CLI documentation
```
hack/run-gendocs.sh
```
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/development.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/devel/development.md?pixel)]()

View File

@ -0,0 +1,183 @@
# How to get faster PR reviews
Most of what is written here is not at all specific to Kubernetes, but it bears
being written down in the hope that it will occasionally remind people of "best
practices" around code reviews.
You've just had a brilliant idea on how to make Kubernetes better. Let's call
that idea "FeatureX". Feature X is not even that complicated. You have a
pretty good idea of how to implement it. You jump in and implement it, fixing a
bunch of stuff along the way. You send your PR - this is awesome! And it sits.
And sits. A week goes by and nobody reviews it. Finally someone offers a few
comments, which you fix up and wait for more review. And you wait. Another
week or two goes by. This is horrible.
What went wrong? One particular problem that comes up frequently is this - your
PR is too big to review. You've touched 39 files and have 8657 insertions.
When your would-be reviewers pull up the diffs they run away - this PR is going
to take 4 hours to review and they don't have 4 hours right now. They'll get to it
later, just as soon as they have more free time (ha!).
Let's talk about how to avoid this.
## 1. Don't build a cathedral in one PR
Are you sure FeatureX is something the Kubernetes team wants or will accept, or
that it is implemented to fit with other changes in flight? Are you willing to
bet a few days or weeks of work on it? If you have any doubt at all about the
usefulness of your feature or the design - make a proposal doc or a sketch PR
or both. Write or code up just enough to express the idea and the design and
why you made those choices, then get feedback on this. Now, when we ask you to
change a bunch of facets of the design, you don't have to re-write it all.
## 2. Smaller diffs are exponentially better
Small PRs get reviewed faster and are more likely to be correct than big ones.
Let's face it - attention wanes over time. If your PR takes 60 minutes to
review, I almost guarantee that the reviewer's eye for details is not as keen in
the last 30 minutes as it was in the first. This leads to multiple rounds of
review when one might have sufficed. In some cases the review is delayed in its
entirety by the need for a large contiguous block of time to sit and read your
code.
Whenever possible, break up your PRs into multiple commits. Making a series of
discrete commits is a powerful way to express the evolution of an idea or the
different ideas that make up a single feature. There's a balance to be struck,
obviously. If your commits are too small they become more cumbersome to deal
with. Strive to group logically distinct ideas into commits.
For example, if you found that FeatureX needed some "prefactoring" to fit in,
make a commit that JUST does that prefactoring. Then make a new commit for
FeatureX. Don't lump unrelated things together just because you didn't think
about prefactoring. If you need to, fork a new branch, do the prefactoring
there and send a PR for that. If you can explain why you are doing seemingly
no-op work ("it makes the FeatureX change easier, I promise") we'll probably be
OK with it.
Obviously, a PR with 25 commits is still very cumbersome to review, so use
common sense.
## 3. Multiple small PRs are often better than multiple commits
If you can extract whole ideas from your PR and send those as PRs of their own,
you can avoid the painful problem of continually rebasing. Kubernetes is a
fast-moving codebase - lock in your changes ASAP, and make merges be someone
else's problem.
Obviously, we want every PR to be useful on its own, so you'll have to use
common sense in deciding what can be a PR vs what should be a commit in a larger
PR. Rule of thumb - if this commit or set of commits is directly related to
FeatureX and nothing else, it should probably be part of the FeatureX PR. If
you can plausibly imagine someone finding value in this commit outside of
FeatureX, try it as a PR.
Don't worry about flooding us with PRs. We'd rather have 100 small, obvious PRs
than 10 unreviewable monoliths.
## 4. Don't rename, reformat, comment, etc in the same PR
Often, as you are implementing FeatureX, you find things that are just wrong.
Bad comments, poorly named functions, bad structure, weak type-safety. You
should absolutely fix those things (or at least file issues, please) - but not
in this PR. See the above points - break unrelated changes out into different
PRs or commits. Otherwise your diff will have WAY too many changes, and your
reviewer won't see the forest because of all the trees.
## 5. Comments matter
Read up on GoDoc - follow those general rules. If you're writing code and you
think there is any possible chance that someone might not understand why you did
something (or that you won't remember what you yourself did), comment it. If
you think there's something pretty obvious that we could follow up on, add a
TODO. Many code-review comments are about this exact issue.
## 6. Tests are almost always required
Nothing is more frustrating than doing a review, only to find that the tests are
inadequate or even entirely absent. Very few PRs can touch code and NOT touch
tests. If you don't know how to test FeatureX - ask! We'll be happy to help
you design things for easy testing or to suggest appropriate test cases.
## 7. Look for opportunities to generify
If you find yourself writing something that touches a lot of modules, think hard
about the dependencies you are introducing between packages. Can some of what
you're doing be made more generic and moved up and out of the FeatureX package?
Do you need to use a function or type from an otherwise unrelated package? If
so, promote! We have places specifically for hosting more generic code.
Likewise if FeatureX is similar in form to FeatureW which was checked in last
month and it happens to exactly duplicate some tricky stuff from FeatureW,
consider prefactoring core logic out and using it in both FeatureW and FeatureX.
But do that in a different commit or PR, please.
## 8. Fix feedback in a new commit
Your reviewer has finally sent you some feedback on FeatureX. You make a bunch
of changes and ... what? You could patch those into your commits with git
"squash" or "fixup" logic. But that makes your changes hard to verify. Unless
your whole PR is pretty trivial, you should instead put your fixups into a new
commit and re-push. Your reviewer can then look at that commit on its own - so
much faster to review than starting over.
We might still ask you to clean up your commits at the very end, for the sake
of a more readable history.
## 9. KISS, YAGNI, MVP, etc
Sometimes we need to remind each other of core tenets of software design - Keep
It Simple, You Aren't Gonna Need It, Minimum Viable Product, and so on. Adding
features "because we might need it later" is antithetical to software that
ships. Add the things you need NOW and (ideally) leave room for things you
might need later - but don't implement them now.
## 10. Push back
We understand that it is hard to imagine, but sometimes we make mistakes. It's
OK to push back on changes requested during a review. If you have a good reason
for doing something a certain way, you are absolutely allowed to debate the
merits of a requested change. You might be overruled, but you might also
prevail. We're mostly pretty reasonable people. Mostly.
## 11. I'm still getting stalled - help?!
So, you've done all that and you still aren't getting any PR love? Here's some
things you can do that might help kick a stalled process along:
* Make sure that your PR has an assigned reviewer (assignee in GitHub). If
this is not the case, reply to the PR comment stream asking for one to be
assigned.
* Ping the assignee (@username) on the PR comment stream asking for an
estimate of when they can get to it.
* Ping the assignee by email (many of us have email addresses that are well
published or are the same as our GitHub handle @google.com or @redhat.com).
If you think you have fixed all the issues in a round of review, and you haven't
heard back, you should ping the reviewer (assignee) on the comment stream with a
"please take another look" (PTAL) or similar comment indicating you are done and
you think it is ready for re-review. In fact, this is probably a good habit for
all PRs.
One phenomenon of open-source projects (where anyone can comment on any issue)
is the dog-pile - your PR gets so many comments from so many people it becomes
hard to follow. In this situation you can ask the primary reviewer
(assignee) whether they want you to fork a new PR to clear out all the comments.
Remember: you don't HAVE to fix every issue raised by every person who feels
like commenting, but you should at least answer reasonable comments with an
explanation.
## Final: Use common sense
Obviously, none of these points are hard rules. There is no document that can
take the place of common sense and good taste. Use your best judgment, but put
a bit of thought into how your work can be made easier to review. If you do
these things your PRs will flow much more easily.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/faster_reviews.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/devel/faster_reviews.md?pixel)]()

View File

@ -0,0 +1,68 @@
# Hunting flaky tests in Kubernetes
Sometimes unit tests are flaky. This means that due to (usually) race conditions, they will occasionally fail, even though most of the time they pass.
We have a goal of 99.9% flake free tests. This means that there is only one flake in one thousand runs of a test.
Running a test 1000 times on your own machine can be tedious and time consuming. Fortunately, there is a better way to achieve this using Kubernetes.
_Note: these instructions are mildly hacky for now; as we get run-once semantics and logging, they will get better._
There is a testing image ```brendanburns/flake``` up on the docker hub. We will use this image to test our fix.
Create a replication controller with the following config:
```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: flakecontroller
spec:
  replicas: 24
  template:
    metadata:
      labels:
        name: flake
    spec:
      containers:
      - name: flake
        image: brendanburns/flake
        env:
        - name: TEST_PACKAGE
          value: pkg/tools
        - name: REPO_SPEC
          value: https://github.com/GoogleCloudPlatform/kubernetes
```
Note that we omit the labels and the selector fields of the replication controller, because they will be populated from the labels field of the pod template by default.
```
kubectl create -f controller.yaml
```
This will spin up 24 instances of the test. They will run to completion, then exit, and the kubelet will restart them, accumulating more and more runs of the test.
You can examine the recent runs of the test by calling ```docker ps -a``` and looking for tasks that exited with non-zero exit codes. Unfortunately, docker ps -a only keeps around the exit status of the last 15-20 containers with the same image, so you have to check them frequently.
You can use this script to automate checking for failures, assuming your cluster is running on GCE and has four nodes:
```sh
echo "" > output.txt
for i in {1..4}; do
echo "Checking kubernetes-minion-${i}"
echo "kubernetes-minion-${i}:" >> output.txt
gcloud compute ssh "kubernetes-minion-${i}" --command="sudo docker ps -a" >> output.txt
done
grep "Exited ([^0])" output.txt
```
Eventually you will have sufficient runs for your purposes. At that point you can stop and delete the replication controller by running:
```sh
kubectl stop replicationcontroller flakecontroller
```
If you do a final check for flakes with ```docker ps -a```, ignore tasks that exited -1, since that's what happens when you stop the replication controller.
Happy flake hunting!
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/flaky-tests.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/devel/flaky-tests.md?pixel)]()

View File

@ -0,0 +1,25 @@
GitHub Issues for the Kubernetes Project
========================================
A quick overview of how we will review and prioritize incoming issues at https://github.com/GoogleCloudPlatform/kubernetes/issues
Priorities
----------
We will use GitHub issue labels for prioritization. The absence of a priority label means the bug has not been reviewed and prioritized yet.
Definitions
-----------
* P0 - something broken for users, build broken, or critical security issue. Someone must drop everything and work on it.
* P1 - must fix for earliest possible binary release (every two weeks)
* P2 - should be fixed in next major release version
* P3 - default priority for lower importance bugs that we still want to track and plan to fix at some point
* design - priority/design is for issues that are used to track design discussions
* support - priority/support is used for issues tracking user support requests
* untriaged - anything without a priority/X label will be considered untriaged
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/issues.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/devel/issues.md?pixel)]()

View File

@ -0,0 +1,32 @@
Logging Conventions
===================
The following are the conventions for which glog levels to use. [glog](http://godoc.org/github.com/golang/glog) is globally preferred to [log](http://golang.org/pkg/log/) for better runtime control.
* glog.Errorf() - Always an error
* glog.Warningf() - Something unexpected, but probably not an error
* glog.Infof() has multiple levels:
* glog.V(0) - Generally useful for this to ALWAYS be visible to an operator
* Programmer errors
* Logging extra info about a panic
* CLI argument handling
* glog.V(1) - A reasonable default log level if you don't want verbosity.
* Information about config (listening on X, watching Y)
* Errors that repeat frequently that relate to conditions that can be corrected (pod detected as unhealthy)
* glog.V(2) - Useful steady state information about the service and important log messages that may correlate to significant changes in the system. This is the recommended default log level for most systems.
* Logging HTTP requests and their exit code
* System state changing (killing pod)
* Controller state change events (starting pods)
* Scheduler log messages
* glog.V(3) - Extended information about changes
* More info about system state changes
* glog.V(4) - Debug level verbosity (for now)
* Logging in particularly thorny parts of code where you may want to come back later and check it
As per the comments, the practical default level is V(2). Developers and QE environments may wish to run at V(3) or V(4). If you wish to change the log level, you can pass in `-v=X` where X is the desired maximum level to log.
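A small, self-contained sketch of these conventions in use (the messages are invented; the output destination and verbosity are controlled by glog's flags such as `-v` and `-logtostderr`):
```go
package main

import (
	"flag"

	"github.com/golang/glog"
)

func main() {
	// glog is configured via flags such as -v=2 and -logtostderr.
	flag.Parse()
	defer glog.Flush()

	glog.Errorf("always an error: %v", "something failed")
	glog.Warningf("unexpected, but probably not an error")
	glog.V(1).Infof("listening on %s", ":8080")                 // config info
	glog.V(2).Infof("killing pod %s", "mypod")                  // system state change
	glog.V(4).Infof("entering thorny code path with x=%d", 42)  // debug-level detail
}
```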
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/logging.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/devel/logging.md?pixel)]()

View File

@ -0,0 +1,40 @@
# Profiling Kubernetes
This document explains how to plug in the profiler and how to profile Kubernetes services.
## Profiling library
Go comes with the built-in 'net/http/pprof' profiling library and profiling web service. The way the service works is by binding the debug/pprof/ subtree on a running webserver to the profiler. Reading from subpages of debug/pprof returns pprof-formatted profiles of the running binary. The output can be processed offline by a tool of choice, or used as an input to the handy 'go tool pprof', which can graphically represent the result.
## Adding profiling to the APIserver
TL;DR: Add lines:
```
m.mux.HandleFunc("/debug/pprof/", pprof.Index)
m.mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
m.mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
```
to the init(c *Config) method in 'pkg/master/master.go' and import the 'net/http/pprof' package.
In most use cases it is enough to do 'import _ net/http/pprof', which automatically registers the profiler handlers in the default http.Server. A slight inconvenience is that the APIserver uses the default server for intra-cluster communication, so plugging the profiler into it is not really useful. In 'pkg/master/server/server.go' more servers are created and started as separate goroutines. The one that usually serves external traffic is secureServer. The handler for this traffic is defined in 'pkg/master/master.go' and stored in the Handler variable. It is created from an HTTP multiplexer, so the only thing that needs to be done is adding the profiler handler functions to this multiplexer. This is exactly what the lines after the TL;DR do.
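For reference, a minimal standalone sketch of wiring those handlers onto a custom multiplexer (this is not the APIserver code, just a toy HTTP server to demonstrate the registrations):
```go
package main

import (
	"net/http"
	"net/http/pprof"
	"runtime"
)

func main() {
	mux := http.NewServeMux()

	// The same three registrations the text above adds to the APIserver's multiplexer.
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)

	// Optional: enable contention profiling (see the section below); the block
	// profile is then served under /debug/pprof/block by the Index handler.
	runtime.SetBlockProfileRate(1)

	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello"))
	})

	http.ListenAndServe("localhost:8080", mux)
}
```
With a toy server like this running, `go tool pprof http://localhost:8080/debug/pprof/profile` collects a CPU profile in the same way as described below for the APIserver.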
## Connecting to the profiler
Even with the profiler running, I found it not entirely straightforward to use 'go tool pprof' against it. The problem is that, at least for dev purposes, the certificates generated for the APIserver are not signed by anyone trusted, and because secureServer serves only secure traffic it is not straightforward to connect to the service. The best workaround I found is to create an ssh tunnel from the kubernetes master's open unsecured port to some external server, and use this server as a proxy. To save everyone looking for the correct ssh flags, it is done by running:
```
ssh kubernetes_master -L<local_port>:localhost:8080
```
or an analogous one for your cloud provider. Afterwards you can, for example, run
```
go tool pprof http://localhost:<local_port>/debug/pprof/profile
```
to get a 30-second CPU profile.
## Contention profiling
To enable contention profiling you need to add the line ```rt.SetBlockProfileRate(1)``` in addition to the ```m.mux.HandleFunc(...)``` lines added before (```rt``` stands for ```runtime``` in ```master.go```). This enables the 'debug/pprof/block' subpage, which can be used as an input to ```go tool pprof```.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/profiling.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/devel/profiling.md?pixel)]()

View File

@ -0,0 +1,22 @@
Pull Request Process
====================
An overview of how we will manage old or out-of-date pull requests.
Process
-------
We will close any pull requests older than two weeks.
Exceptions can be made for PRs that have active review comments, or that are awaiting other dependent PRs. Closed pull requests are easy to recreate, and little work is lost by closing a pull request that subsequently needs to be reopened.
We want to limit the total number of PRs in flight to:
* Maintain a clean project
* Remove old PRs that would be difficult to rebase as the underlying code has changed over time
* Encourage code velocity
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/pull-requests.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/devel/pull-requests.md?pixel)]()

View File

@ -0,0 +1,113 @@
// Build it with:
// $ dot -Tsvg releasing.dot >releasing.svg
digraph tagged_release {
size = "5,5"
// Arrows go up.
rankdir = BT
subgraph left {
// Group the left nodes together.
ci012abc -> pr101 -> ci345cde -> pr102
style = invis
}
subgraph right {
// Group the right nodes together.
version_commit -> dev_commit
style = invis
}
{ // Align the version commit and the info about it.
rank = same
// Align them with pr101
pr101
version_commit
// release_info shows the change in the commit.
release_info
}
{ // Align the dev commit and the info about it.
rank = same
// Align them with 345cde
ci345cde
dev_commit
dev_info
}
// Join the nodes from subgraph left.
pr99 -> ci012abc
pr102 -> pr100
// Do the version node.
pr99 -> version_commit
dev_commit -> pr100
tag -> version_commit
pr99 [
label = "Merge PR #99"
shape = box
fillcolor = "#ccccff"
style = "filled"
fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif"
];
ci012abc [
label = "012abc"
shape = circle
fillcolor = "#ffffcc"
style = "filled"
fontname = "Consolas, Liberation Mono, Menlo, Courier, monospace"
];
pr101 [
label = "Merge PR #101"
shape = box
fillcolor = "#ccccff"
style = "filled"
fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif"
];
ci345cde [
label = "345cde"
shape = circle
fillcolor = "#ffffcc"
style = "filled"
fontname = "Consolas, Liberation Mono, Menlo, Courier, monospace"
];
pr102 [
label = "Merge PR #102"
shape = box
fillcolor = "#ccccff"
style = "filled"
fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif"
];
version_commit [
label = "678fed"
shape = circle
fillcolor = "#ccffcc"
style = "filled"
fontname = "Consolas, Liberation Mono, Menlo, Courier, monospace"
];
dev_commit [
label = "456dcb"
shape = circle
fillcolor = "#ffffcc"
style = "filled"
fontname = "Consolas, Liberation Mono, Menlo, Courier, monospace"
];
pr100 [
label = "Merge PR #100"
shape = box
fillcolor = "#ccccff"
style = "filled"
fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif"
];
release_info [
label = "pkg/version/base.go:\ngitVersion = \"v0.5\";"
shape = none
fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif"
];
dev_info [
label = "pkg/version/base.go:\ngitVersion = \"v0.5-dev\";"
shape = none
fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif"
];
tag [
label = "$ git tag -a v0.5"
fillcolor = "#ffcccc"
style = "filled"
fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif"
];
}

View File

@ -0,0 +1,171 @@
# Releasing Kubernetes
This document explains how to create a Kubernetes release (as in version) and
how the version information gets embedded into the built binaries.
## Origin of the Sources
Kubernetes may be built from either a git tree (using `hack/build-go.sh`) or
from a tarball (using either `hack/build-go.sh` or `go install`) or directly by
the Go native build system (using `go get`).
When building from git, we want to be able to insert specific information about
the build tree at build time. In particular, we want to use the output of `git
describe` to generate the version of Kubernetes and the status of the build
tree (adding a `-dirty` suffix if the tree was modified).
When building from a tarball or using the Go build system, we will not have
access to the information about the git tree, but we still want to be able to
tell whether this build corresponds to an exact release (e.g. v0.3) or is
between releases (e.g. at some point in development between v0.3 and v0.4).
## Version Number Format
In order to account for these use cases, there are some specific formats that
may end up representing the Kubernetes version. Here are a few examples:
- **v0.5**: This is official version 0.5 and this version will only be used
when building from a clean git tree at the v0.5 git tag, or from a tree
extracted from the tarball corresponding to that specific release.
- **v0.5-15-g0123abcd4567**: This is the `git describe` output and it indicates
that we are 15 commits past the v0.5 release and that the SHA1 of the commit
where the binaries were built was `0123abcd4567`. It is only possible to have
this level of detail in the version information when building from git, not
when building from a tarball.
- **v0.5-15-g0123abcd4567-dirty** or **v0.5-dirty**: The extra `-dirty` suffix
  means that the tree had local modifications or untracked files at the time of
  the build, so there is no guarantee that the source code matches exactly the
  state of the tree at the `0123abcd4567` commit or at the `v0.5` git tag,
  respectively. (See the `git describe` sketch after this list.)
- **v0.5-dev**: This means we are building from a tarball or using `go get` or,
if we have a git tree, we are using `go install` directly, so it is not
possible to inject the git version into the build information. Additionally,
this is not an official release, so the `-dev` suffix indicates that the
version we are building is after `v0.5` but before `v0.6`. (There is actually
an exception where a commit with `v0.5-dev` is not present on `v0.6`, see
later for details.)
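As a quick illustration of where these strings come from (standard `git` commands; the exact output depends on your tree):

```bash
# From a clean tree checked out at the v0.5 tag:
git describe              # -> v0.5
# From a tree 15 commits past v0.5:
git describe              # -> v0.5-15-g0123abcd4567
# --dirty appends the suffix when the tree has local modifications:
git describe --dirty      # -> v0.5-15-g0123abcd4567-dirty
```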
## Injecting Version into Binaries
In order to cover the different build cases, we start by providing information
that can be used when using only Go build tools or when we do not have the git
version information available.
To be able to provide a meaningful version in those cases, we set the contents
of variables in a Go source file that will be used when no overrides are
present.
We are using `pkg/version/base.go` as the source of versioning in absence of
information from git. Here is a sample of that file's contents:
```
var (
gitVersion string = "v0.4-dev" // version from git, output of $(git describe)
gitCommit string = "" // sha1 from git, output of $(git rev-parse HEAD)
)
```
This means a build with `go install` or `go get` or a build from a tarball will
yield binaries that will identify themselves as `v0.4-dev` and will not be able
to provide you with a SHA1.
To add the extra versioning information when building from git, the
`hack/build-go.sh` script will gather that information (using `git describe` and
`git rev-parse`) and then create a `-ldflags` string to pass to `go install` and
tell the Go linker to override the contents of those variables at build time. It
can, for instance, tell it to override `gitVersion` and set it to
`v0.4-13-g4567bcdef6789-dirty` and set `gitCommit` to `4567bcdef6789...` which
is the complete SHA1 of the (dirty) tree used at build time.
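As a rough sketch of that mechanism (this is not the exact invocation used by `hack/build-go.sh`; the package path and build target here are illustrative, and the `-X` flag syntax differs between Go releases):

```bash
# Gather the information from git...
GIT_VERSION=$(git describe --dirty)
GIT_COMMIT=$(git rev-parse HEAD)
# ...and ask the Go linker to override the defaults from pkg/version/base.go.
go install -ldflags "-X github.com/GoogleCloudPlatform/kubernetes/pkg/version.gitVersion ${GIT_VERSION} \
  -X github.com/GoogleCloudPlatform/kubernetes/pkg/version.gitCommit ${GIT_COMMIT}" ./cmd/...
```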
## Handling Official Versions
Handling official versions from git is easy: as long as there is an annotated
git tag pointing to a specific version, `git describe` will return that tag
exactly, which matches the idea of an official version (e.g. `v0.5`).
Handling it on tarballs is a bit harder since the exact version string must be
present in `pkg/version/base.go` for it to get embedded into the binaries. But
simply creating a commit with `v0.5` on its own would mean that the commits
coming after it would also get the `v0.5` version when built from tarball or `go
get` while in fact they do not match `v0.5` (the one that was tagged) exactly.
To handle that case, creating a new release should involve creating two adjacent
commits where the first of them will set the version to `v0.5` and the second
will set it to `v0.5-dev`. In that case, even in the presence of merges, there
will be a single commit where the exact `v0.5` version will be used and all
others around it will either have `v0.4-dev` or `v0.5-dev`.
The diagram below illustrates it.
![Diagram of git commits involved in the release](./releasing.png)
After working on `v0.4-dev` and merging PR 99 we decide it is time to release
`v0.5`. So we start a new branch, create one commit to update
`pkg/version/base.go` to include `gitVersion = "v0.5"` and `git commit` it.
We test it and make sure everything is working as expected.
Before sending a PR for it, we create a second commit on that same branch,
updating `pkg/version/base.go` to include `gitVersion = "v0.5-dev"`. That will
ensure that further builds (from tarball or `go install`) on that tree will
always include the `-dev` prefix and will not have a `v0.5` version (since they
do not match the official `v0.5` exactly.)
We then send PR 100 with both commits in it.
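A minimal sketch of that branch (the file edits are elided and the branch name is illustrative):

```bash
git checkout -b versioning-v0.5
# edit pkg/version/base.go so that gitVersion = "v0.5"
git commit -am "Kubernetes version v0.5"
# edit pkg/version/base.go so that gitVersion = "v0.5-dev"
git commit -am "Kubernetes version v0.5-dev"
# send both commits as PR 100
```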
Once the PR is accepted, we can use `git tag -a` to create an annotated tag
*pointing to the one commit* that has `v0.5` in `pkg/version/base.go` and push
it to GitHub. (Unfortunately GitHub tags/releases are not annotated tags, so
this needs to be done from a git client and pushed to GitHub using SSH.)
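For example, once PR 100 is merged (the commit SHA below is the one from the diagram; use the real one):

```bash
# Tag the exact commit that carries gitVersion = "v0.5" and push the annotated tag over SSH.
git tag -a v0.5 -m "Kubernetes v0.5" 678fed
git push git@github.com:GoogleCloudPlatform/kubernetes.git v0.5
```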
## Parallel Commits
While we are working on releasing `v0.5`, other development takes place and
other PRs get merged. For instance, in the example above, PRs 101 and 102 get
merged to the master branch before the versioning PR gets merged.
This is not a problem; it is only slightly inaccurate: checking out the tree
at commit `012abc`, at commit `345cde`, or at the merge commits of PR 101
or 102 will yield a version of `v0.4-dev`, *but* those commits are not present in
`v0.5`.
In that sense, there is a small window in which commits get a
`v0.4-dev` or `v0.4-N-gXXX` label: they are indeed later than `v0.4`,
but they are not really before `v0.5`, in that `v0.5` does not contain those
commits.
Unfortunately, there is not much we can do about it. On the other hand, other
projects seem to live with that and it does not really become a large problem.
As an example, Docker commit a327d9b91edf has a `v1.1.1-N-gXXX` label but it is
not present in Docker `v1.2.0`:
```
$ git describe a327d9b91edf
v1.1.1-822-ga327d9b91edf
$ git log --oneline v1.2.0..a327d9b91edf
a327d9b91edf Fix data space reporting from Kb/Mb to KB/MB
(Non-empty output here means the commit is not present on v1.2.0.)
```
## Release Notes
No official release should be made final without properly matching release notes.
Each release should be accompanied by a short preamble summarizing the major
changes: feature improvements, bug fixes, and notes about any functional changes
relative to the previous release, so that updating to it is as obvious and
trouble-free as possible.
After this preamble, all the relevant PRs/issues that went into the release
should be listed and linked, each with a short summary understandable by
ordinary users (in a perfect world the PR/issue title would be enough, but it is
often too cryptic or domain-specific for that).
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/releasing.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/devel/releasing.md?pixel)]()

Binary file not shown.


View File

@ -0,0 +1,113 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 2.36.0 (20140111.2315)
-->
<!-- Title: tagged_release Pages: 1 -->
<svg width="257pt" height="360pt"
viewBox="0.00 0.00 257.33 360.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(0.649819 0.649819) rotate(0) translate(4 550)">
<title>tagged_release</title>
<polygon fill="white" stroke="none" points="-4,4 -4,-550 392,-550 392,4 -4,4"/>
<!-- ci012abc -->
<g id="node1" class="node"><title>ci012abc</title>
<ellipse fill="#ffffcc" stroke="black" cx="56" cy="-115" rx="42.7926" ry="42.7926"/>
<text text-anchor="middle" x="56" y="-111.3" font-family="Consolas, Liberation Mono, Menlo, Courier, monospace" font-size="14.00">012abc</text>
</g>
<!-- pr101 -->
<g id="node2" class="node"><title>pr101</title>
<polygon fill="#ccccff" stroke="black" points="112,-255 0,-255 0,-219 112,-219 112,-255"/>
<text text-anchor="middle" x="56" y="-233.3" font-family="Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" font-size="14.00">Merge PR #101</text>
</g>
<!-- ci012abc&#45;&gt;pr101 -->
<g id="edge1" class="edge"><title>ci012abc&#45;&gt;pr101</title>
<path fill="none" stroke="black" d="M56,-157.97C56,-174.83 56,-193.784 56,-208.789"/>
<polygon fill="black" stroke="black" points="52.5001,-208.93 56,-218.93 59.5001,-208.93 52.5001,-208.93"/>
</g>
<!-- ci345cde -->
<g id="node3" class="node"><title>ci345cde</title>
<ellipse fill="#ffffcc" stroke="black" cx="62" cy="-359" rx="42.7926" ry="42.7926"/>
<text text-anchor="middle" x="62" y="-355.3" font-family="Consolas, Liberation Mono, Menlo, Courier, monospace" font-size="14.00">345cde</text>
</g>
<!-- pr101&#45;&gt;ci345cde -->
<g id="edge2" class="edge"><title>pr101&#45;&gt;ci345cde</title>
<path fill="none" stroke="black" d="M56.8597,-255.193C57.5237,-268.473 58.4796,-287.592 59.3874,-305.748"/>
<polygon fill="black" stroke="black" points="55.904,-306.17 59.8991,-315.982 62.8953,-305.82 55.904,-306.17"/>
</g>
<!-- pr102 -->
<g id="node4" class="node"><title>pr102</title>
<polygon fill="#ccccff" stroke="black" points="129,-474 17,-474 17,-438 129,-438 129,-474"/>
<text text-anchor="middle" x="73" y="-452.3" font-family="Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" font-size="14.00">Merge PR #102</text>
</g>
<!-- ci345cde&#45;&gt;pr102 -->
<g id="edge3" class="edge"><title>ci345cde&#45;&gt;pr102</title>
<path fill="none" stroke="black" d="M66.8248,-401.668C67.8523,-410.542 68.9117,-419.692 69.8567,-427.853"/>
<polygon fill="black" stroke="black" points="66.3936,-428.375 71.0207,-437.906 73.3472,-427.57 66.3936,-428.375"/>
</g>
<!-- pr100 -->
<g id="node10" class="node"><title>pr100</title>
<polygon fill="#ccccff" stroke="black" points="174,-546 62,-546 62,-510 174,-510 174,-546"/>
<text text-anchor="middle" x="118" y="-524.3" font-family="Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" font-size="14.00">Merge PR #100</text>
</g>
<!-- pr102&#45;&gt;pr100 -->
<g id="edge6" class="edge"><title>pr102&#45;&gt;pr100</title>
<path fill="none" stroke="black" d="M84.1236,-474.303C89.355,-482.441 95.6999,-492.311 101.478,-501.299"/>
<polygon fill="black" stroke="black" points="98.6526,-503.377 107.004,-509.896 104.541,-499.591 98.6526,-503.377"/>
</g>
<!-- version_commit -->
<g id="node5" class="node"><title>version_commit</title>
<ellipse fill="#ccffcc" stroke="black" cx="173" cy="-237" rx="42.7926" ry="42.7926"/>
<text text-anchor="middle" x="173" y="-233.3" font-family="Consolas, Liberation Mono, Menlo, Courier, monospace" font-size="14.00">678fed</text>
</g>
<!-- dev_commit -->
<g id="node6" class="node"><title>dev_commit</title>
<ellipse fill="#ffffcc" stroke="black" cx="169" cy="-359" rx="42.7926" ry="42.7926"/>
<text text-anchor="middle" x="169" y="-355.3" font-family="Consolas, Liberation Mono, Menlo, Courier, monospace" font-size="14.00">456dcb</text>
</g>
<!-- version_commit&#45;&gt;dev_commit -->
<g id="edge4" class="edge"><title>version_commit&#45;&gt;dev_commit</title>
<path fill="none" stroke="black" d="M171.601,-279.97C171.322,-288.326 171.027,-297.195 170.739,-305.844"/>
<polygon fill="black" stroke="black" points="167.24,-305.74 170.405,-315.851 174.236,-305.973 167.24,-305.74"/>
</g>
<!-- dev_commit&#45;&gt;pr100 -->
<g id="edge8" class="edge"><title>dev_commit&#45;&gt;pr100</title>
<path fill="none" stroke="black" d="M158.36,-400.568C152.438,-422.433 144.719,-449.815 137,-474 134.253,-482.606 131.013,-491.906 127.994,-500.278"/>
<polygon fill="black" stroke="black" points="124.616,-499.325 124.47,-509.919 131.191,-501.729 124.616,-499.325"/>
</g>
<!-- release_info -->
<g id="node7" class="node"><title>release_info</title>
<text text-anchor="middle" x="304" y="-240.8" font-family="Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" font-size="14.00">pkg/version/base.go:</text>
<text text-anchor="middle" x="304" y="-225.8" font-family="Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" font-size="14.00">gitVersion = &quot;v0.5&quot;;</text>
</g>
<!-- dev_info -->
<g id="node8" class="node"><title>dev_info</title>
<text text-anchor="middle" x="309" y="-362.8" font-family="Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" font-size="14.00">pkg/version/base.go:</text>
<text text-anchor="middle" x="309" y="-347.8" font-family="Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" font-size="14.00">gitVersion = &quot;v0.5&#45;dev&quot;;</text>
</g>
<!-- pr99 -->
<g id="node9" class="node"><title>pr99</title>
<polygon fill="#ccccff" stroke="black" points="143,-36 39,-36 39,-0 143,-0 143,-36"/>
<text text-anchor="middle" x="91" y="-14.3" font-family="Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" font-size="14.00">Merge PR #99</text>
</g>
<!-- pr99&#45;&gt;ci012abc -->
<g id="edge5" class="edge"><title>pr99&#45;&gt;ci012abc</title>
<path fill="none" stroke="black" d="M84.5805,-36.4245C81.5586,-44.6267 77.7879,-54.8615 73.9865,-65.1795"/>
<polygon fill="black" stroke="black" points="70.6804,-64.0292 70.5075,-74.6226 77.2488,-66.4492 70.6804,-64.0292"/>
</g>
<!-- pr99&#45;&gt;version_commit -->
<g id="edge7" class="edge"><title>pr99&#45;&gt;version_commit</title>
<path fill="none" stroke="black" d="M97.4344,-36.0276C109.585,-68.1826 136.317,-138.924 154.498,-187.038"/>
<polygon fill="black" stroke="black" points="151.318,-188.523 158.127,-196.64 157.866,-186.048 151.318,-188.523"/>
</g>
<!-- tag -->
<g id="node11" class="node"><title>tag</title>
<ellipse fill="#ffcccc" stroke="black" cx="226" cy="-115" rx="71.4873" ry="18"/>
<text text-anchor="middle" x="226" y="-111.3" font-family="Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" font-size="14.00">$ git tag &#45;a v0.5</text>
</g>
<!-- tag&#45;&gt;version_commit -->
<g id="edge9" class="edge"><title>tag&#45;&gt;version_commit</title>
<path fill="none" stroke="black" d="M218.519,-132.939C212.168,-147.318 202.736,-168.673 194.103,-188.22"/>
<polygon fill="black" stroke="black" points="190.784,-187.071 189.946,-197.632 197.188,-189.899 190.784,-187.071"/>
</g>
</g>
</svg>


View File

@ -0,0 +1,105 @@
# Writing a Getting Started Guide
This page gives some advice for anyone planning to write or update a Getting Started Guide for Kubernetes.
It also gives some guidelines which reviewers should follow when reviewing a pull request for a
guide.
A Getting Started Guide gives instructions on how to create a Kubernetes cluster on top of a particular
type (or types) of infrastructure. Infrastructure includes: the IaaS provider for VMs;
the node OS; inter-node networking; and the node Configuration Management system.
A guide refers to scripts, Configuration Management files, and/or binary assets such as RPMs. We call
the combination of all these things needed to run on a particular type of infrastructure a
**distro**.
[The Matrix](../../docs/getting-started-guides/README.md) lists the distros. If there is already a guide
which is similar to the one you have planned, consider improving that one.
Distros fall into two categories:
- **versioned distros** are tested to work with a particular binary release of Kubernetes. These
come in a wide variety, reflecting a wide range of ideas and preferences in how to run a cluster.
- **development distros** are tested to work with the latest Kubernetes source code. But there are
relatively few of these and the bar is much higher for creating one.
There are different guidelines for each.
## Versioned Distro Guidelines
These guidelines say *what* to do. See the Rationale section for *why*.
- Send us a PR.
- Put the instructions in `docs/getting-started-guides/...`. Scripts go there too. This helps devs easily
search for uses of flags by guides.
- We may ask that you host binary assets or large amounts of code in our `contrib` directory or on your
own repo.
- Set up a cluster and run the [conformance test](../../docs/devel/conformance-test.md) against it, and report the
results in your PR.
- Add or update a row in [The Matrix](../../docs/getting-started-guides/README.md).
- State the binary version of kubernetes that you tested clearly in your Guide doc and in The Matrix.
- Even if you are just updating the binary version used, please still do a conformance test.
- If it worked before and now fails, you can ask on IRC,
check the release notes since your last tested version, or look at the `git log` for files in other distros
that have been updated to the new version (see the sketch after this list).
- Versioned distros should typically not modify or add code in `cluster/`. That directory holds just the scripts for development
distros.
- If a versioned distro has not been updated for many binary releases, it may be dropped from the Matrix.
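For checking what changed for another distro since the version you last tested, something like the following works (the tag and paths are only examples):

```bash
# Commits since v0.18.0 that touched another distro's guide and scripts.
git log v0.18.0..HEAD -- docs/getting-started-guides/aws.md cluster/aws/
```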
If you have a cluster partially working, but doing all the above steps seems like too much work,
we still want to hear from you. We suggest you write a blog post or a Gist, and we will link to it on our wiki page.
Just file an issue or chat with us on IRC and one of the committers will link to it from the wiki.
## Development Distro Guidelines
These guidelines say *what* to do. See the Rationale section for *why*.
- The main reason to add a new development distro is to support a new IaaS provider (VM and
network management). This means implementing a new `pkg/cloudprovider/$IAAS_NAME`.
- Development distros should use Saltstack for Configuration Management.
- Development distros need to support automated cluster creation, deletion, upgrading, etc.
This means writing scripts in `cluster/$IAAS_NAME`.
- All commits to the tip of this repo need to not break any of the development distros.
- The author of a change is responsible for making the changes necessary on all the cloud providers if the
change affects any of them, and for reverting the change if it breaks any of the CIs.
- A development distro needs to have an organization which owns it. This organization needs to:
  - Set up and maintain Continuous Integration that runs e2e frequently (multiple times per day) against the
    distro at head, and that notifies all devs of breakage.
  - Be reasonably available for questions and assist with
    refactoring and feature additions that affect code for their IaaS.
## Rationale
- We want people to create Kubernetes clusters with whatever IaaS, Node OS,
configuration management tools, and so on, which they are familiar with. The
guidelines for **versioned distros** are designed for flexibility.
- We want developers to be able to work without understanding all the permutations of
IaaS, NodeOS, and configuration management. The guidelines for **developer distros** are designed
for consistency.
- We want users to have a uniform experience with Kubernetes whenever they follow instructions anywhere
in our Github repository. So, we ask that versioned distros pass a **conformance test** to make sure
they really work.
- We ask versioned distros to **clearly state a version**. People pulling from Github may
expect any instructions there to work at Head, so stuff that has not been tested at Head needs
to be called out. We are still changing things really fast, and, while the REST API is versioned,
it is not practical at this point to version or limit changes that affect distros. We still change
flags at the Kubernetes/Infrastructure interface.
- We want to **limit the number of development distros** for several reasons. Developers should
only have to change a limited number of places to add a new feature. Also, since we will
gate commits on passing CI for all distros, and since end-to-end tests are typically somewhat
flaky, it would be highly likely for there to be false positives and CI backlogs with many CI pipelines.
- We do not require versioned distros to do **CI** for several reasons. It is a steep
learning curve to understand our automated testing scripts. And it is considerable effort
to fully automate setup and teardown of a cluster, which is needed for CI. And, not everyone
has the time and money to run CI. We do not want to
discourage people from writing and sharing guides because of this.
- Versioned distro authors are free to run their own CI and let us know if there is breakage, but we
will not include them as commit hooks -- there cannot be so many commit checks that it is impossible
to pass them all.
- We prefer a single Configuration Management tool for development distros. If there were more
than one, the core developers would have to learn multiple tools and update config in multiple
places. **Saltstack** happens to be the one we picked when we started the project. We
welcome versioned distros that use any tool; there are already examples of
CoreOS Fleet, Ansible, and others.
- You can still run code from head or your own branch
if you use another Configuration Management tool -- you just have to do some manual steps
during testing and deployment.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/writing-a-getting-started-guide.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/devel/writing-a-getting-started-guide.md?pixel)]()

View File

@ -0,0 +1,41 @@
# Kubernetes Developer Guide
The developer guide is for anyone wanting to either write code which directly accesses the
kubernetes API, or to contribute directly to the kubernetes project.
It assumes some familiarity with concepts in the [User Guide](user-guide.md) and the [Cluster Admin
Guide](cluster-admin-guide.md).
## Developing against the Kubernetes API
* API objects are explained at [http://kubernetes.io/third_party/swagger-ui/](http://kubernetes.io/third_party/swagger-ui/).
* **Annotations** ([annotations.md](annotations.md)): are for attaching arbitrary non-identifying metadata to objects.
Programs that automate Kubernetes objects may use annotations to store small amounts of their state.
* **API Conventions** ([api-conventions.md](api-conventions.md)):
Defining the verbs and resources used in the Kubernetes API.
* **API Client Libraries** ([client-libraries.md](client-libraries.md)):
A list of existing client libraries, both supported and user-contributed.
## Writing Plugins
* **Authentication Plugins** ([authentication.md](authentication.md)):
The current and planned states of authentication tokens.
* **Authorization Plugins** ([authorization.md](authorization.md)):
Authorization applies to all HTTP requests on the main apiserver port.
This doc explains the available authorization implementations.
* **Admission Control Plugins** ([admission_control](design/admission_control.md))
## Contributing to the Kubernetes Project
See this [README](../docs/devel/README.md).
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/developer-guide.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/developer-guide.md?pixel)]()

View File

@ -0,0 +1,44 @@
# DNS Integration with Kubernetes
As of kubernetes 0.8, DNS is offered as a cluster add-on. If enabled, a DNS
Pod and Service will be scheduled on the cluster, and the kubelets will be
configured to tell individual containers to use the DNS Service's IP.
Every Service defined in the cluster (including the DNS server itself) will be
assigned a DNS name. By default, a client Pod's DNS search list will
include the Pod's own namespace and the cluster's default domain. This is best
illustrated by example:
Assume a Service named `foo` in the kubernetes namespace `bar`. A Pod running
in namespace `bar` can look up this service by simply doing a DNS query for
`foo`. A Pod running in namespace `quux` can look up this service by doing a
DNS query for `foo.bar`.
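For instance, from a shell inside a container (assuming the image ships `nslookup`, e.g. busybox):

```bash
# From a Pod in namespace "quux":
nslookup foo.bar    # resolves the "foo" Service in namespace "bar"
# From a Pod in namespace "bar", the short name is enough:
nslookup foo
```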
The cluster DNS server ([SkyDNS](https://github.com/skynetservices/skydns))
supports forward lookups (A records) and service lookups (SRV records).
## How it Works
The DNS pod holds 3 containers: skydns, etcd (which skydns uses),
and a kubernetes-to-skydns bridge called kube2sky. The kube2sky process
watches the kubernetes master for changes in Services, and then writes the
information to etcd, which skydns reads. This etcd instance is not linked to
any other etcd clusters that might exist, including the kubernetes master.
## Issues
The skydns service is reachable directly from kubernetes nodes (outside
of any container) and DNS resolution works if the skydns service is targeted
explicitly. However, nodes are not configured to use the cluster DNS service or
to search the cluster's DNS domain by default. This may be resolved at a later
time.
## For more information
See [the docs for the cluster addon](../cluster/addons/dns/README.md).
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/dns.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/dns.md?pixel)]()

View File

@ -0,0 +1,53 @@
# Downward API
The downward API allows containers to consume information about the system without coupling to the
kubernetes client or REST API.
### Capabilities
Containers can consume the following information via the downward API:
* Their pod's name
* Their pod's namespace
### Consuming information about a pod in a container
Containers consume information from the downward API using environment variables. In the future,
containers will also be able to consume the downward API via a volume plugin. The `valueFrom`
field of an environment variable allows you to specify an `ObjectFieldSelector` to select fields
from the pod's definition. The `ObjectFieldSelector` has an `apiVersion` field and a `fieldPath`
field. The `fieldPath` field is an expression designating a field on the pod. The `apiVersion`
field is the version of the API schema that the `fieldPath` is written in terms of. If the
`apiVersion` field is not specified it is defaulted to the API version of the enclosing object.
### Example: consuming the downward API
This is an example of a pod that consumes its name and namespace via the downward API:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: dapi-test-pod
spec:
containers:
- name: test-container
image: gcr.io/google_containers/busybox
command: [ "/bin/sh", "-c", "env" ]
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
restartPolicy: Never
```
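Since the container simply runs `env`, one way to check the injected values after creating the pod (the file name is an example) is:

```bash
kubectl create -f dapi-test-pod.yaml
# Once the container has run, its log contains the environment dump:
kubectl logs dapi-test-pod | grep POD_
```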
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/downward_api.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/downward_api.md?pixel)]()

View File

@ -0,0 +1,66 @@
If you are not sure which OSes and infrastructure are supported, the table below lists all the combinations which have
been tested recently.
For the easiest "kick the tires" experience, please try the [local docker](docker.md) guide.
If you are considering contributing a new guide, please read the
[guidelines](../../docs/devel/writing-a-getting-started-guide.md).
IaaS Provider | Config. Mgmt | OS | Networking | Docs | Support Level | Notes
-------------- | ------------ | ------ | ---------- | ---------------------------------------------------- | ---------------------------- | -----
GKE | | | GCE | [docs](https://cloud.google.com/container-engine) | Commercial | Uses K8s version 0.15.0
Vagrant | Saltstack | Fedora | OVS | [docs](../../docs/getting-started-guides/vagrant.md) | Project | Uses latest via https://get.k8s.io/
GCE | Saltstack | Debian | GCE | [docs](../../docs/getting-started-guides/gce.md) | Project | Tested with 0.15.0 by @robertbailey
Azure | CoreOS | CoreOS | Weave | [docs](../../docs/getting-started-guides/coreos/azure/README.md) | Community ([@errordeveloper](https://github.com/errordeveloper), [@squillace](https://github.com/squillace), [@chanezon](https://github.com/chanezon), [@crossorigin](https://github.com/crossorigin)) | Uses K8s version 0.17.0
Docker Single Node | custom | N/A | local | [docs](docker.md) | Project (@brendandburns) | Tested @ 0.14.1 |
Docker Multi Node | Flannel | N/A | local | [docs](docker-multinode.md) | Project (@brendandburns) | Tested @ 0.14.1 |
Bare-metal | Ansible | Fedora | flannel | [docs](../../docs/getting-started-guides/fedora/fedora_ansible_config.md) | Project | Uses K8s v0.13.2
Bare-metal | custom | Fedora | _none_ | [docs](../../docs/getting-started-guides/fedora/fedora_manual_config.md) | Project | Uses K8s v0.13.2
Bare-metal | custom | Fedora | flannel | [docs](../../docs/getting-started-guides/fedora/flannel_multi_node_cluster.md) | Community ([@aveshagarwal](https://github.com/aveshagarwal))| Tested with 0.15.0
libvirt | custom | Fedora | flannel | [docs](../../docs/getting-started-guides/fedora/flannel_multi_node_cluster.md) | Community ([@aveshagarwal](https://github.com/aveshagarwal))| Tested with 0.15.0
KVM | custom | Fedora | flannel | [docs](../../docs/getting-started-guides/fedora/flannel_multi_node_cluster.md) | Community ([@aveshagarwal](https://github.com/aveshagarwal))| Tested with 0.15.0
Mesos/GCE | | | | [docs](../../docs/getting-started-guides/mesos.md) | [Community](https://github.com/mesosphere/kubernetes-mesos) ([@jdef](https://github.com/jdef)) | Uses K8s v0.11.2
AWS | CoreOS | CoreOS | flannel | [docs](../../docs/getting-started-guides/coreos.md) | Community | Uses K8s version 0.17.0
GCE | CoreOS | CoreOS | flannel | [docs](../../docs/getting-started-guides/coreos.md) | Community (@kelseyhightower) | Uses K8s version 0.15.0
Vagrant | CoreOS | CoreOS | flannel | [docs](../../docs/getting-started-guides/coreos.md) | Community ( [@pires](https://github.com/pires), [@AntonioMeireles](https://github.com/AntonioMeireles) ) | Uses K8s version 0.15.0
Bare-metal (Offline) | CoreOS | CoreOS | flannel | [docs](../../docs/getting-started-guides/coreos/bare_metal_offline.md) | Community([@jeffbean](https://github.com/jeffbean)) | Uses K8s version 0.15.0
CloudStack | Ansible | CoreOS | flannel | [docs](../../docs/getting-started-guides/cloudstack.md)| Community (@runseb) | Uses K8s version 0.9.1
Vmware | | Debian | OVS | [docs](../../docs/getting-started-guides/vsphere.md) | Community (@pietern) | Uses K8s version 0.9.1
Bare-metal | custom | CentOS | _none_ | [docs](../../docs/getting-started-guides/centos/centos_manual_config.md) | Community(@coolsvap) | Uses K8s v0.9.1
AWS | Juju | Ubuntu | flannel | [docs](../../docs/getting-started-guides/juju.md) | [Community](https://github.com/whitmo/bundle-kubernetes) ( [@whit](https://github.com/whitmo), [@matt](https://github.com/mbruzek), [@chuck](https://github.com/chuckbutler) ) | [Tested](http://reports.vapour.ws/charm-tests-by-charm/kubernetes) K8s v0.8.1
OpenStack/HPCloud | Juju | Ubuntu | flannel | [docs](../../docs/getting-started-guides/juju.md) | [Community](https://github.com/whitmo/bundle-kubernetes) ( [@whit](https://github.com/whitmo), [@matt](https://github.com/mbruzek), [@chuck](https://github.com/chuckbutler) ) | [Tested](http://reports.vapour.ws/charm-tests-by-charm/kubernetes) K8s v0.8.1
Joyent | Juju | Ubuntu | flannel | [docs](../../docs/getting-started-guides/juju.md) | [Community](https://github.com/whitmo/bundle-kubernetes) ( [@whit](https://github.com/whitmo), [@matt](https://github.com/mbruzek), [@chuck](https://github.com/chuckbutler) ) | [Tested](http://reports.vapour.ws/charm-tests-by-charm/kubernetes) K8s v0.8.1
AWS | Saltstack | Ubuntu | OVS | [docs](../../docs/getting-started-guides/aws.md) | Community (@justinsb) | Uses K8s version 0.5.0
Vmware | CoreOS | CoreOS | flannel | [docs](../../docs/getting-started-guides/coreos.md) | Community (@kelseyhightower) | Uses K8s version 0.15.0
Azure | Saltstack | Ubuntu | OpenVPN | [docs](../../docs/getting-started-guides/azure.md) | Community |
Bare-metal | custom | Ubuntu | flannel | [docs](../../docs/getting-started-guides/ubuntu.md) | Community (@resouer @WIZARD-CXY) | use k8s version 0.18.0
Docker Single Node | custom | N/A | local | [docs](docker.md) | Project (@brendandburns) | Tested @ 0.14.1 |
Docker Multi Node | Flannel| N/A | local | [docs](docker-multinode.md) | Project (@brendandburns) | Tested @ 0.14.1 |
Local | | | _none_ | [docs](../../docs/getting-started-guides/locally.md) | Community (@preillyme) |
libvirt/KVM | CoreOS | CoreOS | libvirt/KVM | [docs](../../docs/getting-started-guides/libvirt-coreos.md) | Community (@lhuard1A) |
oVirt | | | | [docs](../../docs/getting-started-guides/ovirt.md) | Community (@simon3z) |
Rackspace | CoreOS | CoreOS | flannel | [docs](../../docs/getting-started-guides/rackspace.md) | Community (@doublerr) | use k8s version 0.18.0
*Note*: The above table is ordered by the version tested/used (see the Notes column), followed by support level.
Definition of columns:
- **IaaS Provider** is who/what provides the virtual or physical machines (nodes) that Kubernetes runs on.
- **OS** is the base operating system of the nodes.
- **Config. Mgmt** is the configuration management system that helps install and maintain kubernetes software on the
nodes.
- **Networking** is what implements the [networking model](../../docs/networking.md). Those with networking type
_none_ may not support more than one node, or may support multiple VM nodes only in the same physical node.
- Support Levels
- **Project**: Kubernetes Committers regularly use this configuration, so it usually works with the latest release
of Kubernetes.
- **Commercial**: A commercial offering with its own support arrangements.
- **Community**: Actively supported by community contributions. May not work with more recent releases of kubernetes.
- **Inactive**: No active maintainer. Not recommended for first-time K8s users, and may be deleted soon.
- **Notes** is relevant information such as version k8s used.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/getting-started-guides/README.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/getting-started-guides/README.md?pixel)]()

View File

@ -0,0 +1,220 @@
# Getting started on Amazon EC2 with CoreOS
The example below creates an elastic Kubernetes cluster with a custom number of worker nodes and a master.
**Warning:** contrary to the [supported procedure](aws.md), the examples below provision Kubernetes with an insecure API server (plain HTTP,
no security tokens, no basic auth). For demonstration purposes only.
## Highlights
* Cluster bootstrapping using [cloud-config](https://coreos.com/docs/cluster-management/setup/cloudinit-cloud-config/)
* Cross container networking with [flannel](https://github.com/coreos/flannel#flannel)
* Auto worker registration with [kube-register](https://github.com/kelseyhightower/kube-register#kube-register)
* Kubernetes v0.17.0 [official binaries](https://github.com/GoogleCloudPlatform/kubernetes/releases/tag/v0.17.0)
## Prerequisites
* [aws CLI](http://aws.amazon.com/cli)
* [CoreOS image for AWS](https://coreos.com/docs/running-coreos/cloud-providers/ec2/)
* [kubectl CLI](aws/kubectl.md)
## Starting a Cluster
### CloudFormation
The [cloudformation-template.json](aws/cloudformation-template.json) can be used to bootstrap a Kubernetes cluster with a single command:
```bash
aws cloudformation create-stack --stack-name kubernetes --region us-west-2 \
--template-body file://aws/cloudformation-template.json \
--parameters ParameterKey=KeyPair,ParameterValue=<keypair> \
ParameterKey=ClusterSize,ParameterValue=<cluster_size> \
ParameterKey=VpcId,ParameterValue=<vpc_id> \
ParameterKey=SubnetId,ParameterValue=<subnet_id> \
ParameterKey=SubnetAZ,ParameterValue=<subnet_az>
```
It will take a few minutes for the entire stack to come up. You can monitor the stack progress with the following command:
```bash
aws cloudformation describe-stack-events --stack-name kubernetes
```
Record the Kubernetes Master IP address:
```bash
aws cloudformation describe-stacks --stack-name kubernetes
```
[Skip to kubectl client configuration](#configure-the-kubectl-ssh-tunnel)
### AWS CLI
The following commands use the latest CoreOS alpha AMI for the `us-west-2` region. For a list of other regions and their corresponding AMI IDs, see the [CoreOS EC2 cloud provider documentation](https://coreos.com/docs/running-coreos/cloud-providers/ec2/#choosing-a-channel).
#### Create the Kubernetes Security Group
```bash
aws ec2 create-security-group --group-name kubernetes --description "Kubernetes Security Group"
aws ec2 authorize-security-group-ingress --group-name kubernetes --protocol tcp --port 22 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-name kubernetes --protocol tcp --port 80 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-name kubernetes --source-security-group-name kubernetes
```
#### Save the master and node cloud-configs
* [master.yaml](aws/cloud-configs/master.yaml)
* [node.yaml](aws/cloud-configs/node.yaml)
#### Launch the master
*Attention:* replace `<ami_image_id>` below with a [suitable CoreOS image for AWS](https://coreos.com/docs/running-coreos/cloud-providers/ec2/).
```bash
aws ec2 run-instances --image-id <ami_image_id> --key-name <keypair> \
--region us-west-2 --security-groups kubernetes --instance-type m3.medium \
--user-data file://master.yaml
```
Record the `InstanceId` for the master.
Gather the public and private IPs for the master node:
```bash
aws ec2 describe-instances --instance-id <instance-id>
```
```
{
"Reservations": [
{
"Instances": [
{
"PublicDnsName": "ec2-54-68-97-117.us-west-2.compute.amazonaws.com",
"RootDeviceType": "ebs",
"State": {
"Code": 16,
"Name": "running"
},
"PublicIpAddress": "54.68.97.117",
"PrivateIpAddress": "172.31.9.9",
...
```
#### Update the node.yaml cloud-config
Edit `node.yaml` and replace all instances of `<master-private-ip>` with the **private** IP address of the master node.
### Launch 3 worker nodes
*Attention:* Replace `<ami_image_id>` below with a [suitable CoreOS image for AWS](https://coreos.com/docs/running-coreos/cloud-providers/ec2/#choosing-a-channel).
```bash
aws ec2 run-instances --count 3 --image-id <ami_image_id> --key-name <keypair> \
--region us-west-2 --security-groups kubernetes --instance-type m3.medium \
--user-data file://node.yaml
```
### Add additional worker nodes
*Attention:* replace `<ami_image_id>` below with a [suitable CoreOS image for AWS](https://coreos.com/docs/running-coreos/cloud-providers/ec2/#choosing-a-channel).
```bash
aws ec2 run-instances --count 1 --image-id <ami_image_id> --key-name <keypair> \
--region us-west-2 --security-groups kubernetes --instance-type m3.medium \
--user-data file://node.yaml
```
### Configure the kubectl SSH tunnel
This command enables secure communication between the kubectl client and the Kubernetes API.
```bash
ssh -f -nNT -L 8080:127.0.0.1:8080 core@<master-public-ip>
```
### Listing worker nodes
Once the worker instances have fully booted, they will be automatically registered with the Kubernetes API server by the kube-register service running on the master node. It may take a few minutes.
```bash
kubectl get nodes
```
## Starting a simple pod
Create a pod manifest: `pod.json`
```json
{
"apiVersion": "v1",
"kind": "Pod",
"metadata": {
"name": "hello",
"labels": {
"name": "hello",
"environment": "testing"
}
},
"spec": {
"containers": [{
"name": "hello",
"image": "quay.io/kelseyhightower/hello",
"ports": [{
"containerPort": 80,
"hostPort": 80
}]
}]
}
}
```
### Create the pod using the kubectl command line tool
```bash
kubectl create -f pod.json
```
### Testing
```bash
kubectl get pods
```
Record the **Host** of the pod, which should be the private IP address.
Gather the public IP address for the worker node.
```bash
aws ec2 describe-instances --filters 'Name=private-ip-address,Values=<host>'
```
```
{
"Reservations": [
{
"Instances": [
{
"PublicDnsName": "ec2-54-68-97-117.us-west-2.compute.amazonaws.com",
"RootDeviceType": "ebs",
"State": {
"Code": 16,
"Name": "running"
},
"PublicIpAddress": "54.68.97.117",
...
```
Visit the public IP address in your browser to view the running pod.
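Or check it from the command line (substitute the worker node's public IP recorded above):

```bash
curl http://<worker-public-ip>/
```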
### Delete the pod
```bash
kubectl delete pods hello
```
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/getting-started-guides/aws-coreos.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/getting-started-guides/aws-coreos.md?pixel)]()

View File

@ -0,0 +1,89 @@
# Getting started on AWS EC2
## Prerequisites
1. You need an AWS account. Visit [http://aws.amazon.com](http://aws.amazon.com) to get started
2. Install and configure [AWS Command Line Interface](http://aws.amazon.com/cli)
3. You need an AWS [instance profile and role](http://docs.aws.amazon.com/IAM/latest/UserGuide/instance-profiles.html) with EC2 full access.
## Cluster turnup
### Supported procedure: `get-kube`
```bash
#Using wget
export KUBERNETES_PROVIDER=aws; wget -q -O - https://get.k8s.io | bash
#Using cURL
export KUBERNETES_PROVIDER=aws; curl -sS https://get.k8s.io | bash
```
NOTE: This script calls [cluster/kube-up.sh](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/cluster/kube-up.sh)
which in turn calls [cluster/aws/util.sh](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/cluster/aws/util.sh)
using [cluster/aws/config-default.sh](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/cluster/aws/config-default.sh).
This process takes about 5 to 10 minutes. Once the cluster is up, the IP addresses of your master and node(s) will be printed,
as well as information about the default services running in the cluster (monitoring, logging, dns). User credentials and security
tokens are written to `~/.kube/kubeconfig`; they will be necessary to use the CLI or HTTP Basic Auth.
By default, the script will provision a new VPC and a 4 node k8s cluster in us-west-2a (Oregon) with `t2.micro` instances running on Ubuntu.
You can override the variables defined in [config-default.sh](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/cluster/aws/config-default.sh) to change this behavior as follows:
```bash
export KUBE_AWS_ZONE=eu-west-1c
export NUM_MINIONS=2
export MINION_SIZE=m3.medium
export AWS_S3_REGION=eu-west-1
export AWS_S3_BUCKET=mycompany-kubernetes-artifacts
export INSTANCE_PREFIX=k8s
...
```
It will also try to create or reuse a keypair called "kubernetes", and IAM profiles called "kubernetes-master" and "kubernetes-minion".
If these already exist, make sure you want them to be used here.
NOTE: If you are using an existing keypair named "kubernetes", you must set the `AWS_SSH_KEY` environment variable to point to your private key.
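For example (the path is illustrative):

```bash
export AWS_SSH_KEY=$HOME/.ssh/kubernetes.pem
```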
### Alternatives
A contributed [example](aws-coreos.md) allows you to set up a Kubernetes cluster based on [CoreOS](http://www.coreos.com), either using
AWS CloudFormation or EC2 with user data (cloud-config).
## Getting started with your cluster
### Command line administration tool: `kubectl`
Copy the appropriate `kubectl` binary to any location defined in your `PATH` environment variable, for example:
```bash
# OS X
sudo cp kubernetes/platforms/darwin/amd64/kubectl /usr/local/bin/kubectl
# Linux
sudo cp kubernetes/platforms/linux/amd64/kubectl /usr/local/bin/kubectl
```
An up-to-date documentation page for this tool is available here: [kubectl manual](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/kubectl.md)
By default, `kubectl` will use the `kubeconfig` file generated during the cluster startup for authenticating against the API.
For more information, please read [kubeconfig files](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/kubeconfig-file.md)
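To double-check what was generated, you can inspect the configuration with kubectl itself:

```bash
# Show the cluster endpoint and the add-on services that were started.
kubectl cluster-info
# Show the generated cluster, user, and context entries.
kubectl config view
```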
### Examples
See [a simple nginx example](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/examples/simple-nginx.md) to try out your new cluster.
The "Guestbook" application is another popular example to get started with Kubernetes: [guestbook example](https://github.com/GoogleCloudPlatform/kubernetes/tree/master/examples/guestbook)
For more complete applications, please look in the [examples directory](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/examples)
## Tearing down the cluster
Make sure the environment variables you used to provision your cluster are still exported, then call the following script inside the
`kubernetes` directory:
```bash
cluster/kube-down.sh
```
## Further reading
Please see the [Kubernetes docs](https://github.com/GoogleCloudPlatform/kubernetes/tree/master/docs) for more details on administering
and using a Kubernetes cluster.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/getting-started-guides/aws.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/getting-started-guides/aws.md?pixel)]()

View File

@ -0,0 +1,177 @@
#cloud-config
write_files:
- path: /opt/bin/waiter.sh
owner: root
permissions: 0755
content: |
#! /usr/bin/bash
until curl http://127.0.0.1:2379/v2/machines; do sleep 2; done
coreos:
etcd2:
name: master
initial-cluster-token: k8s_etcd
initial-cluster: master=http://$private_ipv4:2380
listen-peer-urls: http://$private_ipv4:2380,http://localhost:2380
initial-advertise-peer-urls: http://$private_ipv4:2380
listen-client-urls: http://$private_ipv4:2379,http://localhost:2379
advertise-client-urls: http://$private_ipv4:2379
fleet:
etcd_servers: http://localhost:2379
metadata: k8srole=master
flannel:
etcd_endpoints: http://localhost:2379
locksmithd:
endpoint: http://localhost:2379
units:
- name: etcd2.service
command: start
- name: fleet.service
command: start
- name: etcd2-waiter.service
command: start
content: |
[Unit]
Description=etcd waiter
Wants=network-online.target
Wants=etcd2.service
After=etcd2.service
After=network-online.target
Before=flanneld.service fleet.service locksmithd.service
[Service]
ExecStart=/usr/bin/bash /opt/bin/waiter.sh
RemainAfterExit=true
Type=oneshot
- name: flanneld.service
command: start
drop-ins:
- name: 50-network-config.conf
content: |
[Service]
ExecStartPre=-/usr/bin/etcdctl mk /coreos.com/network/config '{"Network": "10.244.0.0/16", "Backend": {"Type": "vxlan"}}'
- name: docker-cache.service
command: start
content: |
[Unit]
Description=Docker cache proxy
Requires=early-docker.service
After=early-docker.service
Before=early-docker.target
[Service]
Restart=always
TimeoutStartSec=0
RestartSec=5
Environment=TMPDIR=/var/tmp/
Environment=DOCKER_HOST=unix:///var/run/early-docker.sock
ExecStartPre=-/usr/bin/docker kill docker-registry
ExecStartPre=-/usr/bin/docker rm docker-registry
ExecStartPre=/usr/bin/docker pull quay.io/devops/docker-registry:latest
# GUNICORN_OPTS is an workaround for
# https://github.com/docker/docker-registry/issues/892
ExecStart=/usr/bin/docker run --rm --net host --name docker-registry \
-e STANDALONE=false \
-e GUNICORN_OPTS=[--preload] \
-e MIRROR_SOURCE=https://registry-1.docker.io \
-e MIRROR_SOURCE_INDEX=https://index.docker.io \
-e MIRROR_TAGS_CACHE_TTL=1800 \
quay.io/devops/docker-registry:latest
- name: docker.service
drop-ins:
- name: 51-docker-mirror.conf
content: |
[Unit]
# making sure that docker-cache is up and that flanneld finished
# startup, otherwise containers won't land in flannel's network...
Requires=docker-cache.service
After=docker-cache.service
[Service]
Environment=DOCKER_OPTS='--registry-mirror=http://$private_ipv4:5000'
- name: get-kubectl.service
command: start
content: |
[Unit]
Description=Get kubectl client tool
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
Requires=network-online.target
After=network-online.target
[Service]
ExecStart=/usr/bin/wget -N -P /opt/bin https://storage.googleapis.com/kubernetes-release/release/v0.17.0/bin/linux/amd64/kubectl
ExecStart=/usr/bin/chmod +x /opt/bin/kubectl
Type=oneshot
RemainAfterExit=true
- name: kube-apiserver.service
command: start
content: |
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
Requires=etcd2-waiter.service
After=etcd2-waiter.service
[Service]
ExecStartPre=/usr/bin/wget -N -P /opt/bin https://storage.googleapis.com/kubernetes-release/release/v0.17.0/bin/linux/amd64/kube-apiserver
ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-apiserver
ExecStart=/opt/bin/kube-apiserver \
--insecure-bind-address=0.0.0.0 \
--service-cluster-ip-range=10.100.0.0/16 \
--etcd-servers=http://localhost:2379
Restart=always
RestartSec=10
- name: kube-controller-manager.service
command: start
content: |
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
Requires=kube-apiserver.service
After=kube-apiserver.service
[Service]
ExecStartPre=/usr/bin/wget -N -P /opt/bin https://storage.googleapis.com/kubernetes-release/release/v0.17.0/bin/linux/amd64/kube-controller-manager
ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-controller-manager
ExecStart=/opt/bin/kube-controller-manager \
--master=127.0.0.1:8080
Restart=always
RestartSec=10
- name: kube-scheduler.service
command: start
content: |
[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
Requires=kube-apiserver.service
After=kube-apiserver.service
[Service]
ExecStartPre=/usr/bin/wget -N -P /opt/bin https://storage.googleapis.com/kubernetes-release/release/v0.17.0/bin/linux/amd64/kube-scheduler
ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-scheduler
ExecStart=/opt/bin/kube-scheduler \
--master=127.0.0.1:8080
Restart=always
RestartSec=10
- name: kube-register.service
command: start
content: |
[Unit]
Description=Kubernetes Registration Service
Documentation=https://github.com/kelseyhightower/kube-register
Requires=kube-apiserver.service fleet.service
After=kube-apiserver.service fleet.service
[Service]
ExecStartPre=-/usr/bin/wget -nc -O /opt/bin/kube-register https://github.com/kelseyhightower/kube-register/releases/download/v0.0.3/kube-register-0.0.3-linux-amd64
ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-register
ExecStart=/opt/bin/kube-register \
--metadata=k8srole=node \
--fleet-endpoint=unix:///var/run/fleet.sock \
--api-endpoint=http://127.0.0.1:8080
Restart=always
RestartSec=10
update:
group: alpha
reboot-strategy: off

View File

@ -0,0 +1,81 @@
#cloud-config
write_files:
- path: /opt/bin/wupiao
owner: root
permissions: 0755
content: |
#!/bin/bash
# [w]ait [u]ntil [p]ort [i]s [a]ctually [o]pen
[ -n "$1" ] && [ -n "$2" ] && while ! curl --output /dev/null \
--silent --head --fail \
http://${1}:${2}; do sleep 1 && echo -n .; done;
exit $?
coreos:
etcd2:
listen-client-urls: http://localhost:2379
advertise-client-urls: http://0.0.0.0:2379
initial-cluster: master=http://<master-private-ip>:2380
proxy: on
fleet:
etcd_servers: http://localhost:2379
metadata: k8srole=node
flannel:
etcd_endpoints: http://localhost:2379
locksmithd:
endpoint: http://localhost:2379
units:
- name: etcd2.service
command: start
- name: fleet.service
command: start
- name: flanneld.service
command: start
- name: docker.service
command: start
drop-ins:
- name: 50-docker-mirror.conf
content: |
[Service]
Environment=DOCKER_OPTS='--registry-mirror=http://<master-private-ip>:5000'
- name: kubelet.service
command: start
content: |
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
Requires=network-online.target
After=network-online.target
[Service]
ExecStartPre=/usr/bin/wget -N -P /opt/bin https://storage.googleapis.com/kubernetes-release/release/v0.17.0/bin/linux/amd64/kubelet
ExecStartPre=/usr/bin/chmod +x /opt/bin/kubelet
# wait for kubernetes master to be up and ready
ExecStartPre=/opt/bin/wupiao <master-private-ip> 8080
ExecStart=/opt/bin/kubelet \
--api-servers=<master-private-ip>:8080 \
--hostname-override=$private_ipv4
Restart=always
RestartSec=10
- name: kube-proxy.service
command: start
content: |
[Unit]
Description=Kubernetes Proxy
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
Requires=network-online.target
After=network-online.target
[Service]
ExecStartPre=/usr/bin/wget -N -P /opt/bin https://storage.googleapis.com/kubernetes-release/release/v0.17.0/bin/linux/amd64/kube-proxy
ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-proxy
# wait for kubernetes master to be up and ready
ExecStartPre=/opt/bin/wupiao <master-private-ip> 8080
ExecStart=/opt/bin/kube-proxy \
--master=http://<master-private-ip>:8080
Restart=always
RestartSec=10
update:
group: alpha
reboot-strategy: off

View File

@ -0,0 +1,421 @@
{
"AWSTemplateFormatVersion": "2010-09-09",
"Description": "Kubernetes 0.17.0 on EC2 powered by CoreOS 681.0.0 (alpha)",
"Mappings": {
"RegionMap": {
"eu-central-1" : {
"AMI" : "ami-4c4f7151"
},
"ap-northeast-1" : {
"AMI" : "ami-3a35fd3a"
},
"us-gov-west-1" : {
"AMI" : "ami-57117174"
},
"sa-east-1" : {
"AMI" : "ami-fbcc4ae6"
},
"ap-southeast-2" : {
"AMI" : "ami-593c4263"
},
"ap-southeast-1" : {
"AMI" : "ami-3a083668"
},
"us-east-1" : {
"AMI" : "ami-40322028"
},
"us-west-2" : {
"AMI" : "ami-23b58613"
},
"us-west-1" : {
"AMI" : "ami-15618f51"
},
"eu-west-1" : {
"AMI" : "ami-8d1164fa"
}
}
},
"Parameters": {
"InstanceType": {
"Description": "EC2 HVM instance type (m3.medium, etc).",
"Type": "String",
"Default": "m3.medium",
"AllowedValues": [
"m3.medium",
"m3.large",
"m3.xlarge",
"m3.2xlarge",
"c3.large",
"c3.xlarge",
"c3.2xlarge",
"c3.4xlarge",
"c3.8xlarge",
"cc2.8xlarge",
"cr1.8xlarge",
"hi1.4xlarge",
"hs1.8xlarge",
"i2.xlarge",
"i2.2xlarge",
"i2.4xlarge",
"i2.8xlarge",
"r3.large",
"r3.xlarge",
"r3.2xlarge",
"r3.4xlarge",
"r3.8xlarge",
"t2.micro",
"t2.small",
"t2.medium"
],
"ConstraintDescription": "Must be a valid EC2 HVM instance type."
},
"ClusterSize": {
"Description": "Number of nodes in cluster (2-12).",
"Default": "2",
"MinValue": "2",
"MaxValue": "12",
"Type": "Number"
},
"AllowSSHFrom": {
"Description": "The net block (CIDR) that SSH is available to.",
"Default": "0.0.0.0/0",
"Type": "String"
},
"KeyPair": {
"Description": "The name of an EC2 Key Pair to allow SSH access to the instance.",
"Type": "AWS::EC2::KeyPair::KeyName"
},
"VpcId": {
"Description": "The ID of the VPC to launch into.",
"Type": "AWS::EC2::VPC::Id"
},
"SubnetId": {
"Description": "The ID of the subnet to launch into (that must be within the supplied VPC)",
"Type": "AWS::EC2::Subnet::Id"
},
"SubnetAZ": {
"Description": "The availability zone of the subnet supplied (for example eu-west-1a)",
"Type": "String"
}
},
"Conditions": {
"UseEC2Classic": {"Fn::Equals": [{"Ref": "VpcId"}, ""]}
},
"Resources": {
"KubernetesSecurityGroup": {
"Type": "AWS::EC2::SecurityGroup",
"Properties": {
"VpcId": {"Fn::If": ["UseEC2Classic", {"Ref": "AWS::NoValue"}, {"Ref": "VpcId"}]},
"GroupDescription": "Kubernetes SecurityGroup",
"SecurityGroupIngress": [
{
"IpProtocol": "tcp",
"FromPort": "22",
"ToPort": "22",
"CidrIp": {"Ref": "AllowSSHFrom"}
}
]
}
},
"KubernetesIngress": {
"Type": "AWS::EC2::SecurityGroupIngress",
"Properties": {
"GroupId": {"Fn::GetAtt": ["KubernetesSecurityGroup", "GroupId"]},
"IpProtocol": "tcp",
"FromPort": "1",
"ToPort": "65535",
"SourceSecurityGroupId": {
"Fn::GetAtt" : [ "KubernetesSecurityGroup", "GroupId" ]
}
}
},
"KubernetesIngressUDP": {
"Type": "AWS::EC2::SecurityGroupIngress",
"Properties": {
"GroupId": {"Fn::GetAtt": ["KubernetesSecurityGroup", "GroupId"]},
"IpProtocol": "udp",
"FromPort": "1",
"ToPort": "65535",
"SourceSecurityGroupId": {
"Fn::GetAtt" : [ "KubernetesSecurityGroup", "GroupId" ]
}
}
},
"KubernetesMasterInstance": {
"Type": "AWS::EC2::Instance",
"Properties": {
"NetworkInterfaces" : [{
"GroupSet" : [{"Fn::GetAtt": ["KubernetesSecurityGroup", "GroupId"]}],
"AssociatePublicIpAddress" : "true",
"DeviceIndex" : "0",
"DeleteOnTermination" : "true",
"SubnetId" : {"Fn::If": ["UseEC2Classic", {"Ref": "AWS::NoValue"}, {"Ref": "SubnetId"}]}
}],
"ImageId": {"Fn::FindInMap" : ["RegionMap", {"Ref": "AWS::Region" }, "AMI"]},
"InstanceType": {"Ref": "InstanceType"},
"KeyName": {"Ref": "KeyPair"},
"Tags" : [
{"Key" : "Name", "Value" : {"Fn::Join" : [ "-", [ {"Ref" : "AWS::StackName"}, "k8s-master" ] ]}},
{"Key" : "KubernetesRole", "Value" : "node"}
],
"UserData": { "Fn::Base64": {"Fn::Join" : ["", [
"#cloud-config\n\n",
"write_files:\n",
"- path: /opt/bin/waiter.sh\n",
" owner: root\n",
" content: |\n",
" #! /usr/bin/bash\n",
" until curl http://127.0.0.1:2379/v2/machines; do sleep 2; done\n",
"coreos:\n",
" etcd2:\n",
" name: master\n",
" initial-cluster-token: k8s_etcd\n",
" initial-cluster: master=http://$private_ipv4:2380\n",
" listen-peer-urls: http://$private_ipv4:2380,http://localhost:2380\n",
" initial-advertise-peer-urls: http://$private_ipv4:2380\n",
" listen-client-urls: http://$private_ipv4:2379,http://localhost:2379\n",
" advertise-client-urls: http://$private_ipv4:2379\n",
" fleet:\n",
" etcd_servers: http://localhost:2379\n",
" metadata: k8srole=master\n",
" flannel:\n",
" etcd_endpoints: http://localhost:2379\n",
" locksmithd:\n",
" endpoint: http://localhost:2379\n",
" units:\n",
" - name: etcd2.service\n",
" command: start\n",
" - name: fleet.service\n",
" command: start\n",
" - name: etcd2-waiter.service\n",
" command: start\n",
" content: |\n",
" [Unit]\n",
" Description=etcd waiter\n",
" Wants=network-online.target\n",
" Wants=etcd2.service\n",
" After=etcd2.service\n",
" After=network-online.target\n",
" Before=flanneld.service fleet.service locksmithd.service\n\n",
" [Service]\n",
" ExecStart=/usr/bin/bash /opt/bin/waiter.sh\n",
" RemainAfterExit=true\n",
" Type=oneshot\n",
" - name: flanneld.service\n",
" command: start\n",
" drop-ins:\n",
" - name: 50-network-config.conf\n",
" content: |\n",
" [Service]\n",
" ExecStartPre=-/usr/bin/etcdctl mk /coreos.com/network/config '{\"Network\": \"10.244.0.0/16\", \"Backend\": {\"Type\": \"vxlan\"}}'\n",
" - name: docker-cache.service\n",
" command: start\n",
" content: |\n",
" [Unit]\n",
" Description=Docker cache proxy\n",
" Requires=early-docker.service\n",
" After=early-docker.service\n",
" Before=early-docker.target\n\n",
" [Service]\n",
" Restart=always\n",
" TimeoutStartSec=0\n",
" RestartSec=5\n",
" Environment=TMPDIR=/var/tmp/\n",
" Environment=DOCKER_HOST=unix:///var/run/early-docker.sock\n",
" ExecStartPre=-/usr/bin/docker kill docker-registry\n",
" ExecStartPre=-/usr/bin/docker rm docker-registry\n",
" ExecStartPre=/usr/bin/docker pull quay.io/devops/docker-registry:latest\n",
" # GUNICORN_OPTS is an workaround for\n",
" # https://github.com/docker/docker-registry/issues/892\n",
" ExecStart=/usr/bin/docker run --rm --net host --name docker-registry \\\n",
" -e STANDALONE=false \\\n",
" -e GUNICORN_OPTS=[--preload] \\\n",
" -e MIRROR_SOURCE=https://registry-1.docker.io \\\n",
" -e MIRROR_SOURCE_INDEX=https://index.docker.io \\\n",
" -e MIRROR_TAGS_CACHE_TTL=1800 \\\n",
" quay.io/devops/docker-registry:latest\n",
" - name: get-kubectl.service\n",
" command: start\n",
" content: |\n",
" [Unit]\n",
" Description=Get kubectl client tool\n",
" Documentation=https://github.com/GoogleCloudPlatform/kubernetes\n",
" Requires=network-online.target\n",
" After=network-online.target\n\n",
" [Service]\n",
" ExecStart=/usr/bin/wget -N -P /opt/bin https://storage.googleapis.com/kubernetes-release/release/v0.17.0/bin/linux/amd64/kubectl\n",
" ExecStart=/usr/bin/chmod +x /opt/bin/kubectl\n",
" Type=oneshot\n",
" RemainAfterExit=true\n",
" - name: kube-apiserver.service\n",
" command: start\n",
" content: |\n",
" [Unit]\n",
" Description=Kubernetes API Server\n",
" Documentation=https://github.com/GoogleCloudPlatform/kubernetes\n",
" Requires=etcd2-waiter.service\n",
" After=etcd2-waiter.service\n\n",
" [Service]\n",
" ExecStartPre=/usr/bin/wget -N -P /opt/bin https://storage.googleapis.com/kubernetes-release/release/v0.17.0/bin/linux/amd64/kube-apiserver\n",
" ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-apiserver\n",
" ExecStart=/opt/bin/kube-apiserver \\\n",
" --insecure-bind-address=0.0.0.0 \\\n",
" --service-cluster-ip-range=10.100.0.0/16 \\\n",
" --etcd-servers=http://localhost:2379\n",
" Restart=always\n",
" RestartSec=10\n",
" - name: kube-controller-manager.service\n",
" command: start\n",
" content: |\n",
" [Unit]\n",
" Description=Kubernetes Controller Manager\n",
" Documentation=https://github.com/GoogleCloudPlatform/kubernetes\n",
" Requires=kube-apiserver.service\n",
" After=kube-apiserver.service\n\n",
" [Service]\n",
" ExecStartPre=/usr/bin/wget -N -P /opt/bin https://storage.googleapis.com/kubernetes-release/release/v0.17.0/bin/linux/amd64/kube-controller-manager\n",
" ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-controller-manager\n",
" ExecStart=/opt/bin/kube-controller-manager \\\n",
" --master=127.0.0.1:8080\n",
" Restart=always\n",
" RestartSec=10\n",
" - name: kube-scheduler.service\n",
" command: start\n",
" content: |\n",
" [Unit]\n",
" Description=Kubernetes Scheduler\n",
" Documentation=https://github.com/GoogleCloudPlatform/kubernetes\n",
" Requires=kube-apiserver.service\n",
" After=kube-apiserver.service\n\n",
" [Service]\n",
" ExecStartPre=/usr/bin/wget -N -P /opt/bin https://storage.googleapis.com/kubernetes-release/release/v0.17.0/bin/linux/amd64/kube-scheduler\n",
" ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-scheduler\n",
" ExecStart=/opt/bin/kube-scheduler \\\n",
" --master=127.0.0.1:8080\n",
" Restart=always\n",
" RestartSec=10\n",
" - name: kube-register.service\n",
" command: start\n",
" content: |\n",
" [Unit]\n",
" Description=Kubernetes Registration Service\n",
" Documentation=https://github.com/kelseyhightower/kube-register\n",
" Requires=kube-apiserver.service fleet.service\n",
" After=kube-apiserver.service fleet.service\n\n",
" [Service]\n",
" ExecStartPre=-/usr/bin/wget -nc -O /opt/bin/kube-register https://github.com/kelseyhightower/kube-register/releases/download/v0.0.3/kube-register-0.0.3-linux-amd64\n",
" ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-register\n",
" ExecStart=/opt/bin/kube-register \\\n",
" --metadata=k8srole=node \\\n",
" --fleet-endpoint=unix:///var/run/fleet.sock \\\n",
" --api-endpoint=http://127.0.0.1:8080\n",
" Restart=always\n",
" RestartSec=10\n",
" update:\n",
" group: alpha\n",
" reboot-strategy: off\n"
]]}
}
}
},
"KubernetesNodeLaunchConfig": {
"Type": "AWS::AutoScaling::LaunchConfiguration",
"Properties": {
"ImageId": {"Fn::FindInMap" : ["RegionMap", {"Ref": "AWS::Region" }, "AMI" ]},
"InstanceType": {"Ref": "InstanceType"},
"KeyName": {"Ref": "KeyPair"},
"AssociatePublicIpAddress" : "true",
"SecurityGroups": [{"Fn::If": [
"UseEC2Classic",
{"Ref": "KubernetesSecurityGroup"},
{"Fn::GetAtt": ["KubernetesSecurityGroup", "GroupId"]}]
}],
"UserData": { "Fn::Base64": {"Fn::Join" : ["", [
"#cloud-config\n\n",
"coreos:\n",
" etcd2:\n",
" listen-client-urls: http://localhost:2379\n",
" initial-cluster: master=http://", {"Fn::GetAtt" :["KubernetesMasterInstance" , "PrivateIp"]}, ":2380\n",
" proxy: on\n",
" fleet:\n",
" etcd_servers: http://localhost:2379\n",
" metadata: k8srole=node\n",
" flannel:\n",
" etcd_endpoints: http://localhost:2379\n",
" locksmithd:\n",
" endpoint: http://localhost:2379\n",
" units:\n",
" - name: etcd2.service\n",
" command: start\n",
" - name: fleet.service\n",
" command: start\n",
" - name: flanneld.service\n",
" command: start\n",
" - name: docker.service\n",
" command: start\n",
" drop-ins:\n",
" - name: 50-docker-mirror.conf\n",
" content: |\n",
" [Service]\n",
" Environment=DOCKER_OPTS='--registry-mirror=http://", {"Fn::GetAtt" :["KubernetesMasterInstance" , "PrivateIp"]}, ":5000'\n",
" - name: kubelet.service\n",
" command: start\n",
" content: |\n",
" [Unit]\n",
" Description=Kubernetes Kubelet\n",
" Documentation=https://github.com/GoogleCloudPlatform/kubernetes\n",
" Requires=network-online.target\n",
" After=network-online.target\n\n",
" [Service]\n",
" ExecStartPre=/usr/bin/wget -N -P /opt/bin https://storage.googleapis.com/kubernetes-release/release/v0.17.0/bin/linux/amd64/kubelet\n",
" ExecStartPre=/usr/bin/chmod +x /opt/bin/kubelet\n",
" ExecStart=/opt/bin/kubelet \\\n",
" --api-servers=", {"Fn::GetAtt" :["KubernetesMasterInstance" , "PrivateIp"]}, ":8080 \\\n",
" --hostname-override=$private_ipv4\n",
" Restart=always\n",
" RestartSec=10\n",
" - name: kube-proxy.service\n",
" command: start\n",
" content: |\n",
" [Unit]\n",
" Description=Kubernetes Proxy\n",
" Documentation=https://github.com/GoogleCloudPlatform/kubernetes\n",
" Requires=network-online.target\n",
" After=network-online.target\n\n",
" [Service]\n",
" ExecStartPre=/usr/bin/wget -N -P /opt/bin https://storage.googleapis.com/kubernetes-release/release/v0.17.0/bin/linux/amd64/kube-proxy\n",
" ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-proxy\n",
" ExecStart=/opt/bin/kube-proxy \\\n",
" --master=http://", {"Fn::GetAtt" :["KubernetesMasterInstance" , "PrivateIp"]}, ":8080\n",
" Restart=always\n",
" RestartSec=10\n",
" update:\n",
" group: alpha\n",
" reboot-strategy: off\n"
]]}
}
}
},
"KubernetesAutoScalingGroup": {
"Type": "AWS::AutoScaling::AutoScalingGroup",
"Properties": {
"AvailabilityZones": {"Fn::If": ["UseEC2Classic", {"Fn::GetAZs": ""}, [{"Ref": "SubnetAZ"}]]},
"VPCZoneIdentifier": {"Fn::If": ["UseEC2Classic", {"Ref": "AWS::NoValue"}, [{"Ref": "SubnetId"}]]},
"LaunchConfigurationName": {"Ref": "KubernetesNodeLaunchConfig"},
"MinSize": "2",
"MaxSize": "12",
"DesiredCapacity": {"Ref": "ClusterSize"},
"Tags" : [
{"Key" : "Name", "Value" : {"Fn::Join" : [ "-", [ {"Ref" : "AWS::StackName"}, "k8s-node" ] ]}, "PropagateAtLaunch" : true},
{"Key" : "KubernetesRole", "Value" : "node", "PropagateAtLaunch" : true}
]
}
}
},
"Outputs": {
"KubernetesMasterPublicIp": {
"Description": "Public Ip of the newly created Kubernetes Master instance",
"Value": {"Fn::GetAtt": ["KubernetesMasterInstance" , "PublicIp"]}
}
}
}

View File

@ -0,0 +1,27 @@
# Install and configure kubectl
## Download the kubectl CLI tool
```bash
### Darwin
wget https://storage.googleapis.com/kubernetes-release/release/v0.17.0/bin/darwin/amd64/kubectl
### Linux
wget https://storage.googleapis.com/kubernetes-release/release/v0.17.0/bin/linux/amd64/kubectl
```
### Copy kubectl to your path
```bash
chmod +x kubectl
mv kubectl /usr/local/bin/
```
### Create a secure tunnel for API communication
```bash
ssh -f -nNT -L 8080:127.0.0.1:8080 core@<master-public-ip>
```
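With the tunnel from the step above in place, `kubectl` on your workstation can reach the API server via `localhost:8080`. A quick sanity check (a sketch; the `-s`/`--server` flag simply points kubectl at the forwarded port):
```bash
# Both commands go over the SSH tunnel established above
kubectl -s http://localhost:8080 version
kubectl -s http://localhost:8080 get nodes
```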
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/getting-started-guides/aws/kubectl.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/getting-started-guides/aws/kubectl.md?pixel)]()

View File

@ -0,0 +1,54 @@
## Getting started on Microsoft Azure
### Azure Prerequisites
1. You need an Azure account. Visit http://azure.microsoft.com/ to get started.
2. Install and configure the Azure cross-platform command-line interface. http://azure.microsoft.com/en-us/documentation/articles/xplat-cli/
3. Make sure you have a default account set in the Azure cli, using `azure account set`
### Prerequisites for your workstation
1. Be running Linux or Mac OS X.
2. Get or build a [binary release](binary_release.md)
3. If you want to build your own release, you need to have [Docker
installed](https://docs.docker.com/installation/). On Mac OS X you can use
[boot2docker](http://boot2docker.io/).
### Setup
The cluster setup scripts can set up Kubernetes for multiple targets. First, modify `cluster/kube-env.sh` to specify Azure:
KUBERNETES_PROVIDER="azure"
Next, specify an existing virtual network and subnet in `cluster/azure/config-default.sh`:
AZ_VNET=<vnet name>
AZ_SUBNET=<subnet name>
You can create a virtual network:
azure network vnet create <vnet name> --subnet=<subnet name> --location "West US" -v
Now you're ready.
You can then use the `cluster/kube-*.sh` scripts to manage your Azure cluster; start with:
cluster/kube-up.sh
The script above will start (by default) a single master VM along with 4 worker VMs. You
can tweak some of these parameters by editing `cluster/azure/config-default.sh`.
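Once `kube-up.sh` completes, a quick sanity check is to list what it registered (a sketch; it assumes the `kubectl` from your binary release is on your PATH and is configured to talk to the new master):
```bash
kubectl cluster-info
kubectl get minions
```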
### Getting started with your cluster
See [a simple nginx example](../../examples/simple-nginx.md) to try out your new cluster.
For more complete applications, please look in the [examples directory](../../examples).
### Tearing down the cluster
```
cluster/kube-down.sh
```
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/getting-started-guides/azure.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/getting-started-guides/azure.md?pixel)]()

View File

@ -0,0 +1,29 @@
## Getting a Binary Release
You can either build a release from source or download a pre-built release. If you don't plan on developing Kubernetes itself, we suggest using a pre-built release.
### Prebuilt Binary Release
The list of binary releases is available for download from the [GitHub Kubernetes repo release page](https://github.com/GoogleCloudPlatform/kubernetes/releases).
Download the latest release, unpack the tar file on Linux or OS X, cd into the created `kubernetes/` directory, and then follow the getting started guide for your cloud.
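For example, fetching and unpacking a release from the command line might look like this (the URL follows the release pattern used elsewhere in these guides; substitute whichever version you actually want):
```bash
wget https://github.com/GoogleCloudPlatform/kubernetes/releases/download/v0.19.0/kubernetes.tar.gz
tar xzf kubernetes.tar.gz
cd kubernetes
```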
### Building from source
Get the Kubernetes source. If you are simply building a release from source, there is no need to set up a full golang environment, as all building happens in a Docker container.
Building a release is simple.
```bash
git clone https://github.com/GoogleCloudPlatform/kubernetes.git
cd kubernetes
make release
```
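Because all of the build work happens inside Docker, if `make release` fails immediately it is worth first confirming that the Docker daemon is reachable:
```bash
# Nothing Kubernetes-specific here; just verify Docker is installed and running
docker version
docker info
```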
For more details on the release process see the [`build/` directory](../../build)
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/getting-started-guides/binary_release.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/getting-started-guides/binary_release.md?pixel)]()

View File

@ -0,0 +1,170 @@
## Getting started on [CentOS](http://centos.org)
This is a getting started guide for CentOS. It is a manual configuration, so you understand all the underlying packages, services, ports, etc.
This guide will only get ONE minion working. Multiple minions require a functional [networking configuration](http://docs.k8s.io/networking.md) done outside of Kubernetes, although the additional Kubernetes configuration requirements should be obvious.
The kubernetes package provides a few services: kube-apiserver, kube-scheduler, kube-controller-manager, kubelet, kube-proxy. These services are managed by systemd and the configuration resides in a central location: /etc/kubernetes. We will break the services up between the hosts. The first host, centos-master, will be the Kubernetes master. This host will run the kube-apiserver, kube-controller-manager, and kube-scheduler. In addition, the master will also run _etcd_. The remaining host, centos-minion, will be the minion and run kubelet, proxy, cadvisor and docker.
**System Information:**
Hosts:
```
centos-master = 192.168.121.9
centos-minion = 192.168.121.65
```
**Prepare the hosts:**
* Create a virt7-testing repo on all hosts - centos-{master,minion} - with the following information.
```
[virt7-testing]
name=virt7-testing
baseurl=http://cbs.centos.org/repos/virt7-testing/x86_64/os/
gpgcheck=0
```
* Install kubernetes on all hosts - centos-{master,minion}. This will also pull in etcd, docker, and cadvisor.
```
yum -y install --enablerepo=virt7-testing kubernetes
```
* Note: use etcd-0.4.6-7 (this is a temporary workaround in the documentation).
In the current virt7-testing repo, the etcd package has been updated, which causes a service failure. If you did not get etcd-0.4.6-7 installed from the virt7-testing repo, avoid the failure by removing the newer package:
```
yum erase etcd
```
This uninstalls the currently installed etcd package. Then install the pinned version and re-install kubernetes:
```
yum install http://cbs.centos.org/kojifiles/packages/etcd/0.4.6/7.el7.centos/x86_64/etcd-0.4.6-7.el7.centos.x86_64.rpm
yum -y install --enablerepo=virt7-testing kubernetes
```
* Add master and minion to /etc/hosts on all machines (not needed if hostnames already in DNS)
```
echo "192.168.121.9 centos-master
192.168.121.65 centos-minion" >> /etc/hosts
```
* Edit /etc/kubernetes/config, which will be the same on all hosts, to contain:
```
# Comma separated list of nodes in the etcd cluster
KUBE_ETCD_SERVERS="--etcd_servers=http://centos-master:4001"
# logging to stderr means we get it in the systemd journal
KUBE_LOGTOSTDERR="--logtostderr=true"
# journal message level, 0 is debug
KUBE_LOG_LEVEL="--v=0"
# Should this cluster be allowed to run privileged docker containers
KUBE_ALLOW_PRIV="--allow_privileged=false"
```
* Disable the firewall on both the master and the minion, as Docker does not play well with other firewall rule managers
```
systemctl disable iptables-services firewalld
systemctl stop iptables-services firewalld
```
**Configure the kubernetes services on the master.**
* Edit /etc/kubernetes/apiserver to appear as such:
```
# The address on the local server to listen to.
KUBE_API_ADDRESS="--address=0.0.0.0"
# The port on the local server to listen on.
KUBE_API_PORT="--port=8080"
# How the replication controller and scheduler find the kube-apiserver
KUBE_MASTER="--master=http://centos-master:8080"
# Port minions listen on
KUBELET_PORT="--kubelet_port=10250"
# Address range to use for services
KUBE_SERVICE_ADDRESSES="--service-cluster-ip-range=10.254.0.0/16"
# Add your own!
KUBE_API_ARGS=""
```
* Edit /etc/kubernetes/controller-manager to appear as such:
```
# Comma separated list of minions
KUBELET_ADDRESSES="--machines=centos-minion"
```
* Start the appropriate services on master:
```
for SERVICES in etcd kube-apiserver kube-controller-manager kube-scheduler; do
systemctl restart $SERVICES
systemctl enable $SERVICES
systemctl status $SERVICES
done
```
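Before moving on to the minion, you can confirm the master components are answering (a quick check against the default ports configured above):
```bash
# The apiserver health endpoint should return "ok"
curl http://centos-master:8080/healthz
# etcd should answer on its client port
curl http://centos-master:4001/version
```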
**Configure the kubernetes services on the minion.**
***We need to configure the kubelet and start the kubelet and proxy***
* Edit /etc/kubernetes/kubelet to appear as such:
```
# The address for the info server to serve on
KUBELET_ADDRESS="--address=0.0.0.0"
# The port for the info server to serve on
KUBELET_PORT="--port=10250"
# You may leave this blank to use the actual hostname
KUBELET_HOSTNAME="--hostname_override=centos-minion"
# Add your own!
KUBELET_ARGS=""
```
* Start the appropriate services on minion (centos-minion).
```
for SERVICES in kube-proxy kubelet docker; do
systemctl restart $SERVICES
systemctl enable $SERVICES
systemctl status $SERVICES
done
```
*You should be finished!*
* Check to make sure the cluster can see the minion (on centos-master)
```
kubectl get minions
NAME LABELS STATUS
centos-minion <none> Ready
```
**The cluster should be running! Launch a test pod.**
You should have a functional cluster; check out [101](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/examples/walkthrough/README.md)!
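For example, a minimal test pod could look like this (the pod name and image are only illustrations), run from centos-master:
```bash
cat <<EOF > nginx-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
EOF
kubectl create -f nginx-pod.yaml
kubectl get pods
```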
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/getting-started-guides/centos/centos_manual_config.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/getting-started-guides/centos/centos_manual_config.md?pixel)]()

View File

@ -0,0 +1,96 @@
## Deploying Kubernetes on [CloudStack](http://cloudstack.apache.org)
CloudStack is software for building public and private clouds based on hardware virtualization principles (traditional IaaS). To deploy Kubernetes on CloudStack there are several possibilities, depending on the cloud being used and what images are made available. [Exoscale](http://exoscale.ch), for instance, makes a [CoreOS](http://coreos.com) template available, so the instructions for deploying Kubernetes on CoreOS can be used. CloudStack also has a Vagrant plugin available, hence Vagrant could be used to deploy Kubernetes either using the existing shell provisioner or using new Salt-based recipes.
[CoreOS](http://coreos.com) templates for CloudStack are built [nightly](http://stable.release.core-os.net/amd64-usr/current/). CloudStack operators need to [register](http://docs.cloudstack.apache.org/projects/cloudstack-administration/en/latest/templates.html) this template in their cloud before proceeding with these Kubernetes deployment instructions.
There are currently two deployment techniques.
* [Kubernetes on Exoscale](https://github.com/runseb/kubernetes-exoscale).
This uses [libcloud](http://libcloud.apache.org) to launch CoreOS instances and pass the appropriate cloud-config setup using userdata. Several manual steps are required. This is obsoleted by the Ansible playbook detailed below.
* [Ansible playbook](https://github.com/runseb/ansible-kubernetes).
This is completely automated; a single playbook deploys Kubernetes based on the CoreOS [instructions](http://docs.k8s.io/getting-started-guides/coreos/coreos_multinode_cluster.md).
# Ansible playbook
This [Ansible](http://ansibleworks.com) playbook deploys Kubernetes on a CloudStack-based cloud using CoreOS images. The playbook creates an SSH key pair, creates a security group and associated rules, and finally starts CoreOS instances configured via cloud-init.
Prerequisites
-------------
$ sudo apt-get install -y python-pip
$ sudo pip install ansible
$ sudo pip install cs
[_cs_](http://github.com/exoscale/cs) is a python module for the CloudStack API.
Set your CloudStack endpoint, API keys and HTTP method used.
You can define them as environment variables: `CLOUDSTACK_ENDPOINT`, `CLOUDSTACK_KEY`, `CLOUDSTACK_SECRET` and `CLOUDSTACK_METHOD`.
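For example, as environment variables (the values are placeholders):
```bash
export CLOUDSTACK_ENDPOINT=<your cloudstack api endpoint>
export CLOUDSTACK_KEY=<your api access key>
export CLOUDSTACK_SECRET=<your api secret key>
export CLOUDSTACK_METHOD=post
```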
Or create a `~/.cloudstack.ini` file:
[cloudstack]
endpoint = <your cloudstack api endpoint>
key = <your api access key>
secret = <your api secret key>
method = post
We need to use the HTTP POST method to pass the _large_ userdata to the CoreOS instances.
Clone the playbook
------------------
$ git clone --recursive https://github.com/runseb/ansible-kubernetes.git
$ cd ansible-kubernetes
The [ansible-cloudstack](https://github.com/resmo/ansible-cloudstack) module is set up in this repository as a submodule, hence the `--recursive`.
Create a Kubernetes cluster
---------------------------
You simply need to run the playbook.
$ ansible-playbook k8s.yml
Some variables can be edited in the `k8s.yml` file.
vars:
ssh_key: k8s
k8s_num_nodes: 2
k8s_security_group_name: k8s
k8s_node_prefix: k8s2
k8s_template: Linux CoreOS alpha 435 64-bit 10GB Disk
k8s_instance_type: Tiny
This will start a Kubernetes master node and a number of compute nodes (by default 2).
The `instance_type` and `template` defaults are specific to [exoscale](http://exoscale.ch); edit them to specify your CloudStack cloud's template and instance type (i.e. service offering).
Check the tasks and templates in `roles/k8s` if you want to modify anything.
Once the playbook has finished, it will print out the IP of the Kubernetes master:
TASK: [k8s | debug msg='k8s master IP is {{ k8s_master.default_ip }}'] ********
SSH to it using the key that was created and the _core_ user, and you can list the machines in your cluster:
$ ssh -i ~/.ssh/id_rsa_k8s core@<master IP>
$ fleetctl list-machines
MACHINE IP METADATA
a017c422... <node #1 IP> role=node
ad13bf84... <master IP> role=master
e9af8293... <node #2 IP> role=node
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/getting-started-guides/cloudstack.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/getting-started-guides/cloudstack.md?pixel)]()

View File

@ -0,0 +1,18 @@
## Getting started on [CoreOS](http://coreos.com)
There are multiple guides on running Kubernetes with [CoreOS](http://coreos.com):
* [Single Node Cluster](coreos/coreos_single_node_cluster.md)
* [Multi-node Cluster](coreos/coreos_multinode_cluster.md)
* [Setup Multi-node Cluster on GCE in an easy way](https://github.com/rimusz/coreos-multi-node-k8s-gce/blob/master/README.md)
* [Multi-node cluster using cloud-config and Weave on Vagrant](https://github.com/errordeveloper/weave-demos/blob/master/poseidon/README.md)
* [Multi-node cluster using cloud-config and Vagrant](https://github.com/pires/kubernetes-vagrant-coreos-cluster/blob/master/README.md)
* [Yet another multi-node cluster using cloud-config and Vagrant](https://github.com/AntonioMeireles/kubernetes-vagrant-coreos-cluster/blob/master/README.md) (similar to the one above but with an increased, more *aggressive* focus on features and flexibility)
* [Multi-node cluster with Vagrant and fleet units using a small OS X App](https://github.com/rimusz/coreos-osx-gui-kubernetes-cluster/blob/master/README.md)
* [Resizable multi-node cluster on Azure with Weave](coreos/azure/README.md)
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/getting-started-guides/coreos.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/getting-started-guides/coreos.md?pixel)]()

View File

@ -0,0 +1 @@
node_modules/

View File

@ -0,0 +1,195 @@
# Kubernetes on Azure with CoreOS and [Weave](http://weave.works)
## Introduction
In this guide I will demonstrate how to deploy a Kubernetes cluster to the Azure cloud. You will be using CoreOS with Weave, which implements simple and secure networking, in a transparent yet robust way. The purpose of this guide is to provide an out-of-the-box implementation that can ultimately be taken into production with little change. It will demonstrate how to provision a dedicated Kubernetes master and etcd nodes, and show how to scale the cluster with ease.
## Let's go!
To get started, you need to checkout the code:
```
git clone https://github.com/GoogleCloudPlatform/kubernetes
cd kubernetes/docs/getting-started-guides/coreos/azure/
```
You will need to have [Node.js installed](http://nodejs.org/download/) on your machine. If you have previously used the Azure CLI, you should have it already.
First, you need to install some of the dependencies with
```
npm install
```
Now, all you need to do is:
```
./azure-login.js -u <your_username>
./create-kubernetes-cluster.js
```
This script will provision a cluster suitable for production use, with a ring of 3 dedicated etcd nodes, a Kubernetes master, and 2 nodes. The `kube-00` VM will be the master; your workloads should only be deployed on the minion nodes, `kube-01` and `kube-02`. Initially, all VMs are single-core, to ensure a user of the free tier can reproduce it without paying extra. I will show how to add more, bigger VMs later.
![VMs in Azure](initial_cluster.png)
Once the creation of Azure VMs has finished, you should see the following:
```
...
azure_wrapper/info: Saved SSH config, you can use it like so: `ssh -F ./output/kube_1c1496016083b4_ssh_conf <hostname>`
azure_wrapper/info: The hosts in this deployment are:
[ 'etcd-00', 'etcd-01', 'etcd-02', 'kube-00', 'kube-01', 'kube-02' ]
azure_wrapper/info: Saved state into `./output/kube_1c1496016083b4_deployment.yml`
```
Let's login to the master node like so:
```
ssh -F ./output/kube_1c1496016083b4_ssh_conf kube-00
```
> Note: the config file name will be different; make sure to use the one you see.
Check that there are 2 nodes in the cluster:
```
core@kube-00 ~ $ kubectl get nodes
NAME LABELS STATUS
kube-01 environment=production Ready
kube-02 environment=production Ready
```
## Deploying the workload
Let's follow the Guestbook example now:
```
cd guestbook-example
kubectl create -f redis-master-controller.json
kubectl create -f redis-master-service.json
kubectl create -f redis-slave-controller.json
kubectl create -f redis-slave-service.json
kubectl create -f frontend-controller.json
kubectl create -f frontend-service.json
```
You need to wait for the pods to get deployed; run the following and wait for `STATUS` to change from `Unknown`, through `Pending`, to `Running`.
```
kubectl get pods --watch
```
> Note: most of the time will be spent downloading Docker container images on each of the nodes.
Eventually you should see:
```
POD IP CONTAINER(S) IMAGE(S) HOST LABELS STATUS
frontend-controller-0133o 10.2.1.14 php-redis kubernetes/example-guestbook-php-redis kube-01/172.18.0.13 name=frontend,uses=redisslave,redis-master Running
frontend-controller-ls6k1 10.2.3.10 php-redis kubernetes/example-guestbook-php-redis <unassigned> name=frontend,uses=redisslave,redis-master Running
frontend-controller-oh43e 10.2.2.15 php-redis kubernetes/example-guestbook-php-redis kube-02/172.18.0.14 name=frontend,uses=redisslave,redis-master Running
redis-master 10.2.1.3 master redis kube-01/172.18.0.13 name=redis-master Running
redis-slave-controller-fplln 10.2.2.3 slave brendanburns/redis-slave kube-02/172.18.0.14 name=redisslave,uses=redis-master Running
redis-slave-controller-gziey 10.2.1.4 slave brendanburns/redis-slave kube-01/172.18.0.13 name=redisslave,uses=redis-master Running
```
## Scaling
Two single-core nodes are certainly not enough for a production system of today, and, as you can see, there is one _unassigned_ pod. Let's scale the cluster by adding a couple of bigger nodes.
You will need to open another terminal window on your machine and go to the same working directory (e.g. `~/Workspace/weave-demos/coreos-azure`).
First, let's set the size of the new VMs:
```
export AZ_VM_SIZE=Large
```
Now, run the scale script with the state file of the previous deployment and the number of nodes to add:
```
./scale-kubernetes-cluster.js ./output/kube_1c1496016083b4_deployment.yml 2
...
azure_wrapper/info: Saved SSH config, you can use it like so: `ssh -F ./output/kube_8f984af944f572_ssh_conf <hostname>`
azure_wrapper/info: The hosts in this deployment are:
[ 'etcd-00',
'etcd-01',
'etcd-02',
'kube-00',
'kube-01',
'kube-02',
'kube-03',
'kube-04' ]
azure_wrapper/info: Saved state into `./output/kube_8f984af944f572_deployment.yml`
```
> Note: this step has created new files in `./output`.
Back on `kube-00`:
```
core@kube-00 ~ $ kubectl get nodes
NAME LABELS STATUS
kube-01 environment=production Ready
kube-02 environment=production Ready
kube-03 environment=production Ready
kube-04 environment=production Ready
```
You can see that two more nodes joined happily. Let's scale the number of Guestbook instances now.
First, double-check how many replication controllers there are:
```
core@kube-00 ~ $ kubectl get rc
CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS
frontend php-redis kubernetes/example-guestbook-php-redis:v2 name=frontend 3
redis-master master redis name=redis-master 1
redis-slave slave kubernetes/redis-slave:v2 name=redis-slave 2
```
As there are 4 nodes, let's scale proportionally:
```
core@kube-00 ~ $ kubectl scale --replicas=4 rc redis-slave
scaled
core@kube-00 ~ $ kubectl scale --replicas=4 rc frontend
scaled
```
Check what you have now:
```
core@kube-00 ~ $ kubectl get rc
CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS
frontend php-redis kubernetes/example-guestbook-php-redis:v2 name=frontend 4
redis-master master redis name=redis-master 1
redis-slave slave kubernetes/redis-slave:v2 name=redis-slave 4
```
You will now have more instances of the front-end Guestbook app and Redis slaves; and, if you look up all pods labeled `name=frontend`, you should see one running on each node.
```
core@kube-00 ~/guestbook-example $ kubectl get pods -l name=frontend
POD IP CONTAINER(S) IMAGE(S) HOST LABELS STATUS
frontend-controller-0133o 10.2.1.19 php-redis kubernetes/example-guestbook-php-redis kube-01/172.18.0.13 name=frontend,uses=redisslave,redis-master Running
frontend-controller-i7hvs 10.2.4.5 php-redis kubernetes/example-guestbook-php-redis kube-04/172.18.0.21 name=frontend,uses=redisslave,redis-master Running
frontend-controller-ls6k1 10.2.3.18 php-redis kubernetes/example-guestbook-php-redis kube-03/172.18.0.20 name=frontend,uses=redisslave,redis-master Running
frontend-controller-oh43e 10.2.2.22 php-redis kubernetes/example-guestbook-php-redis kube-02/172.18.0.14 name=frontend,uses=redisslave,redis-master Running
```
## Exposing the app to the outside world
To make sure the app is working, you probably want to load it in the browser. To access the Guestbook service from the outside world, an Azure endpoint needs to be created, as shown in the picture below.
![Creating an endpoint](external_access.png)
You should then be able to access it from anywhere via the Azure virtual IP for `kube-01`, e.g. `http://104.40.211.194:8000/` as in the screenshot.
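If you prefer the xplat CLI over the portal, something along these lines should create the endpoint (a sketch; it assumes the frontend is exposed on port 8000 of `kube-01`, matching the screenshot):
```bash
# <public port> <VM port> on the kube-01 VM
azure vm endpoint create kube-01 8000 8000
```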
## Next steps
You now have a full-blown cluster running in Azure, congrats!
You should probably try deploying other [example apps](https://github.com/GoogleCloudPlatform/kubernetes/tree/master/examples) or writing your own ;)
## Tear down...
If you don't want to keep paying the Azure bill, you can tear down the cluster. It's easy to redeploy it, as you can see.
```
./destroy-cluster.js ./output/kube_8f984af944f572_deployment.yml
```
> Note: make sure to use the _latest state file_, as after scaling there is a new one.
By the way, with the scripts shown, you can deploy multiple clusters, if you like :)
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/getting-started-guides/coreos/azure/README.md?pixel)]()
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/release-0.19.0/docs/getting-started-guides/coreos/azure/README.md?pixel)]()

View File

@ -0,0 +1,14 @@
apiVersion: v1
kind: Service
metadata:
labels:
kubernetes.io/cluster-service: "true"
kubernetes.io/name: "Grafana"
name: monitoring-grafana
spec:
ports:
- port: 80
targetPort: 8080
selector:
name: influxGrafana

View File

@ -0,0 +1,24 @@
apiVersion: v1
kind: ReplicationController
metadata:
labels:
name: heapster
kubernetes.io/cluster-service: "true"
name: monitoring-heapster-controller
spec:
replicas: 1
selector:
name: heapster
template:
metadata:
labels:
name: heapster
kubernetes.io/cluster-service: "true"
spec:
containers:
- image: gcr.io/google_containers/heapster:v0.12.1
name: heapster
command:
- /heapster
- --source=kubernetes:http://kubernetes?auth=
- --sink=influxdb:http://monitoring-influxdb:8086

View File

@ -0,0 +1,35 @@
apiVersion: v1
kind: ReplicationController
metadata:
labels:
name: influxGrafana
kubernetes.io/cluster-service: "true"
name: monitoring-influx-grafana-controller
spec:
replicas: 1
selector:
name: influxGrafana
template:
metadata:
labels:
name: influxGrafana
kubernetes.io/cluster-service: "true"
spec:
containers:
- image: gcr.io/google_containers/heapster_influxdb:v0.3
name: influxdb
ports:
- containerPort: 8083
hostPort: 8083
- containerPort: 8086
hostPort: 8086
- image: gcr.io/google_containers/heapster_grafana:v0.7
name: grafana
env:
- name: INFLUXDB_EXTERNAL_URL
value: /api/v1/proxy/namespaces/default/services/monitoring-grafana/db/
- name: INFLUXDB_HOST
value: monitoring-influxdb
- name: INFLUXDB_PORT
value: "8086"

View File

@ -0,0 +1,17 @@
apiVersion: v1
kind: Service
metadata:
labels:
name: influxGrafana
name: monitoring-influxdb
spec:
ports:
- name: http
port: 8083
targetPort: 8083
- name: api
port: 8086
targetPort: 8086
selector:
name: influxGrafana

View File

@ -0,0 +1,37 @@
apiVersion: v1
kind: ReplicationController
metadata:
name: elasticsearch-logging-v1
namespace: default
labels:
k8s-app: elasticsearch-logging
version: v1
kubernetes.io/cluster-service: "true"
spec:
replicas: 2
selector:
k8s-app: elasticsearch-logging
version: v1
template:
metadata:
labels:
k8s-app: elasticsearch-logging
version: v1
kubernetes.io/cluster-service: "true"
spec:
containers:
- image: gcr.io/google_containers/elasticsearch:1.3
name: elasticsearch-logging
ports:
- containerPort: 9200
name: es-port
protocol: TCP
- containerPort: 9300
name: es-transport-port
protocol: TCP
volumeMounts:
- name: es-persistent-storage
mountPath: /data
volumes:
- name: es-persistent-storage
emptyDir: {}

View File

@ -0,0 +1,16 @@
apiVersion: v1
kind: Service
metadata:
name: elasticsearch-logging
namespace: default
labels:
k8s-app: elasticsearch-logging
kubernetes.io/cluster-service: "true"
kubernetes.io/name: "Elasticsearch"
spec:
ports:
- port: 9200
protocol: TCP
targetPort: es-port
selector:
k8s-app: elasticsearch-logging

View File

@ -0,0 +1,31 @@
apiVersion: v1
kind: ReplicationController
metadata:
name: kibana-logging-v1
namespace: default
labels:
k8s-app: kibana-logging
version: v1
kubernetes.io/cluster-service: "true"
spec:
replicas: 1
selector:
k8s-app: kibana-logging
version: v1
template:
metadata:
labels:
k8s-app: kibana-logging
version: v1
kubernetes.io/cluster-service: "true"
spec:
containers:
- name: kibana-logging
image: gcr.io/google_containers/kibana:1.3
env:
- name: "ELASTICSEARCH_URL"
value: "http://elasticsearch-logging:9200"
ports:
- containerPort: 5601
name: kibana-port
protocol: TCP

View File

@ -0,0 +1,17 @@
apiVersion: v1
kind: Service
metadata:
name: kibana-logging
namespace: default
labels:
k8s-app: kibana-logging
kubernetes.io/cluster-service: "true"
kubernetes.io/name: "Kibana"
spec:
ports:
- port: 5601
protocol: TCP
targetPort: kibana-port
selector:
k8s-app: kibana-logging

View File

@ -0,0 +1,3 @@
#!/usr/bin/env node
require('child_process').fork('node_modules/azure-cli/bin/azure', ['login'].concat(process.argv));

View File

@ -0,0 +1,60 @@
## This file is used as input to the deployment script, which amends it as needed.
## More specifically, we need to add peer hosts for each but the elected peer.
write_files:
- path: /opt/bin/curl-retry.sh
permissions: '0755'
owner: root
content: |
#!/bin/sh -x
until curl $@
do sleep 1
done
coreos:
units:
- name: download-etcd2.service
enable: true
command: start
content: |
[Unit]
After=network-online.target
Before=etcd2.service
Description=Download etcd2 Binaries
Documentation=https://github.com/coreos/etcd/
Requires=network-online.target
[Service]
Environment=ETCD2_RELEASE_TARBALL=https://github.com/coreos/etcd/releases/download/v2.0.11/etcd-v2.0.11-linux-amd64.tar.gz
ExecStartPre=/bin/mkdir -p /opt/bin
ExecStart=/opt/bin/curl-retry.sh --silent --location $ETCD2_RELEASE_TARBALL --output /tmp/etcd2.tgz
ExecStart=/bin/tar xzvf /tmp/etcd2.tgz -C /opt
ExecStartPost=/bin/ln -s /opt/etcd-v2.0.11-linux-amd64/etcd /opt/bin/etcd2
ExecStartPost=/bin/ln -s /opt/etcd-v2.0.11-linux-amd64/etcdctl /opt/bin/etcdctl2
RemainAfterExit=yes
Type=oneshot
[Install]
WantedBy=multi-user.target
- name: etcd2.service
enable: true
command: start
content: |
[Unit]
After=download-etcd2.service
Description=etcd 2
Documentation=https://github.com/coreos/etcd/
[Service]
Environment=ETCD_NAME=%H
Environment=ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster
Environment=ETCD_INITIAL_ADVERTISE_PEER_URLS=http://%H:2380
Environment=ETCD_LISTEN_PEER_URLS=http://%H:2380
Environment=ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379,http://0.0.0.0:4001
Environment=ETCD_ADVERTISE_CLIENT_URLS=http://%H:2379,http://%H:4001
Environment=ETCD_INITIAL_CLUSTER_STATE=new
ExecStart=/opt/bin/etcd2
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
update:
group: stable
reboot-strategy: off

View File

@ -0,0 +1,388 @@
## This file is used as input to the deployment script, which amends it as needed.
## More specifically, we need to add environment files for as many nodes as we
## are going to deploy.
write_files:
- path: /opt/bin/curl-retry.sh
permissions: '0755'
owner: root
content: |
#!/bin/sh -x
until curl $@
do sleep 1
done
- path: /opt/bin/register_minion.sh
permissions: '0755'
owner: root
content: |
#!/bin/sh -xe
minion_id="${1}"
master_url="${2}"
env_label="${3}"
until healthcheck=$(curl --fail --silent "${master_url}/healthz")
do sleep 2
done
test -n "${healthcheck}"
test "${healthcheck}" = "ok"
printf '{
"id": "%s",
"kind": "Minion",
"apiVersion": "v1beta1",
"labels": { "environment": "%s" }
}' "${minion_id}" "${env_label}" \
| /opt/bin/kubectl create -s "${master_url}" -f -
- path: /etc/kubernetes/manifests/fluentd.manifest
permissions: '0755'
owner: root
content: |
apiVersion: v1
kind: Pod
metadata:
name: fluentd-elasticsearch
spec:
containers:
- name: fluentd-elasticsearch
image: gcr.io/google_containers/fluentd-elasticsearch:1.5
env:
- name: "FLUENTD_ARGS"
value: "-qq"
volumeMounts:
- name: varlog
mountPath: /varlog
- name: containers
mountPath: /var/lib/docker/containers
volumes:
- name: varlog
hostPath:
path: /var/log
- name: containers
hostPath:
path: /var/lib/docker/containers
coreos:
update:
group: stable
reboot-strategy: off
units:
- name: systemd-networkd-wait-online.service
drop-ins:
- name: 50-check-github-is-reachable.conf
content: |
[Service]
ExecStart=/bin/sh -x -c \
'until curl --silent --fail https://status.github.com/api/status.json | grep -q \"good\"; do sleep 2; done'
- name: docker.service
drop-ins:
- name: 50-weave-kubernetes.conf
content: |
[Service]
Environment=DOCKER_OPTS='--bridge="weave" -r="false"'
- name: weave-network.target
enable: true
content: |
[Unit]
Description=Weave Network Setup Complete
Documentation=man:systemd.special(7)
RefuseManualStart=no
After=network-online.target
[Install]
WantedBy=multi-user.target
WantedBy=kubernetes-master.target
WantedBy=kubernetes-minion.target
- name: kubernetes-master.target
enable: true
command: start
content: |
[Unit]
Description=Kubernetes Cluster Master
Documentation=http://kubernetes.io/
RefuseManualStart=no
After=weave-network.target
Requires=weave-network.target
ConditionHost=kube-00
Wants=apiserver.service
Wants=scheduler.service
Wants=controller-manager.service
[Install]
WantedBy=multi-user.target
- name: kubernetes-minion.target
enable: true
command: start
content: |
[Unit]
Description=Kubernetes Cluster Minion
Documentation=http://kubernetes.io/
RefuseManualStart=no
After=weave-network.target
Requires=weave-network.target
ConditionHost=!kube-00
Wants=proxy.service
Wants=kubelet.service
[Install]
WantedBy=multi-user.target
- name: 10-weave.network
runtime: false
content: |
[Match]
Type=bridge
Name=weave*
[Network]
- name: install-weave.service
enable: true
content: |
[Unit]
After=network-online.target
Before=weave.service
Before=weave-helper.service
Before=docker.service
Description=Install Weave
Documentation=http://docs.weave.works/
Requires=network-online.target
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=/bin/mkdir -p /opt/bin/
ExecStartPre=/opt/bin/curl-retry.sh \
--silent \
--location \
https://github.com/weaveworks/weave/releases/download/latest_release/weave \
--output /opt/bin/weave
ExecStartPre=/opt/bin/curl-retry.sh \
--silent \
--location \
https://raw.github.com/errordeveloper/weave-demos/master/poseidon/weave-helper \
--output /opt/bin/weave-helper
ExecStartPre=/usr/bin/chmod +x /opt/bin/weave
ExecStartPre=/usr/bin/chmod +x /opt/bin/weave-helper
ExecStart=/bin/echo Weave Installed
[Install]
WantedBy=weave-network.target
WantedBy=weave.service
- name: weave-helper.service
enable: true
content: |
[Unit]
After=install-weave.service
After=docker.service
Description=Weave Network Router
Documentation=http://docs.weave.works/
Requires=docker.service
Requires=install-weave.service
[Service]
ExecStart=/opt/bin/weave-helper
Restart=always
[Install]
WantedBy=weave-network.target
- name: weave.service
enable: true
content: |
[Unit]
After=install-weave.service
After=docker.service
Description=Weave Network Router
Documentation=http://docs.weave.works/
Requires=docker.service
Requires=install-weave.service
[Service]
TimeoutStartSec=0
EnvironmentFile=/etc/weave.%H.env
ExecStartPre=/opt/bin/weave setup
ExecStartPre=/opt/bin/weave launch $WEAVE_PEERS
ExecStart=/usr/bin/docker attach weave
Restart=on-failure
Restart=always
ExecStop=/opt/bin/weave stop
[Install]
WantedBy=weave-network.target
- name: weave-create-bridge.service
enable: true
content: |
[Unit]
After=network.target
After=install-weave.service
Before=weave.service
Before=docker.service
Requires=network.target
Requires=install-weave.service
[Service]
Type=oneshot
EnvironmentFile=/etc/weave.%H.env
ExecStart=/opt/bin/weave --local create-bridge
ExecStart=/usr/bin/ip addr add dev weave $BRIDGE_ADDRESS_CIDR
ExecStart=/usr/bin/ip route add $BREAKOUT_ROUTE dev weave scope link
ExecStart=/usr/bin/ip route add 224.0.0.0/4 dev weave
[Install]
WantedBy=multi-user.target
WantedBy=weave-network.target
- name: download-kubernetes.service
enable: true
content: |
[Unit]
After=network-online.target
Before=apiserver.service
Before=controller-manager.service
Before=kubelet.service
Before=proxy.service
Description=Download Kubernetes Binaries
Documentation=http://kubernetes.io/
Requires=network-online.target
[Service]
Environment=KUBE_RELEASE_TARBALL=https://github.com/GoogleCloudPlatform/kubernetes/releases/download/v0.18.0/kubernetes.tar.gz
ExecStartPre=/bin/mkdir -p /opt/
ExecStart=/opt/bin/curl-retry.sh --silent --location $KUBE_RELEASE_TARBALL --output /tmp/kubernetes.tgz
ExecStart=/bin/tar xzvf /tmp/kubernetes.tgz -C /tmp/
ExecStart=/bin/tar xzvf /tmp/kubernetes/server/kubernetes-server-linux-amd64.tar.gz -C /opt
ExecStartPost=/bin/chmod o+rx -R /opt/kubernetes
ExecStartPost=/bin/ln -s /opt/kubernetes/server/bin/kubectl /opt/bin/
ExecStartPost=/bin/mv /tmp/kubernetes/examples/guestbook /home/core/guestbook-example
ExecStartPost=/bin/chown core. -R /home/core/guestbook-example
ExecStartPost=/bin/rm -rf /tmp/kubernetes
ExecStartPost=/bin/sed 's/\("createExternalLoadBalancer":\) true/\1 false/' -i /home/core/guestbook-example/frontend-service.json
RemainAfterExit=yes
Type=oneshot
[Install]
WantedBy=kubernetes-master.target
WantedBy=kubernetes-minion.target
- name: apiserver.service
enable: true
content: |
[Unit]
After=download-kubernetes.service
Before=controller-manager.service
Before=scheduler.service
ConditionFileIsExecutable=/opt/kubernetes/server/bin/kube-apiserver
Description=Kubernetes API Server
Documentation=http://kubernetes.io/
Wants=download-kubernetes.service
ConditionHost=kube-00
[Service]
ExecStart=/opt/kubernetes/server/bin/kube-apiserver \
--address=0.0.0.0 \
--port=8080 \
$ETCD_SERVERS \
--service-cluster-ip-range=10.1.0.0/16 \
--cloud_provider=vagrant \
--logtostderr=true --v=3
Restart=always
RestartSec=10
[Install]
WantedBy=kubernetes-master.target
- name: scheduler.service
enable: true
content: |
[Unit]
After=apiserver.service
After=download-kubernetes.service
ConditionFileIsExecutable=/opt/kubernetes/server/bin/kube-scheduler
Description=Kubernetes Scheduler
Documentation=http://kubernetes.io/
Wants=apiserver.service
ConditionHost=kube-00
[Service]
ExecStart=/opt/kubernetes/server/bin/kube-scheduler \
--logtostderr=true \
--master=127.0.0.1:8080
Restart=always
RestartSec=10
[Install]
WantedBy=kubernetes-master.target
- name: controller-manager.service
enable: true
content: |
[Unit]
After=download-kubernetes.service
After=apiserver.service
ConditionFileIsExecutable=/opt/kubernetes/server/bin/kube-controller-manager
Description=Kubernetes Controller Manager
Documentation=http://kubernetes.io/
Wants=apiserver.service
Wants=download-kubernetes.service
ConditionHost=kube-00
[Service]
ExecStart=/opt/kubernetes/server/bin/kube-controller-manager \
--cloud_provider=vagrant \
--master=127.0.0.1:8080 \
--logtostderr=true
Restart=always
RestartSec=10
[Install]
WantedBy=kubernetes-master.target
- name: kubelet.service
enable: true
content: |
[Unit]
After=download-kubernetes.service
ConditionFileIsExecutable=/opt/kubernetes/server/bin/kubelet
Description=Kubernetes Kubelet
Documentation=http://kubernetes.io/
Wants=download-kubernetes.service
ConditionHost=!kube-00
[Service]
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests/
ExecStart=/opt/kubernetes/server/bin/kubelet \
--address=0.0.0.0 \
--port=10250 \
--hostname_override=%H \
--api_servers=http://kube-00:8080 \
--logtostderr=true \
--cluster_dns=10.1.0.3 \
--cluster_domain=kube.local \
--config=/etc/kubernetes/manifests/
Restart=always
RestartSec=10
[Install]
WantedBy=kubernetes-minion.target
- name: proxy.service
enable: true
content: |
[Unit]
After=download-kubernetes.service
ConditionFileIsExecutable=/opt/kubernetes/server/bin/kube-proxy
Description=Kubernetes Proxy
Documentation=http://kubernetes.io/
Wants=download-kubernetes.service
ConditionHost=!kube-00
[Service]
ExecStart=/opt/kubernetes/server/bin/kube-proxy \
--master=http://kube-00:8080 \
--logtostderr=true
Restart=always
RestartSec=10
[Install]
WantedBy=kubernetes-minion.target
- name: kubectl-create-minion.service
enable: true
content: |
[Unit]
After=download-kubernetes.service
Before=proxy.service
Before=kubelet.service
ConditionFileIsExecutable=/opt/kubernetes/server/bin/kubectl
ConditionFileIsExecutable=/opt/bin/register_minion.sh
Description=Kubernetes Create Minion
Documentation=http://kubernetes.io/
Wants=download-kubernetes.service
ConditionHost=!kube-00
[Service]
ExecStart=/opt/bin/register_minion.sh %H http://kube-00:8080 production
Type=oneshot
[Install]
WantedBy=kubernetes-minion.target

View File

@ -0,0 +1,15 @@
#!/usr/bin/env node
var azure = require('./lib/azure_wrapper.js');
var kube = require('./lib/deployment_logic/kubernetes.js');
azure.create_config('kube', { 'etcd': 3, 'kube': 3 });
azure.run_task_queue([
azure.queue_default_network(),
azure.queue_storage_if_needed(),
azure.queue_machines('etcd', 'stable',
kube.create_etcd_cloud_config),
azure.queue_machines('kube', 'stable',
kube.create_node_cloud_config),
]);

View File

@ -0,0 +1,7 @@
#!/usr/bin/env node
var azure = require('./lib/azure_wrapper.js');
azure.destroy_cluster(process.argv[2]);
console.log('The cluster had been destroyed, you can delete the state file now.');

Binary file not shown.

After

Width:  |  Height:  |  Size: 286 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 169 KiB

View File

@ -0,0 +1,271 @@
var _ = require('underscore');
var fs = require('fs');
var cp = require('child_process');
var yaml = require('js-yaml');
var openssl = require('openssl-wrapper');
var clr = require('colors');
var inspect = require('util').inspect;
var util = require('./util.js');
var coreos_image_ids = {
'stable': '2b171e93f07c4903bcad35bda10acf22__CoreOS-Stable-647.2.0',
'beta': '2b171e93f07c4903bcad35bda10acf22__CoreOS-Beta-681.0.0', // untested
'alpha': '2b171e93f07c4903bcad35bda10acf22__CoreOS-Alpha-695.0.0' // untested
};
var conf = {};
var hosts = {
collection: [],
ssh_port_counter: 2200,
};
var task_queue = [];
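// Runs the queued azure-cli invocations one at a time; once the queue drains
// successfully (and we are not tearing down), write the SSH config and save state.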
exports.run_task_queue = function (dummy) {
var tasks = {
todo: task_queue,
done: [],
};
var pop_task = function() {
console.log(clr.yellow('azure_wrapper/task:'), clr.grey(inspect(tasks)));
var ret = {};
ret.current = tasks.todo.shift();
ret.remaining = tasks.todo.length;
return ret;
};
(function iter (task) {
if (task.current === undefined) {
if (conf.destroying === undefined) {
create_ssh_conf();
save_state();
}
return;
} else {
if (task.current.length !== 0) {
console.log(clr.yellow('azure_wrapper/exec:'), clr.blue(inspect(task.current)));
cp.fork('node_modules/azure-cli/bin/azure', task.current)
.on('exit', function (code, signal) {
tasks.done.push({
code: code,
signal: signal,
what: task.current.join(' '),
remaining: task.remaining,
});
if (code !== 0 && conf.destroying === undefined) {
console.log(clr.red('azure_wrapper/fail: Exiting due to an error.'));
save_state();
console.log(clr.cyan('azure_wrapper/info: You probably want to destroy and re-run.'));
process.abort();
} else {
iter(pop_task());
}
});
} else {
iter(pop_task());
}
}
})(pop_task());
};
var save_state = function () {
var file_name = util.join_output_file_path(conf.name, 'deployment.yml');
try {
conf.hosts = hosts.collection;
fs.writeFileSync(file_name, yaml.safeDump(conf));
console.log(clr.yellow('azure_wrapper/info: Saved state into `%s`'), file_name);
} catch (e) {
console.log(clr.red(e));
}
};
var load_state = function (file_name) {
try {
conf = yaml.safeLoad(fs.readFileSync(file_name, 'utf8'));
console.log(clr.yellow('azure_wrapper/info: Loaded state from `%s`'), file_name);
return conf;
} catch (e) {
console.log(clr.red(e));
}
};
var create_ssh_key = function (prefix) {
var opts = {
x509: true,
nodes: true,
newkey: 'rsa:2048',
subj: '/O=Weaveworks, Inc./L=London/C=GB/CN=weave.works',
keyout: util.join_output_file_path(prefix, 'ssh.key'),
out: util.join_output_file_path(prefix, 'ssh.pem'),
};
openssl.exec('req', opts, function (err, buffer) {
if (err) console.log(clr.red(err));
fs.chmod(opts.keyout, '0600', function (err) {
if (err) console.log(clr.red(err));
});
});
return {
key: opts.keyout,
pem: opts.out,
}
}
var create_ssh_conf = function () {
var file_name = util.join_output_file_path(conf.name, 'ssh_conf');
var ssh_conf_head = [
"Host *",
"\tHostname " + conf.resources['service'] + ".cloudapp.net",
"\tUser core",
"\tCompression yes",
"\tLogLevel FATAL",
"\tStrictHostKeyChecking no",
"\tUserKnownHostsFile /dev/null",
"\tIdentitiesOnly yes",
"\tIdentityFile " + conf.resources['ssh_key']['key'],
"\n",
];
fs.writeFileSync(file_name, ssh_conf_head.concat(_.map(hosts.collection, function (host) {
return _.template("Host <%= name %>\n\tPort <%= port %>\n")(host);
})).join('\n'));
console.log(clr.yellow('azure_wrapper/info:'), clr.green('Saved SSH config, you can use it like so: `ssh -F ', file_name, '<hostname>`'));
console.log(clr.yellow('azure_wrapper/info:'), clr.green('The hosts in this deployment are:\n'), _.map(hosts.collection, function (host) { return host.name; }));
};
var get_location = function () {
if (process.env['AZ_AFFINITY']) {
return '--affinity-group=' + process.env['AZ_AFFINITY'];
} else if (process.env['AZ_LOCATION']) {
return '--location=' + process.env['AZ_LOCATION'];
} else {
return '--location=West Europe';
}
}
var get_vm_size = function () {
if (process.env['AZ_VM_SIZE']) {
return '--vm-size=' + process.env['AZ_VM_SIZE'];
} else {
return '--vm-size=Small';
}
}
exports.queue_default_network = function () {
task_queue.push([
'network', 'vnet', 'create',
get_location(),
'--address-space=172.16.0.0',
conf.resources['vnet'],
]);
}
exports.queue_storage_if_needed = function() {
if (!process.env['AZURE_STORAGE_ACCOUNT']) {
conf.resources['storage_account'] = util.rand_suffix;
task_queue.push([
'storage', 'account', 'create',
'--type=LRS',
get_location(),
conf.resources['storage_account'],
]);
process.env['AZURE_STORAGE_ACCOUNT'] = conf.resources['storage_account'];
} else {
// Preserve it for resizing, so we don't create a new one by accident,
// when the environment variable is unset
conf.resources['storage_account'] = process.env['AZURE_STORAGE_ACCOUNT'];
}
};
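// Queues one `azure vm create` per requested node of the given prefix, giving each
// host a unique SSH port and the cloud-config file produced by the creator callback.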
exports.queue_machines = function (name_prefix, coreos_update_channel, cloud_config_creator) {
var x = conf.nodes[name_prefix];
var vm_create_base_args = [
'vm', 'create',
get_location(),
get_vm_size(),
'--connect=' + conf.resources['service'],
'--virtual-network-name=' + conf.resources['vnet'],
'--no-ssh-password',
'--ssh-cert=' + conf.resources['ssh_key']['pem'],
];
var cloud_config = cloud_config_creator(x, conf);
var next_host = function (n) {
hosts.ssh_port_counter += 1;
var host = { name: util.hostname(n, name_prefix), port: hosts.ssh_port_counter };
if (cloud_config instanceof Array) {
host.cloud_config_file = cloud_config[n];
} else {
host.cloud_config_file = cloud_config;
}
hosts.collection.push(host);
return _.map([
"--vm-name=<%= name %>",
"--ssh=<%= port %>",
"--custom-data=<%= cloud_config_file %>",
], function (arg) { return _.template(arg)(host); });
};
task_queue = task_queue.concat(_(x).times(function (n) {
if (conf.resizing && n < conf.old_size) {
return [];
} else {
return vm_create_base_args.concat(next_host(n), [
coreos_image_ids[coreos_update_channel], 'core',
]);
}
}));
};
exports.create_config = function (name, nodes) {
conf = {
name: name,
nodes: nodes,
weave_salt: util.rand_string(),
resources: {
vnet: [name, 'internal-vnet', util.rand_suffix].join('-'),
service: [name, util.rand_suffix].join('-'),
ssh_key: create_ssh_key(name),
}
};
};
exports.destroy_cluster = function (state_file) {
load_state(state_file);
if (conf.hosts === undefined) {
console.log(clr.red('azure_wrapper/fail: Nothing to delete.'));
process.abort();
}
conf.destroying = true;
task_queue = _.map(conf.hosts, function (host) {
return ['vm', 'delete', '--quiet', '--blob-delete', host.name];
});
task_queue.push(['network', 'vnet', 'delete', '--quiet', conf.resources['vnet']]);
task_queue.push(['storage', 'account', 'delete', '--quiet', conf.resources['storage_account']]);
exports.run_task_queue();
};
exports.load_state_for_resizing = function (state_file, node_type, new_nodes) {
load_state(state_file);
if (conf.hosts === undefined) {
console.log(clr.red('azure_wrapper/fail: Nothing to look at.'));
process.abort();
}
conf.resizing = true;
conf.old_size = conf.nodes[node_type];
conf.old_state_file = state_file;
conf.nodes[node_type] += new_nodes;
hosts.collection = conf.hosts;
hosts.ssh_port_counter += conf.hosts.length;
process.env['AZURE_STORAGE_ACCOUNT'] = conf.resources['storage_account'];
}

Some files were not shown because too many files have changed in this diff.