devel/ tree 80col updates; and other minor edits

Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Author: Mike Brown
Date: 2016-05-04 14:52:32 -05:00
Parent: ff339c77cf
Commit: 0054ddcad1
6 changed files with 542 additions and 223 deletions


@ -31,34 +31,62 @@ Documentation for other releases can be found at
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->

## GitHub Issues for the Kubernetes Project

A quick overview of how we will review and prioritize incoming issues at
https://github.com/kubernetes/kubernetes/issues

### Priorities

We use GitHub issue labels for prioritization. The absence of a priority label
means the bug has not been reviewed and prioritized yet.

We try to apply these priority labels consistently across the entire project,
but if you notice an issue that you believe to be incorrectly prioritized,
please do let us know and we will evaluate your counter-proposal.

- **priority/P0**: Must be actively worked on as someone's top priority right
now. Stuff is burning. If it's not being actively worked on, someone is expected
to drop what they're doing immediately to work on it. Team leaders are
responsible for making sure that all P0's in their area are being actively
worked on. Examples include user-visible bugs in core features, broken builds or
tests, and critical security issues.
- **priority/P1**: Must be staffed and worked on either currently, or very soon,
ideally in time for the next release.
- **priority/P2**: There appears to be general agreement that this would be good
to have, but we may not have anyone available to work on it right now or in the
immediate future. Community contributions would be most welcome in the meantime
(although it might take a while to get them reviewed if reviewers are fully
occupied with higher priority issues, for example immediately before a release).
- **priority/P3**: Possibly useful, but not yet enough support to actually get
it done. These are mostly placeholders for potentially good ideas, so that they
don't get completely forgotten, and can be referenced/deduped every time they
come up.

### Milestones

We additionally use milestones, based on minor version, for determining if a bug
should be fixed for the next release. These milestones will be especially
scrutinized as we get to the weeks just before a release. We can release a new
version of Kubernetes once they are empty. We will have two milestones per minor
release.

- **vX.Y**: The list of bugs that will be merged for that milestone once ready.
- **vX.Y-candidate**: The list of bugs that we might merge for that milestone. A
bug shouldn't be in this milestone for more than a day or two towards the end of
a milestone. It should be triaged either into vX.Y, or moved out of the release
milestones.

The above priority scheme still applies. P0 and P1 issues are work we feel must
get done before release. P2 and P3 issues are work we would merge into the
release if it gets done, but we wouldn't block the release on it. A few days
before release, we will probably move all P2 and P3 bugs out of that milestone
in bulk.

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/issues.md?pixel)]()


@ -32,14 +32,14 @@ Documentation for other releases can be found at
<!-- END MUNGE: UNVERSIONED_WARNING -->

# Kubectl Conventions

Updated: 8/27/2015

**Table of Contents**

<!-- BEGIN MUNGE: GENERATED_TOC -->

- [Kubectl Conventions](#kubectl-conventions)
  - [Principles](#principles)
  - [Command conventions](#command-conventions)
    - [Create commands](#create-commands)
@ -54,45 +54,89 @@ Updated: 8/27/2015
## Principles

* Strive for consistency across commands
* Explicit should always override implicit
* Environment variables should override default values
* Command-line flags should override default values and environment variables
  * `--namespace` should also override the value specified in a specified
resource
## Command conventions

* Command names are all lowercase, and hyphenated if multiple words.
* kubectl VERB NOUNs for commands that apply to multiple resource types.
* Command itself should not have built-in aliases.
* NOUNs may be specified as `TYPE name1 name2` or `TYPE/name1 TYPE/name2` or
`TYPE1,TYPE2,TYPE3/name1`; TYPE is omitted when only a single type is expected
(see the examples after this list).
* Resource types are all lowercase, with no hyphens; both singular and plural
forms are accepted.
* NOUNs may also be specified by one or more file arguments: `-f file1 -f file2
...`
* Resource types may have 2- or 3-letter aliases.
* Business logic should be decoupled from the command framework, so that it can
be reused independently of kubectl, cobra, etc.
* Ideally, commonly needed functionality would be implemented server-side in
order to avoid problems typical of "fat" clients and to make it readily
available to non-Go clients.
* Commands that generate resources, such as `run` or `expose`, should obey
specific conventions; see [generators](#generators).
* A command group (e.g., `kubectl config`) may be used to group related
non-standard commands, such as custom generators, mutations, and computations.
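For illustration, here are the NOUN forms above in practice (the resource names
are hypothetical):

```
# Multiple names of a single TYPE
$ kubectl get pods pod-a pod-b

# TYPE/name form, mixing resource types
$ kubectl get pods/pod-a services/frontend

# Comma-separated TYPEs; no names, so all resources of those types are listed
$ kubectl get rc,services
```
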
### Create commands

`kubectl create <resource>` commands fill the gap between "I want to try
Kubernetes, but I don't know or care what gets created" (`kubectl run`) and "I
want to create exactly this" (author yaml and run `kubectl create -f`). They
provide an easy way to create a valid object without having to know the vagaries
of particular kinds, nested fields, and object key typos that are ignored by the
yaml/json parser. Because editing an already created object is easier than
authoring one from scratch, these commands only need to have enough parameters
to create a valid object and set common immutable fields. They should default as
much as is reasonably possible. Once that valid object is created, it can be
further manipulated using `kubectl edit` or the eventual `kubectl set` commands.

`kubectl create <resource> <special-case>` commands help in cases where you need
to perform non-trivial configuration generation/transformation tailored for a
common use case. `kubectl create secret` is a good example: there's a `generic`
flavor with keys mapping to files, there's a `docker-registry` flavor that is
tailored for creating an image pull secret, and there's a `tls` flavor for
creating TLS secrets. You create these as separate commands to get distinct
flags and separate help that is tailored for the particular usage.
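The three flavors mentioned above look roughly like this (the secret names,
file paths, and registry credentials are placeholders):

```
# Generic secret with keys mapped from local files
$ kubectl create secret generic my-secret --from-file=ssh-privatekey=./id_rsa

# Image pull secret tailored for a Docker registry
$ kubectl create secret docker-registry my-pull-secret \
    --docker-server=DOCKER_SERVER --docker-username=DOCKER_USER \
    --docker-password=DOCKER_PASSWORD --docker-email=DOCKER_EMAIL

# TLS secret from a certificate/key pair
$ kubectl create secret tls my-tls-secret --cert=./tls.crt --key=./tls.key
```
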
## Flag conventions

* Flags are all lowercase, with words separated by hyphens
* Flag names and single-character aliases should have the same meaning across
all commands
* Command-line flags corresponding to API fields should accept API enums
exactly (e.g., `--restart=Always`)
* Do not reuse flags for different semantic purposes, and do not use different
flag names for the same semantic purpose -- grep for `"Flags()"` before adding a
new flag
* Use short flags sparingly, only for the most frequently used options, prefer
lowercase over uppercase for the most common cases, try to stick to well known
conventions for UNIX commands and/or Docker, where they exist, and update this
list when adding new short flags (see the example after this list)
  * `-f`: Resource file
    * also used for `--follow` in `logs`, but should be deprecated in favor of `-F`
  * `-l`: Label selector
@ -111,51 +155,116 @@ and there's a `tls` flavor for creating tls secrets. You create these as separa
  * `-r`: Replicas
  * `-u`: Unix socket
  * `-v`: Verbose logging level
* `--dry-run`: Don't modify the live state; simulate the mutation and display
the output. All mutations should support it.
* `--local`: Don't contact the server; just do local read, transformation,
generation, etc., and display the output
* `--output-version=...`: Convert the output to a different API group/version
* `--validate`: Validate the resource schema
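For example, `-f` and `-l` in use (the file and label are hypothetical):

```
# -f takes a resource file
$ kubectl create -f ./redis-pod.yaml

# -l takes a label selector
$ kubectl get pods -l app=redis
```
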
## Output conventions

* By default, output is intended for humans rather than programs
  * However, affordances are made for simple parsing of `get` output
* Only errors should be directed to stderr
* `get` commands should output one row per resource, and one resource per row
  * Column titles and values should not contain spaces in order to facilitate
commands that break lines into fields: cut, awk, etc. Instead, use `-` as the
word separator.
  * By default, `get` output should fit within about 80 columns
    * Eventually we could perhaps auto-detect width
    * `-o wide` may be used to display additional columns
  * The first column should be the resource name, titled `NAME` (may change this
to an abbreviation of resource type)
  * NAMESPACE should be displayed as the first column when --all-namespaces is
specified
  * The last default column should be time since creation, titled `AGE`
  * `-Lkey` should append a column containing the value of label with key `key`,
with `<none>` if not present (see the example after this list)
* json, yaml, Go template, and jsonpath template formats should be supported
and encouraged for subsequent processing
  * Users should use --api-version or --output-version to ensure the output
uses the version they expect
* `describe` commands may output on multiple lines and may include information
from related resources, such as events. Describe should add additional
information from related resources that a normal user may need to know - if a
user would always run "describe resource1" and then immediately want to run a
"get type2" or "describe resource2", consider including that info. Examples:
persistent volume claims for pods that reference claims, events for most
resources, nodes and the pods scheduled on them. When fetching related
resources, a targeted field selector should be used in favor of client side
filtering of related resources.
* For fields that can be explicitly unset (booleans, integers, structs), the
output should say `<unset>`. Likewise, for arrays `<none>` should be used.
Lastly, `<unknown>` should be used where an unrecognized field type was specified.
* Mutations should output TYPE/name verbed by default, where TYPE is singular;
`-o name` may be used to just display TYPE/name, which may be used to specify
resources in other commands
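To make a few of these conventions concrete, here is an illustrative (not
verbatim) interaction for a hypothetical pod labeled `app=frontend`:

```
$ kubectl get pods -Lapp
NAME      READY     STATUS    RESTARTS   AGE       APP
web-1     1/1       Running   0          5m        frontend

# -o name prints just TYPE/name, which can be fed to other commands
$ kubectl get pods -o name
pod/web-1
```
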
## Documentation conventions

* Commands are documented using Cobra; docs are then auto-generated by
`hack/update-generated-docs.sh`.
  * Use should contain a short usage string for the most common use case(s), not
an exhaustive specification
  * Short should contain a one-line explanation of what the command does
  * Long may contain multiple lines, including additional information about
input, output, commonly used flags, etc.
  * Example should contain examples (see the sketch after this list)
    * Start commands with `$`
    * A comment should precede each example command, and should begin with `#`
* Use "FILENAME" for filenames
* Use "TYPE" for the particular flavor of resource type accepted by kubectl,
rather than "RESOURCE" or "KIND"
* Use "NAME" for resource names
## Command implementation conventions

For every command there should be a `NewCmd<CommandName>` function that creates
the command and returns a pointer to a `cobra.Command`, which can later be added
to other parent commands to compose the structure tree. There should also be a
`<CommandName>Config` struct with a variable for every flag and argument declared
by the command (and any other variable required for the command to run). This
makes tests and mocking easier. The struct ideally exposes three methods:

* `Complete`: Completes the struct fields with values that may or may not be
directly provided by the user, for example, by flag pointers, by the `args`
slice, by using the Factory, etc.
* `Validate`: performs validation on the struct fields and returns appropriate
errors.
* `Run<CommandName>`: runs the actual logic of the command, taking as an
assumption that the struct is complete with all required values to run, and that
they are valid.

Sample command skeleton:
@ -221,19 +330,41 @@ func (o MineConfig) RunMine() error {
}
```

The `Run<CommandName>` method should contain the business logic of the command
and, as noted in [command conventions](#command-conventions), ideally that logic
should exist server-side so any client could take advantage of it. Notice that
this is not a mandatory structure and not every command is implemented this way,
but this is a nice convention, so try to be compliant with it. As an example,
have a look at how [kubectl logs](../../pkg/kubectl/cmd/logs.go) is implemented.
## Generators

Generators are kubectl commands that generate resources based on a set of inputs
(other resources, flags, or a combination of both).

The point of generators is:

* to enable users using kubectl in a scripted fashion to pin to a particular
behavior which may change in the future. Explicit use of a generator will always
guarantee that the expected behavior stays the same.
* to enable potential expansion of the generated resources for scenarios other
than just creation, similar to how -f is supported for most general-purpose
commands.

Generator commands should obey the following conventions:

* A `--generator` flag should be defined. Users then can choose between
different generators, if the command supports them (for example, `kubectl run`
currently supports generators for pods, jobs, replication controllers, and
deployments), or between different versions of a generator so that users
depending on a specific behavior may pin to that version (for example, `kubectl
expose` currently supports two different versions of a service generator).
* Generation should be decoupled from creation. A generator should implement the
`kubectl.StructuredGenerator` interface and have no dependencies on cobra or the
Factory. See, for example, how the first version of the namespace generator is
defined:
```go
// NamespaceGeneratorV1 supports stable generation of a namespace
@ -264,8 +395,14 @@ func (g *NamespaceGeneratorV1) validate() error {
}
```

The generator struct (`NamespaceGeneratorV1`) holds the necessary fields for
namespace generation. It also satisfies the `kubectl.StructuredGenerator`
interface by implementing the `StructuredGenerate() (runtime.Object, error)`
method, which configures the generated namespace that callers of the generator
(`kubectl create namespace` in our case) need to create.

* `--dry-run` should output the resource that would be created, without
creating it (see the example below).
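As a sketch of the `--generator` and `--dry-run` conventions (the generator
names shown are the ones `kubectl run` and `kubectl expose` advertised around
this time; check `--help` for the current set):

```
# Pin `kubectl run` to the pod generator and only print what would be created
$ kubectl run nginx --image=nginx --generator=run-pod/v1 --dry-run -o yaml

# Pin `kubectl expose` to a specific version of the service generator
$ kubectl expose rc nginx --port=80 --generator=service/v2 --dry-run
```
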
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->


@ -36,27 +36,37 @@ Documentation for other releases can be found at
## Introduction

Kubemark is a performance testing tool which allows users to run experiments on
simulated clusters. The primary use case is scalability testing, as simulated
clusters can be much bigger than the real ones. The objective is to expose
problems with the master components (API server, controller manager or
scheduler) that appear only on bigger clusters (e.g. small memory leaks).

This document serves as a primer to understand what Kubemark is, what it is not,
and how to use it.
## Architecture

On a very high level, a Kubemark cluster consists of two parts: real master
components and a set of “Hollow” Nodes. The prefix “Hollow” means an
implementation/instantiation of a component with all “moving” parts mocked out.
The best example is HollowKubelet, which pretends to be an ordinary Kubelet, but
does not start anything, nor mount any volumes - it just lies that it does. More
detailed design and implementation details are at the end of this document.

Currently master components run on a dedicated machine(s), and HollowNodes run
on an external Kubernetes cluster. This design has a slight advantage, over
running master components on the external cluster, of completely isolating
master resources from everything else.
## Requirements

To run Kubemark you need a Kubernetes cluster for running all your HollowNodes
and a dedicated machine for a master. The master machine has to be directly
routable from the HollowNodes. You also need access to some Docker repository.

Currently the scripts are written to be easily usable on GCE, but it should be
relatively straightforward to port them to different providers or bare metal.
## Common use cases and helper scripts
@ -66,71 +76,116 @@ Common workflow for Kubemark is:
- monitoring test execution and debugging problems
- turning down Kubemark cluster

Included in the descriptions there will be comments helpful for anyone who'll
want to port Kubemark to different providers.
### Starting a Kubemark cluster

To start a Kubemark cluster on GCE you need to create an external cluster (it
can be GCE, GKE or any other cluster) by yourself, build a Kubernetes release
(e.g. by running `make quick-release`) and run the
`test/kubemark/start-kubemark.sh` script. This script will create a VM for
master components, Pods for HollowNodes and do all the setup necessary to let
them talk to each other. It will use the configuration stored in
`cluster/kubemark/config-default.sh` - you can tweak it however you want, but
note that some features may not be implemented yet, as the implementation of
Hollow components/mocks will probably be lagging behind the real ones. For
performance tests the interesting variables are `NUM_NODES` and `MASTER_SIZE`.
After the start-kubemark script is finished you'll have a ready Kubemark
cluster; a kubeconfig file for talking to the Kubemark cluster is
stored in `test/kubemark/kubeconfig.loc`.
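The basic flow on GCE then looks something like this (assuming the external
cluster is already up and `cluster/kubemark/config-default.sh` has been tweaked
as needed):

```
# Build a release for the Kubemark images to be based on
$ make quick-release

# Create the Kubemark master and the HollowNode pods
$ test/kubemark/start-kubemark.sh

# Talk to the Kubemark cluster using the generated kubeconfig
$ kubectl --kubeconfig=test/kubemark/kubeconfig.loc get nodes
```
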
Currently we're running HollowNodes with a limit of 0.05 of a CPU core and
~60MB of memory, which, taking into account default cluster addons and fluentD
running on an 'external' cluster, allows running ~17.5 HollowNodes per core.
#### Behind-the-scenes details

The start-kubemark script does quite a lot of things:

- Creates a master machine called hollow-cluster-master and a PD for it (*uses
gcloud, should be easy to do outside of GCE*)
- Creates a firewall rule which opens port 443\* on the master machine (*uses
gcloud, should be easy to do outside of GCE*)
- Builds a Docker image for HollowNode from the current repository and pushes it
to the Docker repository (*GCR for us, using scripts from
`cluster/gce/util.sh` - it may get tricky outside of GCE*)
- Generates certificates and kubeconfig files, writes a kubeconfig locally to
`test/kubemark/kubeconfig.loc` and creates a Secret which stores the kubeconfig
for HollowKubelet/HollowProxy use (*uses gcloud to transfer files to the master,
should be easy to do outside of GCE*).
- Creates a ReplicationController for HollowNodes and starts them up. (*will
work exactly the same everywhere as long as MASTER_IP is populated correctly,
but you'll need to update the docker image address if you're not using GCR and
the default image name*)
- Waits until all HollowNodes are in the Running phase (*will work exactly the
same everywhere*)

<sub>\* Port 443 is a secured port on the master machine which is used for all
external communication with the API server. In the last sentence *external*
means all traffic coming from other machines, including all the Nodes, not only
from outside of the cluster. Currently local components, i.e. ControllerManager
and Scheduler, talk to the API server using the insecure port 8080.</sub>
### Running e2e tests on Kubemark cluster

To run standard e2e tests on the Kubemark cluster created in the previous step,
execute the `test/kubemark/run-e2e-tests.sh` script. It will configure ginkgo to
use the Kubemark cluster instead of something else and start an e2e test. This
script should not need any changes to work on other cloud providers.

By default (if nothing is passed to it) the script will run a Density '30 test.
If you want to run a different e2e test you just need to provide the flags you
want to be passed to the `hack/ginkgo-e2e.sh` script, e.g.
`--ginkgo.focus="Load"` to run the Load test.

By default, at the end of each test, it will delete namespaces and everything
under them (e.g. events, replication controllers) on the Kubemark master, which
takes a lot of time. Such work isn't needed in most cases: for example, if you
delete your Kubemark cluster after running `run-e2e-tests.sh`, or if you don't
care about namespace deletion performance (specifically related to etcd). There
is a flag that enables you to avoid namespace deletion:
`--delete-namespace=false`. Adding the flag should let you see in logs: `Found
DeleteNamespace=false, skipping namespace deletion!`
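For example, to run the Load test and skip namespace deletion in one go (the
extra flag is forwarded to `hack/ginkgo-e2e.sh`):

```
$ test/kubemark/run-e2e-tests.sh --ginkgo.focus="Load" --delete-namespace=false
```
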
### Monitoring test execution and debugging problems

Run-e2e-tests prints the same output on Kubemark as on an ordinary e2e cluster,
but if you need to dig deeper you need to learn how to debug HollowNodes and how
the master machine (currently) differs from the ordinary one.

If you need to debug the master machine you can do similar things as you do on
your ordinary master. The difference between the Kubemark setup and an ordinary
setup is that in Kubemark etcd is run as a plain docker container, and all
master components are run as normal processes. There's no Kubelet overseeing
them. Logs are stored in exactly the same place, i.e. the `/var/logs/`
directory. Because the binaries are not supervised by anything they won't be
restarted in the case of a crash.

To help you with debugging from inside the cluster, the startup script puts a
`~/configure-kubectl.sh` script on the master. It downloads `gcloud` and the
`kubectl` tool and configures kubectl to work on the unsecured master port
(useful if there are problems with security). After the script is run you can
use the kubectl command from the master machine to play with the cluster.

Debugging HollowNodes is a bit more tricky, as if you experience a problem on
one of them you need to learn which hollow-node pod corresponds to a given
HollowNode known by the Master. During self-registration HollowNodes provide
their cluster IPs as Names, which means that if you need to find a HollowNode
named `10.2.4.5` you just need to find a Pod in the external cluster with this
cluster IP. There's a helper script
`test/kubemark/get-real-pod-for-hollow-node.sh` that does this for you.

When you have a Pod name you can use `kubectl logs` on the external cluster to
get logs, or use a `kubectl describe pod` call to find the external Node on
which this particular HollowNode is running so you can ssh to it.

E.g. you want to see the logs of the HollowKubelet on which pod `my-pod` is
running. To do so you can execute:
```
$ kubectl --kubeconfig=kubernetes/test/kubemark/kubeconfig.loc describe pod my-pod
@ -142,7 +197,8 @@ Which outputs pod description and among it a line:
Node: 1.2.3.4/1.2.3.4
```

To learn the `hollow-node` pod corresponding to node `1.2.3.4` you use the
aforementioned script:

```
$ kubernetes/test/kubemark/get-real-pod-for-hollow-node.sh 1.2.3.4
@ -164,17 +220,23 @@ All those things should work exactly the same on all cloud providers.
### Turning down Kubemark cluster

On GCE you just need to execute the `test/kubemark/stop-kubemark.sh` script,
which will delete the HollowNode ReplicationController and all the resources for
you. On other providers you'll need to delete all this stuff by yourself.

## Some current implementation details

The Kubemark master uses exactly the same binaries as ordinary Kubernetes does.
This means that it will never be out of date. On the other hand HollowNodes use
an existing fake for the Kubelet (called SimpleKubelet), which mocks its runtime
manager with `pkg/kubelet/fake-docker-manager.go`, where most of the logic sits.
Because there's no easy way of mocking other managers (e.g. VolumeManager), they
are not supported in Kubemark (e.g. we can't schedule Pods with volumes in them
yet).

As time passes more fakes will probably be plugged into HollowNodes, but it's
crucial to make it as simple as possible to allow running a big number of
Hollows on a single core.

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->


@ -31,13 +31,17 @@ Documentation for other releases can be found at
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->

## Logging Conventions

The following are conventions for the glog levels to use.
[glog](http://godoc.org/github.com/golang/glog) is globally preferred to
[log](http://golang.org/pkg/log/) for better runtime control.

* glog.Errorf() - Always an error
* glog.Warningf() - Something unexpected, but probably not an error
* glog.Infof() has multiple levels:
  * glog.V(0) - Generally useful for this to ALWAYS be visible to an operator
    * Programmer errors
@ -56,7 +60,9 @@ The following conventions for the glog levels to use. [glog](http://godoc.org/g
  * glog.V(4) - Debug level verbosity (for now)
    * Logging in particularly thorny parts of code where you may want to come back later and check it

As per the comments, the practical default level is V(2). Developers and QE
environments may wish to run at V(3) or V(4). If you wish to change the log
level, you can pass in `-v=X` where X is the desired maximum level to log.
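A minimal sketch of these conventions in Go (the messages and the `web-1` pod
name are made up):

```go
package main

import (
	"flag"

	"github.com/golang/glog"
)

func main() {
	// glog registers its flags (including -v) on the standard flag set;
	// run the binary with e.g. -v=2 to enable V(2) and below.
	flag.Parse()
	defer glog.Flush()

	glog.Errorf("failed to sync pod %q: connection refused", "web-1") // always an error
	glog.Warningf("watch closed unexpectedly; retrying")              // unexpected, probably not an error
	glog.V(2).Infof("starting controller loop")                       // practical default level
	glog.V(4).Infof("processing update for pod %q", "web-1")          // debug-level verbosity
}
```
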
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->


@ -38,10 +38,14 @@ This documents the process for making release notes for a release.
### 1) Note the PR number of the previous release

Find the most-recent PR that was merged with the previous .0 release. Remember
this as $LASTPR.

- _TODO_: Figure out a way to record this somewhere to save the next
release engineer time.

Find the most-recent PR that was merged with the current .0 release. Remember
this as $CURRENTPR.

### 2) Run the release-notes tool
@ -52,7 +56,7 @@ ${KUBERNETES_ROOT}/build/make-release-notes.sh $LASTPR $CURRENTPR
### 3) Trim the release notes

This generates a list of the entire set of PRs merged since the last minor
release. It is likely long and many PRs aren't worth mentioning. If any of the
PRs were cherrypicked into patches on the last minor release, you should exclude
them from the current release's notes.
@ -67,9 +71,13 @@ With the final markdown all set, cut and paste it to the top of `CHANGELOG.md`
### 5) Update the Release page

* Switch to the [releases](https://github.com/kubernetes/kubernetes/releases)
page.
* Open up the release you are working on.
* Cut and paste the final markdown from above into the release notes
* Press Save.


@ -36,129 +36,207 @@ Documentation for other releases can be found at
## Introduction

We have observed two different cluster management architectures, which can be
categorized as "Borg-style" and "Mesos/Omega-style." In the remainder of this
document, we will abbreviate the latter as "Mesos-style." Although out-of-the-box
Kubernetes uses a Borg-style architecture, it can also be configured in a
Mesos-style architecture, and in fact can support both styles at the same time.
This document describes the two approaches and describes how to deploy a
Mesos-style architecture on Kubernetes.

As an aside, the converse is also true: one can deploy a Borg/Kubernetes-style
architecture on Mesos.

This document is NOT intended to provide a comprehensive comparison of Borg and
Mesos. For example, we omit discussion of the tradeoffs between scheduling with
full knowledge of cluster state vs. scheduling using the "offer" model. That
issue is discussed in some detail in the Omega paper.
(See [references](#references) below.)
## What is a Borg-style architecture?

A Borg-style architecture is characterized by:

* a single logical API endpoint for clients, where some amount of processing is
done on requests, such as admission control and applying defaults
* generic (non-application-specific) collection abstractions described
declaratively
* generic controllers/state machines that manage the lifecycle of the collection
abstractions and the containers spawned from them
* a generic scheduler
For example, Borg's primary collection abstraction is a Job, and every
application that runs on Borg--whether it's a user-facing service like the GMail
front-end, a batch job like a MapReduce, or an infrastructure service like
GFS--must represent itself as a Job. Borg has corresponding state machine logic
for managing Jobs and their instances, and a scheduler that's responsible for
assigning the instances to machines.

The flow of a request in Borg is:

1. Client submits a collection object to the Borgmaster API endpoint

1. Admission control, quota, applying defaults, etc. run on the collection

1. If the collection is admitted, it is persisted, and the collection state
machine creates the underlying instances

1. The scheduler assigns a hostname to the instance, and tells the Borglet to
start the instance's container(s)

1. Borglet starts the container(s)

1. The instance state machine manages the instances and the collection state
machine manages the collection during their lifetimes

Out-of-the-box Kubernetes has *workload-specific* abstractions (ReplicaSet, Job,
DaemonSet, etc.) and corresponding controllers, and in the future may have
[workload-specific schedulers](../../docs/proposals/multiple-schedulers.md),
e.g. different schedulers for long-running services vs. short-running batch. But
these abstractions, controllers, and schedulers are not *application-specific*.
The usual request flow in Kubernetes is very similar, namely:

1. Client submits a collection object (e.g. ReplicaSet, Job, ...) to the API
server (a minimal client sketch follows this list)

1. Admission control, quota, applying defaults, etc. run on the collection

1. If the collection is admitted, it is persisted, and the corresponding
collection controller creates the underlying pods

1. Admission control, quota, applying defaults, etc. run on each pod; if there
are multiple schedulers, one of the admission controllers will write the
scheduler name as an annotation based on a policy

1. If a pod is admitted, it is persisted

1. The appropriate scheduler assigns a nodeName to the pod, which triggers the
Kubelet to start the pod's container(s)

1. Kubelet starts the container(s)

1. The controller corresponding to the collection manages the pod and the
collection during their lifetimes
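For illustration, here is a minimal sketch of step 1 in Go using the client-go
library. The kubeconfig path and the `pi` Job are hypothetical, this document
predates the current client-go packages, and the exact signatures have changed
across releases, so treat it as a sketch rather than the canonical client code:

```go
package main

import (
    "context"
    "fmt"

    batchv1 "k8s.io/api/batch/v1"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Build a client for the single logical API endpoint (the API server).
    config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // hypothetical path
    if err != nil {
        panic(err)
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        panic(err)
    }

    // A collection object: a batch/v1 Job that runs a single pod to completion.
    job := &batchv1.Job{
        ObjectMeta: metav1.ObjectMeta{Name: "pi"}, // hypothetical example Job
        Spec: batchv1.JobSpec{
            Template: corev1.PodTemplateSpec{
                Spec: corev1.PodSpec{
                    RestartPolicy: corev1.RestartPolicyNever,
                    Containers: []corev1.Container{{
                        Name:    "pi",
                        Image:   "perl",
                        Command: []string{"perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"},
                    }},
                },
            },
        },
    }

    // Step 1: submit the collection object; admission, persistence, the Job
    // controller, the scheduler, and the Kubelet take it from here.
    created, err := clientset.BatchV1().Jobs("default").Create(context.TODO(), job, metav1.CreateOptions{})
    if err != nil {
        panic(err)
    }
    fmt.Println("submitted Job", created.Name)
}
```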
In the Borg model, application-level scheduling and cluster-level scheduling are
handled by separate components. For example, a MapReduce master might request
Borg to create a job with a certain number of instances with a particular
resource shape, where each instance corresponds to a MapReduce worker; the
MapReduce master would then schedule individual units of work onto those
workers.
## What is a Mesos-style architecture?

Mesos is fundamentally designed to support multiple application-specific
"frameworks." A framework is composed of a "framework scheduler" and a
"framework executor." We will abbreviate "framework scheduler" as "framework"
since "scheduler" means something very different in Kubernetes (something that
just assigns pods to nodes).

Unlike Borg and Kubernetes, where there is a single logical endpoint that
receives all API requests (the Borgmaster and API server, respectively), in
Mesos every framework is a separate API endpoint. Mesos does not have any
standard set of collection abstractions, controllers/state machines, or
schedulers; the logic for all of these things is contained in each
[application-specific framework](http://mesos.apache.org/documentation/latest/frameworks/)
individually. (Note that the notion of application-specific does sometimes blur
into the realm of workload-specific; for example,
[Chronos](https://github.com/mesos/chronos) is a generic framework for batch
jobs. However, regardless of which set of Mesos frameworks you are using, the
key properties remain: each framework is its own API endpoint with its own
client-facing and internal abstractions, state machines, and scheduler.)

A Mesos framework can integrate application-level scheduling and cluster-level
scheduling into a single component.

Note: Although Mesos frameworks expose their own API endpoints to clients, they
consume a common infrastructure via a common API endpoint for controlling tasks
(launching, detecting failure, etc.) and learning about available cluster
resources. More details are available
[here](http://mesos.apache.org/documentation/latest/scheduler-http-api/).
## Building a Mesos-style framework on Kubernetes

Implementing the Mesos model on Kubernetes boils down to enabling
application-specific collection abstractions, controllers/state machines, and
scheduling. There are just three steps:

* Use API plugins to create API resources for your new application-specific
collection abstraction(s); a minimal type sketch follows this list

* Implement controllers for the new abstractions (and for managing the lifecycle
of the pods the controllers generate)

* Implement a scheduler with the application-specific scheduling logic
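As a concrete (and entirely hypothetical) illustration of the first step, the Go
types for an application-specific `FooSet` collection might look like the sketch
below. Everything named `Foo*` is invented for this example; the resource would
be registered with the API server via the API plugin mechanism and reconciled by
its own controller:

```go
package foo

import (
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// FooSet is a hypothetical application-specific collection abstraction: the
// analogue of the collection types a Mesos framework would define for itself.
type FooSet struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   FooSetSpec   `json:"spec"`
    Status FooSetStatus `json:"status,omitempty"`
}

// FooSetSpec captures the application-specific intent.
type FooSetSpec struct {
    // Workers is how many worker pods the FooController should maintain.
    Workers int32 `json:"workers"`
    // Template is the pod template the FooController stamps out for each worker.
    Template corev1.PodTemplateSpec `json:"template"`
}

// FooSetStatus is written back by the FooController as it reconciles.
type FooSetStatus struct {
    ReadyWorkers int32 `json:"readyWorkers"`
}
```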
Note that the last two can be combined: a Kubernetes controller can do the
scheduling for the pods it creates, by writing node name to the pods when it
creates them.
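A minimal sketch of that combined approach is below; the function name, pod
name, image, and the caller-supplied node choice are all hypothetical:

```go
package main

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// createScheduledWorker shows a controller acting as its own scheduler: it
// sets spec.nodeName when it creates the pod, so no separate scheduler is
// involved and the kubelet on that node starts the containers directly.
func createScheduledWorker(ctx context.Context, client kubernetes.Interface, nodeName string) error {
    pod := &corev1.Pod{
        ObjectMeta: metav1.ObjectMeta{
            Name:   "fooset-worker-0",                  // hypothetical
            Labels: map[string]string{"app": "fooset"}, // hypothetical
        },
        Spec: corev1.PodSpec{
            NodeName: nodeName, // scheduling decision made by the controller itself
            Containers: []corev1.Container{{
                Name:  "worker",
                Image: "example.com/foo-worker:latest", // hypothetical
            }},
        },
    }
    _, err := client.CoreV1().Pods("default").Create(ctx, pod, metav1.CreateOptions{})
    return err
}
```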
Once you've done this, you end up with an architecture that is extremely similar
to the Mesos-style--the Kubernetes controller is effectively a Mesos framework.
The remaining differences are:

* In Kubernetes, all API operations go through a single logical endpoint, the
API server (we say logical because the API server can be replicated). In
contrast, in Mesos, API operations go to a particular framework. However, the
Kubernetes API plugin model makes this difference fairly small.

* In Kubernetes, application-specific admission control, quota, defaulting, etc.
rules can be implemented in the API server rather than in the controller (a
webhook-based sketch follows this list). Of course you can choose to make these
operations no-ops for your application-specific collection abstractions, and
handle them in your controller.

* On the node level, Mesos allows application-specific executors, whereas
Kubernetes only has executors for Docker and rkt containers.
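In modern Kubernetes, the usual way to add such application-specific admission
logic without modifying the API server binary is an admission webhook. The
sketch below assumes the `admission.k8s.io/v1` API and the hypothetical `FooSet`
type from earlier, and it enforces an invented rule (at most 100 workers); it is
one possible approach, not the mechanism this document originally had in mind:

```go
package main

import (
    "encoding/json"
    "fmt"
    "io"
    "net/http"

    admissionv1 "k8s.io/api/admission/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// handleAdmission is a tiny validating-admission handler that enforces an
// application-specific rule on FooSets. In a real cluster it would be wired up
// via a ValidatingWebhookConfiguration pointing at this HTTPS endpoint.
func handleAdmission(w http.ResponseWriter, r *http.Request) {
    body, _ := io.ReadAll(r.Body)

    var review admissionv1.AdmissionReview
    if err := json.Unmarshal(body, &review); err != nil || review.Request == nil {
        http.Error(w, "malformed AdmissionReview", http.StatusBadRequest)
        return
    }

    // Decode only the field we care about from the submitted FooSet.
    var fooSet struct {
        Spec struct {
            Workers int32 `json:"workers"`
        } `json:"spec"`
    }
    _ = json.Unmarshal(review.Request.Object.Raw, &fooSet)

    resp := &admissionv1.AdmissionResponse{UID: review.Request.UID, Allowed: true}
    if fooSet.Spec.Workers > 100 {
        resp.Allowed = false
        resp.Result = &metav1.Status{
            Message: fmt.Sprintf("FooSet requests %d workers; the limit is 100", fooSet.Spec.Workers),
        }
    }

    review.Response = resp
    w.Header().Set("Content-Type", "application/json")
    _ = json.NewEncoder(w).Encode(review)
}

func main() {
    http.HandleFunc("/validate-foosets", handleAdmission)
    // A real webhook must serve TLS; that setup is omitted in this sketch.
    _ = http.ListenAndServe(":8443", nil)
}
```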
The end-to-end flow is:
1. Client submits an application-specific collection object to the API server

2. The API server plugin for that collection object forwards the request to the
API server that handles that collection type

3. Admission control, quota, applying defaults, etc. run on the collection
object

4. If the collection is admitted, it is persisted

5. The collection controller sees the collection object and in response creates
the underlying pods and chooses which nodes they will run on by setting node
name

6. Kubelet sees the pods with node name set and starts the container(s)

7. The collection controller manages the pods and the collection during their
lifetimes

*Note: if the controller and scheduler are separated, then step 5 breaks down
into multiple steps:*

(5a) The collection controller creates pods with empty node name.

(5b) API server admission control, quota, defaulting, etc. run on the pods; one
of the admission controller steps writes the scheduler name as an annotation on
each pod (see pull request `#18262` for more details).

(5c) The corresponding application-specific scheduler chooses a node and writes
node name, which triggers the Kubelet to start the pod's container(s); a minimal
scheduler sketch follows this note.
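Here is a minimal sketch of such an application-specific scheduler. The
annotation key, the `foo-scheduler` name, and the placement policy are all
hypothetical, error handling is elided, and a real scheduler would watch for
pods rather than listing them once:

```go
package main

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

const schedulerAnnotation = "example.com/scheduler-name" // hypothetical key

// pickNode is where the application-specific placement policy would live;
// for this sketch it just takes the first node.
func pickNode(pod corev1.Pod, nodes []corev1.Node) string {
    return nodes[0].Name
}

func main() {
    config, _ := rest.InClusterConfig()
    client, _ := kubernetes.NewForConfig(config)
    ctx := context.TODO()

    nodes, _ := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
    if len(nodes.Items) == 0 {
        return // nothing to schedule onto
    }

    // Pods with an empty spec.nodeName have not been scheduled yet.
    pods, _ := client.CoreV1().Pods("").List(ctx, metav1.ListOptions{
        FieldSelector: "spec.nodeName=",
    })

    for _, pod := range pods.Items {
        if pod.Annotations[schedulerAnnotation] != "foo-scheduler" {
            continue // another scheduler owns this pod
        }
        // Writing node name is done through the pods/binding subresource.
        binding := &corev1.Binding{
            ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: pod.Namespace},
            Target:     corev1.ObjectReference{Kind: "Node", Name: pickNode(pod, nodes.Items)},
        }
        _ = client.CoreV1().Pods(pod.Namespace).Bind(ctx, binding, metav1.CreateOptions{})
    }
}
```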
As a final note, the Kubernetes model allows multiple levels of iterative
refinement of runtime abstractions, as long as the lowest level is the pod. For
example, clients of application Foo might create a `FooSet`, which is picked up
by the FooController, which in turn creates `BatchFooSet` and `ServiceFooSet`
objects; these are picked up by the BatchFoo controller and ServiceFoo
controller respectively, which in turn create pods. In between each of these
steps there is an opportunity for object-specific admission control, quota, and
defaulting to run in the API server, though these can instead be handled by the
controllers.
## References