To support multi-tenancy, containerd allows the collection of metadata and runtime objects within a hierarchical storage primitive known as namespaces. Data cannot be shared across these namespaces unless allowed by the service. This allows multiple sets of containers to be managed without interaction between the clients managing them, meaning that different users, such as SwarmKit, K8s, Docker and others, can use containerd without coordination. Through labels, namespaces become a tool for cleanly organizing the use of containerd containers, including the metadata storage for higher-level features such as ACLs.

Namespaces

Namespaces cross-cut all containerd operations and are communicated via context, either within the Go context or via GRPC headers. As a general rule, no features are tied to a namespace other than organization, and this will be maintained into the future. Namespaces are created as a side effect of operating on them or may be created manually, and they can be labeled for organization. They cannot be deleted unless the namespace is empty, although we may want to make it so one can clean up the entirety of containerd by deleting a namespace.

Most users will interface with namespaces by setting them in the context or via the `CONTAINERD_NAMESPACE` environment variable, but the experience is mostly left to the client. For `ctr` and `dist`, we have defined a "default" namespace that will be created upon use, but there is nothing special about it. As part of this PR we have plumbed this behavior through all commands, cleaning up context management along the way.

Namespaces in Action

Namespaces can be managed with the `ctr namespaces` subcommand. They can be created, labeled and destroyed. A few commands can demonstrate the power of namespaces for use with images. First, let's create a namespace:

```
$ ctr namespaces create foo mylabel=bar
$ ctr namespaces ls
NAME LABELS
foo  mylabel=bar
```

We can see that we have a namespace `foo` and it has a label.
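From Go, the same scoping is expressed through the context. A minimal sketch, assuming only the `namespaces` package from this repository (exact client APIs are still evolving on this branch):

```go
package main

import (
	"context"
	"fmt"

	"github.com/containerd/containerd/namespaces"
)

func main() {
	// Attach the "foo" namespace to a context. Any containerd API call made
	// with this context is scoped to that namespace; over GRPC the value is
	// carried as a header.
	ctx := namespaces.WithNamespace(context.Background(), "foo")

	// The namespace can be read back from the context when servicing a request.
	if ns, ok := namespaces.Namespace(ctx); ok {
		fmt.Println("operating in namespace:", ns)
	}
}
```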
Let's pull an image:

```
$ dist pull docker.io/library/redis:latest
docker.io/library/redis:latest: resolved |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:548a75066f3f280eb017a6ccda34c561ccf4f25459ef8e36d6ea582b6af1decf: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:d45bc46b48e45e8c72c41aedd2a173bcc7f1ea4084a8fcfc5251b1da2a09c0b6:    done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:5b690bc4eaa6434456ceaccf9b3e42229bd2691869ba439e515b28fe1a66c009:    done |++++++++++++++++++++++++++++++++++++++|
config-sha256:a858478874d144f6bfc03ae2d4598e2942fc9994159f2872e39fae88d45bd847:   done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4cdd94354d2a873333a205a02dbb853dd763c73600e0cf64f60b4bd7ab694875:    done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:10a267c67f423630f3afe5e04bbbc93d578861ddcc54283526222f3ad5e895b9:    done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:c54584150374aa94b9f7c3fbd743adcff5adead7a3cf7207b0e51551ac4a5517:    done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:d1f9221193a65eaf1b0afc4f1d4fbb7f0f209369d2696e1c07671668e150ed2b:    done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:71c1f30d820f0457df186531dc4478967d075ba449bd3168a3e82137a47daf03:    done |++++++++++++++++++++++++++++++++++++++|
elapsed: 0.9 s total: 0.0 B (0.0 B/s)
INFO[0000] unpacking rootfs
INFO[0000] Unpacked chain id: sha256:41719840acf0f89e761f4a97c6074b6e2c6c25e3830fcb39301496b5d36f9b51
```

Now, let's list the image:

```
$ dist images ls
REF                            TYPE                                                  DIGEST                                                                  SIZE
docker.io/library/redis:latest application/vnd.docker.distribution.manifest.v2+json sha256:548a75066f3f280eb017a6ccda34c561ccf4f25459ef8e36d6ea582b6af1decf 72.7 MiB
```

That looks normal. Let's list the images for the `foo` namespace and see this in action:

```
$ CONTAINERD_NAMESPACE=foo dist images ls
REF TYPE DIGEST SIZE
```

Look at that! Nothing was pulled in the namespace `foo`. Let's do the same pull:

```
$ CONTAINERD_NAMESPACE=foo dist pull docker.io/library/redis:latest
docker.io/library/redis:latest: resolved |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:548a75066f3f280eb017a6ccda34c561ccf4f25459ef8e36d6ea582b6af1decf: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:d45bc46b48e45e8c72c41aedd2a173bcc7f1ea4084a8fcfc5251b1da2a09c0b6:    done |++++++++++++++++++++++++++++++++++++++|
config-sha256:a858478874d144f6bfc03ae2d4598e2942fc9994159f2872e39fae88d45bd847:   done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4cdd94354d2a873333a205a02dbb853dd763c73600e0cf64f60b4bd7ab694875:    done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:c54584150374aa94b9f7c3fbd743adcff5adead7a3cf7207b0e51551ac4a5517:    done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:71c1f30d820f0457df186531dc4478967d075ba449bd3168a3e82137a47daf03:    done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:d1f9221193a65eaf1b0afc4f1d4fbb7f0f209369d2696e1c07671668e150ed2b:    done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:10a267c67f423630f3afe5e04bbbc93d578861ddcc54283526222f3ad5e895b9:    done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:5b690bc4eaa6434456ceaccf9b3e42229bd2691869ba439e515b28fe1a66c009:    done |++++++++++++++++++++++++++++++++++++++|
elapsed: 0.8 s total: 0.0 B (0.0 B/s)
INFO[0000] unpacking rootfs
INFO[0000] Unpacked chain id: sha256:41719840acf0f89e761f4a97c6074b6e2c6c25e3830fcb39301496b5d36f9b51
```

Wow, that was very snappy! Looks like we pulled that image into our namespace but didn't have to download any new data because we are sharing storage.
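The same pull can be driven from Go. A rough sketch, assuming the in-progress client package in this repository (option and method signatures may differ on this branch):

```go
package main

import (
	"context"
	"log"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/namespaces"
)

func main() {
	// Connect to the containerd daemon over its unix socket.
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Scope the operation to the "foo" namespace; content blobs are still
	// shared across namespaces, so only the image metadata is new.
	ctx := namespaces.WithNamespace(context.Background(), "foo")

	image, err := client.Pull(ctx, "docker.io/library/redis:latest")
	if err != nil {
		log.Fatal(err)
	}
	log.Println("pulled", image.Name())
}
```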
Let's take a peek at the images we have in `foo`:

```
$ CONTAINERD_NAMESPACE=foo dist images ls
REF                            TYPE                                                  DIGEST                                                                  SIZE
docker.io/library/redis:latest application/vnd.docker.distribution.manifest.v2+json sha256:548a75066f3f280eb017a6ccda34c561ccf4f25459ef8e36d6ea582b6af1decf 72.7 MiB
```

Now, let's remove that image from `foo`:

```
$ CONTAINERD_NAMESPACE=foo dist images rm docker.io/library/redis:latest
```

Looks like it is gone:

```
$ CONTAINERD_NAMESPACE=foo dist images ls
REF TYPE DIGEST SIZE
```

But, as we can see, it is still present in the `default` namespace:

```
$ dist images ls
REF                            TYPE                                                  DIGEST                                                                  SIZE
docker.io/library/redis:latest application/vnd.docker.distribution.manifest.v2+json sha256:548a75066f3f280eb017a6ccda34c561ccf4f25459ef8e36d6ea582b6af1decf 72.7 MiB
```

What happened here? We can tell by listing the namespaces to get a better understanding:

```
$ ctr namespaces ls
NAME    LABELS
default
foo     mylabel=bar
```

From the above, we can see that the `default` namespace was created by the standard commands run without the environment variable set, isolating each namespace's set of images while sharing the underlying data that matters. Since we removed the image from namespace `foo`, we can remove the namespace now:

```
$ ctr namespaces rm foo
foo
```

However, when we try to remove the `default` namespace, we get an error:

```
$ ctr namespaces rm default
ctr: unable to delete default: rpc error: code = FailedPrecondition desc = namespace default must be empty
```

This is because we require that namespaces be empty when removed.

Caveats

- While most metadata objects are namespaced, containers and tasks may exhibit some issues. We still need to move runtimes to namespaces, and the container metadata storage may not be fully worked out.
- We still need to migrate the content store to metadata storage and namespace the content store, such that some data storage (i.e. images) is scoped per namespace.
- The specifics of the snapshot driver's relation to namespaces need to be worked out in detail.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
containerd is an industry-standard container runtime with an emphasis on simplicity, robustness and portability. It is available as a daemon for Linux and Windows, which can manage the complete container lifecycle of its host system: image transfer and storage, container execution and supervision, low-level storage and network attachments, etc.
containerd is designed to be embedded into a larger system, rather than being used directly by developers or end-users.
State of the Project
containerd currently has two active branches.
There is a v0.2.x branch for the current release of containerd, which is being consumed by Docker and others, and the master branch is the development branch for the 1.0 roadmap and feature set.
Any PR or issue that is intended for the current v0.2.x release should be tagged with the same v0.2.x tag.
Communication
For async communication and long running discussions please use issues and pull requests on the github repo. This will be the best place to discuss design and implementation.
For sync communication we have a community slack with a #containerd channel that everyone is welcome to join and chat about development.
Slack: https://dockr.ly/community
Developer Quick-Start
To build the daemon and the `ctr` simple test client, the following build system dependencies are required:
- Go 1.8.x or above (requires 1.8 due to use of golang plugin(s))
- Protoc 3.x compiler and headers (download at the Google protobuf releases page)
- Btrfs headers and libraries for your distribution. Note that building the btrfs driver can be disabled via a build tag, removing this dependency.
For proper results, install the `protoc` release into `/usr/local` on your build system. For example, the following commands will download and install the 3.1.0 release for a 64-bit Linux host:
```
$ wget -c https://github.com/google/protobuf/releases/download/v3.1.0/protoc-3.1.0-linux-x86_64.zip
$ sudo unzip protoc-3.1.0-linux-x86_64.zip -d /usr/local
```
With the required dependencies installed, the `binaries` Makefile target will compile the `ctr` and `containerd` binaries and place them in the `bin/` directory. Using `sudo make install` will place the binaries in `/usr/local/bin`. When making any changes to the gRPC API, `make generate` will use the installed `protoc` compiler to regenerate the API generated code packages.
Note: A build tag is currently available to disable building the btrfs snapshot driver. Adding `BUILDTAGS=no_btrfs` to your environment before calling the `binaries` Makefile target will disable the btrfs driver within the containerd Go build.
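For illustration, such build tags work as file-level constraints in Go; a hedged sketch of how a btrfs-only source file might be gated (the package and contents here are hypothetical, not copied from the repository):

```go
// +build !no_btrfs

package btrfs

// This file is compiled only when the no_btrfs build tag is NOT set
// (e.g. a plain `make binaries`). Building with BUILDTAGS=no_btrfs
// excludes it, and with it the btrfs snapshot driver.

// driverName is a hypothetical placeholder for btrfs-specific code.
const driverName = "btrfs"
```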
Vendoring of external imports uses the `vndr` tool, which uses a simple config file, `vendor.conf`, to provide the URL and version or hash details for each vendored import. After modifying `vendor.conf`, run the `vndr` tool to update the `vendor/` directory contents. The `vendor.conf` update and the resulting changeset in `vendor/` should be combined into a single commit in a PR which relies on vendored updates.
Please refer to RUNC.md for the currently supported version of `runc` that is used by containerd.
Features
- OCI Image Spec support
- OCI Runtime Spec support
- Image push and pull support
- Container runtime and lifecycle support
- Management of network namespaces, allowing containers to join existing namespaces
- Multi-tenancy support, with CAS storage for global images
Scope and Principles
Having a clearly defined scope for a project is important for ensuring consistency and focus. The following criteria will be used when reviewing pull requests, features, and changes for the project before being accepted.
Components
Components should not have tight dependencies on each other so that they are able to be used independently. The APIs for images and containers should be designed in a way that, when used together, the components have a natural flow but remain useful independently.
An example of this design can be seen in the overlay filesystems and the container execution layer. The execution layer and overlay filesystems can be used independently, but if you were to use both, they share a common `Mount` struct that the filesystems produce and the execution layer consumes.
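As a rough illustration of that shared contract, here is a simplified sketch of such a `Mount` description (the real type lives in this repository's `mount` package and the paths below are purely illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// Mount captures the shared contract: a snapshotter/filesystem produces
// these descriptions and the execution layer consumes them to set up a
// container's root filesystem.
type Mount struct {
	Type    string   // e.g. "overlay", "bind"
	Source  string   // device or directory backing the mount
	Options []string // filesystem-specific mount options
}

func main() {
	// A filesystem component produces mounts...
	mounts := []Mount{{
		Type:   "overlay",
		Source: "overlay",
		Options: []string{
			"lowerdir=/var/lib/containerd/snapshots/1",
			"upperdir=/var/lib/containerd/snapshots/2/fs",
			"workdir=/var/lib/containerd/snapshots/2/work",
		},
	}}

	// ...and the execution layer consumes them when starting the container.
	for _, m := range mounts {
		fmt.Printf("mount -t %s %s -o %s\n", m.Type, m.Source, strings.Join(m.Options, ","))
	}
}
```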
Primitives
containerd should expose primitives to solve problems instead of building high level abstractions in the API. A common example of this is how build would be implemented. Instead of having a build API in containerd, we should expose the lower level primitives that allow the things required by build to work. Breaking up the filesystem APIs to allow snapshots, copy functionality, and mounts gives people implementing build at higher levels more flexibility.
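As a sketch of what "primitives instead of a build API" means in practice, a higher-level build tool could compose a snapshot primitive with the mount description shown earlier. The interface below is a simplified assumption for illustration, not the exact repository API:

```go
package build

import "context"

// Mount is the shared mount description (see the earlier sketch).
type Mount struct {
	Type    string
	Source  string
	Options []string
}

// Snapshotter is a simplified view of a snapshot primitive: prepare a
// writable layer on top of a parent, then commit it under a new name.
type Snapshotter interface {
	Prepare(ctx context.Context, key, parent string) ([]Mount, error)
	Commit(ctx context.Context, name, key string) error
}

// applyLayer shows how a build tool outside containerd could add one layer:
// prepare a snapshot, hand its mounts to a build step, then commit.
func applyLayer(ctx context.Context, sn Snapshotter, parent, name string, step func(mounts []Mount) error) error {
	key := name + "-in-progress"
	mounts, err := sn.Prepare(ctx, key, parent)
	if err != nil {
		return err
	}
	if err := step(mounts); err != nil {
		return err
	}
	return sn.Commit(ctx, name, key)
}
```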
Extensibility and Defaults
For the various components in containerd there should be defined extension points where implementations can be swapped for alternatives.
The best example of this is that containerd will use `runc` from OCI as the default runtime in the execution layer, but other runtimes conforming to the OCI Runtime specification can be easily added to containerd.
containerd will come with a default implementation for the various components. These defaults will be chosen by the maintainers of the project and should not change unless better tech for that component comes out. Additional implementations will not be accepted into the core repository and should be developed in a separate repository not maintained by the containerd maintainers.
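A hedged sketch of what such an extension point can look like, using a hypothetical runtime registry (containerd's actual plugin mechanism differs in its details):

```go
package runtime

import "context"

// Runtime is a hypothetical extension point: anything that can create and
// delete containers according to the OCI Runtime specification.
type Runtime interface {
	Create(ctx context.Context, id, bundle string) error
	Delete(ctx context.Context, id string) error
}

// registry maps runtime names to constructors so that alternatives to the
// default (runc) can be swapped in.
var registry = map[string]func() Runtime{}

// Register adds a runtime implementation under a name, e.g. "runc".
func Register(name string, fn func() Runtime) {
	registry[name] = fn
}

// New returns the named runtime, or nil if nothing was registered under it.
func New(name string) Runtime {
	if fn, ok := registry[name]; ok {
		return fn()
	}
	return nil
}
```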
Releases
containerd will be released as 1.0 when feature complete, and this version will be supported for 1 year with security and bug fixes applied and released.
The upgrade path for containerd is that 0.0.x patch releases are always backwards compatible with their major and minor version. Minor (0.x.0) releases will always be compatible with the previous minor release, i.e. 1.2.0 is backwards compatible with 1.1.0 and 1.1.0 is compatible with 1.0.0. There are no compatibility guarantees for upgrades that span two minor releases, i.e. 1.0.0 to 1.2.0.
There are no backwards compatibility guarantees with upgrades to major versions, i.e. 1.0.0 to 2.0.0. Each major version will be supported for 1 year with bug fixes and security patches.
Scope
The following table specifies the various components of containerd and general features of container runtimes. The table specifies whether or not the feature/component is in or out of scope.
Name | Description | In/Out | Reason |
---|---|---|---|
execution | Provide an extensible execution layer for executing a container | in | Create, start, stop, pause, resume, exec, signal, delete |
cow filesystem | Built in functionality for overlay, aufs, and other copy on write filesystems for containers | in | |
distribution | Having the ability to push and pull images as well as operations on images as a first class API object | in | containerd will fully support the management and retrieval of images |
metrics | container-level metrics, cgroup stats, and OOM events | in | |
networking | creation and management of network interfaces | out | Networking will be handled and provided to containerd via higher level systems. |
build | Building images as a first class API | out | Build is a higher level tooling feature and can be implemented in many different ways on top of containerd |
volumes | Volume management for external data | out | The API supports mounts, binds, etc., so all volume type systems can be built on top of containerd. |
logging | Persisting container logs | out | Logging can be built on top of containerd because the container’s STDIO will be provided to the clients and they can persist it any way they see fit. There is no io copying of container STDIO in containerd. |
containerd is scoped to a single host and makes assumptions based on that fact. It can be used to build things like a node agent that launches containers but does not have any concepts of a distributed system.
containerd is designed to be embedded into a larger system, hence it only includes a barebones CLI (`ctr`) specifically for development and debugging purposes, with no mandate to be human-friendly and no guarantee of interface stability over time.
Also things like service discovery are out of scope even though networking is in scope. containerd should provide the primitives to create, add, remove, or manage network interfaces and network namespaces for a container but IP allocation, discovery, and DNS should be handled at higher layers.
How is the scope changed?
The scope of this project is a whitelist.
If it's not mentioned as being in scope, it is out of scope.
For the scope of this project to change it requires a 100% vote from all maintainers of the project.
Development reports
Weekly summary on the progress and what is being worked on. https://github.com/containerd/containerd/tree/master/reports
Copyright and license
Copyright © 2016 Docker, Inc. All rights reserved, except as follows. Code is released under the Apache 2.0 license. The README.md file, and files in the "docs" folder are licensed under the Creative Commons Attribution 4.0 International License under the terms and conditions set forth in the file "LICENSE.docs". You may obtain a duplicate copy of the same license, titled CC-BY-SA-4.0, at http://creativecommons.org/licenses/by/4.0/.