Background:

With the current design, the content backend uses a key-lock for long-lived write transactions. If a content reference has been marked for a write transaction, other requests on the same reference fail fast with an unavailable error. Since the metadata plugin is based on boltdb, which only supports a single writer, the content backend can't block or hold a request for too long; clients are expected to retry on their own, for example via the backoff retry helper around OpenWriter. However, the maximum retry interval can be up to 2 seconds. If there are several concurrent requests for the same image, the waiters may wake up at the same time but only one of them can continue. Many waiters go back to sleep, so it takes a long time to finish all the pull jobs, and it gets worse when the image has many layers, as mentioned in issue #4937.

After fetching, the containerd.Pull API allows several handlers to commit the same ChainID snapshot, but only one of them can succeed. Since unpacking a tar.gz layer is a time-consuming job, unpacking the same ChainID snapshot in parallel hurts performance. For instance, Request 2 below doesn't need to prepare and commit at all; it should just wait for Request 1 to finish, as mentioned in pull request #6318.

```text
Request 1           Request 2

Prepare
   |
   |
   |
   |                Prepare
Commit                 |
                       |
                       |
                       |
                    Commit (failed on exist)
```

Both the content backoff retry and the unnecessary unpack hurt performance.

Solution:

Introduce duplicate suppression in the fetch and unpack contexts. The duplicate suppression uses a key-mutex and single-waiter notification to provide singleflight behavior. Callers can use the duplicate suppression across the different PullImage handlers so that both the unnecessary unpack and the spin/backoff retry in OpenWriter are avoided.

Test Result:

Before the enhancement:

```bash
➜  /tmp sudo bash testing.sh "localhost:5000/redis:latest" 20
crictl pull localhost:5000/redis:latest (x20) takes ...

real    1m6.172s
user    0m0.268s
sys     0m0.193s

docker pull localhost:5000/redis:latest (x20) takes ...

real    0m1.324s
user    0m0.441s
sys     0m0.316s

➜  /tmp sudo bash testing.sh "localhost:5000/golang:latest" 20
crictl pull localhost:5000/golang:latest (x20) takes ...

real    1m47.657s
user    0m0.284s
sys     0m0.224s

docker pull localhost:5000/golang:latest (x20) takes ...

real    0m6.381s
user    0m0.488s
sys     0m0.358s
```

With this enhancement:

```bash
➜  /tmp sudo bash testing.sh "localhost:5000/redis:latest" 20
crictl pull localhost:5000/redis:latest (x20) takes ...

real    0m1.140s
user    0m0.243s
sys     0m0.178s

docker pull localhost:5000/redis:latest (x20) takes ...

real    0m1.239s
user    0m0.463s
sys     0m0.275s

➜  /tmp sudo bash testing.sh "localhost:5000/golang:latest" 20
crictl pull localhost:5000/golang:latest (x20) takes ...

real    0m5.546s
user    0m0.217s
sys     0m0.219s

docker pull localhost:5000/golang:latest (x20) takes ...

real    0m6.090s
user    0m0.501s
sys     0m0.331s
```

Test Script:

localhost:5000/{redis|golang}:latest is equivalent to docker.io/library/{redis|golang}:latest. The image is hosted in a local registry service started with `docker run -d -p 5000:5000 --name registry registry:2`.
```bash
image_name="${1}"
pull_times="${2:-10}"

cleanup() {
  ctr image rmi "${image_name}"
  ctr -n k8s.io image rmi "${image_name}"
  crictl rmi "${image_name}"
  docker rmi "${image_name}"
  sleep 2
}

crictl_testing() {
  for idx in $(seq 1 ${pull_times}); do
    crictl pull "${image_name}" > /dev/null 2>&1 &
  done
  wait
}

docker_testing() {
  for idx in $(seq 1 ${pull_times}); do
    docker pull "${image_name}" > /dev/null 2>&1 &
  done
  wait
}

cleanup > /dev/null 2>&1

echo 3 > /proc/sys/vm/drop_caches
sleep 3
echo "crictl pull $image_name (x${pull_times}) takes ..."
time crictl_testing
echo

echo 3 > /proc/sys/vm/drop_caches
sleep 3
echo "docker pull $image_name (x${pull_times}) takes ..."
time docker_testing
```

Fixes: #4937
Close: #4985
Close: #6318

Signed-off-by: Wei Fu <fuweid89@gmail.com>
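For illustration only, the sketch below shows the general shape of such a keyed lock. It is a hypothetical `KeyedLocker`, not containerd's actual code, and this simplified version wakes all waiters on release, whereas the change described above notifies a single waiter at a time.

```go
// Minimal, illustrative sketch of a keyed lock for duplicate suppression.
// All names here (KeyedLocker, New, Lock, Unlock) are invented for this example.
package kmutexsketch

import (
	"context"
	"sync"
)

// KeyedLocker serializes work per key: the first caller for a key proceeds,
// later callers for the same key block until it is released instead of
// failing fast with an "unavailable" error and retrying with backoff.
type KeyedLocker struct {
	mu    sync.Mutex
	locks map[string]chan struct{}
}

func New() *KeyedLocker {
	return &KeyedLocker{locks: map[string]chan struct{}{}}
}

// Lock acquires the lock for key, waiting for the current holder if necessary.
func (l *KeyedLocker) Lock(ctx context.Context, key string) error {
	for {
		l.mu.Lock()
		ch, held := l.locks[key]
		if !held {
			l.locks[key] = make(chan struct{})
			l.mu.Unlock()
			return nil
		}
		l.mu.Unlock()

		select {
		case <-ch: // holder released the key; try to acquire it again
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// Unlock releases the key and wakes the waiters so one of them can continue.
func (l *KeyedLocker) Unlock(key string) {
	l.mu.Lock()
	ch := l.locks[key]
	delete(l.locks, key)
	l.mu.Unlock()
	if ch != nil {
		close(ch)
	}
}
```

A pull path could, for example, take such a lock keyed by layer digest before opening a content writer and keyed by ChainID before unpacking, so concurrent pulls of the same image wait for the first worker instead of spinning in backoff retries or racing to commit the same snapshot.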
containerd is an industry-standard container runtime with an emphasis on simplicity, robustness and portability. It is available as a daemon for Linux and Windows, which can manage the complete container lifecycle of its host system: image transfer and storage, container execution and supervision, low-level storage and network attachments, etc.
containerd is a member of CNCF with 'graduated' status.
containerd is designed to be embedded into a larger system, rather than being used directly by developers or end-users.
Now Recruiting
We are a large, inclusive OSS project that is welcoming help of any kind, shape, or form:
- Documentation help is needed to make the product easier to consume and extend.
- We need OSS community outreach / organizing help to get the word out; manage and create messaging and educational content; and to help with social media, community forums/groups, and google groups.
- We are actively inviting new security advisors to join the team.
- New sub-projects are being created, core and non-core that could use additional development help.
- Each of the containerd projects has a list of issues currently being worked on or that need help resolving.
- If the issue has not already been assigned to someone, or has not made recent progress and you are interested, please inquire.
- If you are interested in starting with a smaller / beginner level issue, look for issues with an `exp/beginner` tag, for example containerd/containerd beginner issues.
Getting Started
See our documentation on containerd.io:
See how to build containerd from source at BUILDING.
If you are interested in trying out containerd see our example at Getting Started.
Nightly builds
There are nightly builds available for download here.
Binaries are generated from the main branch every night for Linux and Windows.
Please be aware: nightly builds might have critical bugs; they are not recommended for use in production, and no support is provided.
Runtime Requirements
Runtime requirements for containerd are very minimal. Most interactions with the Linux and Windows container feature sets are handled via runc and/or OS-specific libraries (e.g. hcsshim for Microsoft).
The current required version of runc is described in RUNC.md.
There are specific features used by containerd core code and snapshotters that will require a minimum kernel version on Linux. With the understood caveat of distro kernel versioning, a reasonable starting point for Linux is a minimum 4.x kernel version.
The overlay filesystem snapshotter, used by default, uses features that were finalized in the 4.x kernel series. If you choose to use btrfs, there may be more flexibility in kernel version (minimum recommended is 3.18), but will require the btrfs kernel module and btrfs tools to be installed on your Linux distribution.
To use Linux checkpoint and restore features, you will need criu installed on your system. See more details in Checkpoint and Restore.
Build requirements for developers are listed in BUILDING.
Supported Registries
Any registry which is compliant with the OCI Distribution Specification is supported by containerd.
For configuring registries, see the registry host configuration documentation.
Features
Client
containerd offers a full client package to help you integrate containerd into your platform.
```go
import (
	"context"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/cio"
	"github.com/containerd/containerd/namespaces"
)

func main() {
	client, err := containerd.New("/run/containerd/containerd.sock")
	defer client.Close()
}
```
Namespaces
Namespaces allow multiple consumers to use the same containerd without conflicting with each other. It has the benefit of sharing content but still having separation with containers and images.
To set a namespace for requests to the API:
```go
context = context.Background()
// create a context for docker
docker = namespaces.WithNamespace(context, "docker")

containerd, err := client.NewContainer(docker, "id")
```
To set a default namespace on the client:
```go
client, err := containerd.New(address, containerd.WithDefaultNamespace("docker"))
```
Distribution
```go
// pull an image
image, err := client.Pull(context, "docker.io/library/redis:latest")

// push an image
err := client.Push(context, "docker.io/library/redis:latest", image.Target())
```
Containers
In containerd, a container is a metadata object. Resources such as an OCI runtime specification, image, root filesystem, and other metadata can be attached to a container.
```go
redis, err := client.NewContainer(context, "redis-master")
defer redis.Delete(context)
```
OCI Runtime Specification
containerd fully supports the OCI runtime specification for running containers. We have built in functions to help you generate runtime specifications based on images as well as custom parameters.
You can specify options when creating a container about how to modify the specification.
redis, err := client.NewContainer(context, "redis-master", containerd.WithNewSpec(oci.WithImageConfig(image)))
Root Filesystems
containerd allows you to use overlay or snapshot filesystems with your containers. It comes with built in support for overlayfs and btrfs.
```go
// pull an image and unpack it into the configured snapshotter
image, err := client.Pull(context, "docker.io/library/redis:latest", containerd.WithPullUnpack)

// allocate a new RW root filesystem for a container based on the image
redis, err := client.NewContainer(context, "redis-master",
	containerd.WithNewSnapshot("redis-rootfs", image),
	containerd.WithNewSpec(oci.WithImageConfig(image)),
)

// use a readonly filesystem with multiple containers
for i := 0; i < 10; i++ {
	id := fmt.Sprintf("id-%d", i)
	container, err := client.NewContainer(ctx, id,
		containerd.WithNewSnapshotView(id, image),
		containerd.WithNewSpec(oci.WithImageConfig(image)),
	)
}
```
Tasks
Taking a container object and turning it into a runnable process on a system is done by creating a new Task from the container. A task represents the runnable object within containerd.
```go
// create a new task
task, err := redis.NewTask(context, cio.NewCreator(cio.WithStdio))
defer task.Delete(context)

// the task is now running and has a pid that can be used to setup networking
// or other runtime settings outside of containerd
pid := task.Pid()

// start the redis-server process inside the container
err := task.Start(context)

// wait for the task to exit and get the exit status
status, err := task.Wait(context)
```
Checkpoint and Restore
If you have criu installed on your machine you can checkpoint and restore containers and their tasks. This allows you to clone and/or live migrate containers to other machines.
```go
// checkpoint the task then push it to a registry
checkpoint, err := task.Checkpoint(context)

err := client.Push(context, "myregistry/checkpoints/redis:master", checkpoint)

// on a new machine pull the checkpoint and restore the redis container
checkpoint, err := client.Pull(context, "myregistry/checkpoints/redis:master")

redis, err = client.NewContainer(context, "redis-master", containerd.WithNewSnapshot("redis-rootfs", checkpoint))
defer redis.Delete(context)

task, err = redis.NewTask(context, cio.NewCreator(cio.WithStdio), containerd.WithTaskCheckpoint(checkpoint))
defer task.Delete(context)

err := task.Start(context)
```
Snapshot Plugins
In addition to the built-in Snapshot plugins in containerd, additional external plugins can be configured using GRPC. An external plugin is made available using the configured name and appears as a plugin alongside the built-in ones.
To add an external snapshot plugin, add the plugin to containerd's config file (by default at /etc/containerd/config.toml). The string following `proxy_plugins.` will be used as the name of the snapshotter, and the address should refer to a socket with a GRPC listener serving containerd's Snapshot GRPC API. Remember to restart containerd for any configuration changes to take effect.
```toml
[proxy_plugins]
  [proxy_plugins.customsnapshot]
    type = "snapshot"
    address = "/var/run/mysnapshotter.sock"
```
See PLUGINS.md for how to create plugins.
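For a rough idea of what such an external plugin looks like, here is a hedged sketch of a gRPC snapshotter server. It follows the pattern documented in PLUGINS.md, but treat the exact import paths and helper signatures (e.g. `snapshotservice.FromSnapshotter`) as assumptions to verify against your containerd version:

```go
package main

import (
	"log"
	"net"
	"os"

	snapshotsapi "github.com/containerd/containerd/api/services/snapshots/v1"
	"github.com/containerd/containerd/contrib/snapshotservice"
	"github.com/containerd/containerd/snapshots/native"
	"google.golang.org/grpc"
)

func main() {
	// Usage: snapshotter <unix-socket-address> <root-dir>
	// The socket address is what goes into the proxy_plugins `address` field.
	if len(os.Args) < 3 {
		log.Fatalf("usage: %s <unix addr> <root>", os.Args[0])
	}

	// Any snapshots.Snapshotter implementation works here; the built-in
	// native snapshotter is used only to keep the example short.
	sn, err := native.NewSnapshotter(os.Args[2])
	if err != nil {
		log.Fatal(err)
	}

	// Expose the snapshotter over containerd's Snapshots gRPC API.
	rpc := grpc.NewServer()
	snapshotsapi.RegisterSnapshotsServer(rpc, snapshotservice.FromSnapshotter(sn))

	l, err := net.Listen("unix", os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	if err := rpc.Serve(l); err != nil {
		log.Fatal(err)
	}
}
```

Run it with the socket path from the configuration above (e.g. /var/run/mysnapshotter.sock) and a root directory, and clients can then select the snapshotter by its configured name, customsnapshot in this example.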
Releases and API Stability
Please see RELEASES.md for details on versioning and stability of containerd components.
Downloadable 64-bit Intel/AMD binaries of all official releases are available on our releases page.
For other architectures and distribution support, you will find that many Linux distributions package their own containerd and provide it across several architectures, such as Canonical's Ubuntu packaging.
Enabling command auto-completion
Starting with containerd 1.4, the urfave client feature for auto-creation of bash and zsh autocompletion data is enabled. To use the autocomplete feature in a bash shell for example, source the autocomplete/ctr file in your .bashrc, or manually like:
$ source ./contrib/autocomplete/ctr
Distribution of ctr autocomplete for bash and zsh
For bash, copy the contrib/autocomplete/ctr script into /etc/bash_completion.d/ and rename it to ctr. The zsh_autocomplete file is also available and can be used similarly for zsh users.
Provide documentation to users to source this file into their shell if you don't place the autocomplete file in a location where it is automatically loaded for the user's shell environment.
CRI
cri is a containerd plugin implementation of the Kubernetes container runtime interface (CRI). With it, you are able to use containerd as the container runtime for a Kubernetes cluster.
CRI Status
cri is a native plugin of containerd. Since containerd 1.1, the cri plugin is built into the release binaries and enabled by default.
Note: As of containerd 1.5, the cri plugin is merged into the containerd/containerd repo. For example, the source code previously stored under containerd/cri/pkg was moved to the containerd/containerd/pkg/cri package.
The cri plugin has reached GA status, representing that it is:
- Feature complete
- Works with Kubernetes 1.10 and above
- Passes all CRI validation tests.
- Passes all node e2e tests.
- Passes all e2e tests.
See results on the containerd k8s test dashboard
Validating Your cri Setup
A Kubernetes incubator project, cri-tools, includes programs for exercising CRI implementations. More importantly, cri-tools includes the program critest which is used for running CRI Validation Testing.
CRI Guides
- Installing with Ansible and Kubeadm
- For Non-Ansible Users, Performing a Custom Installation Using the Release Tarball and Kubeadm
- CRI Plugin Testing Guide
- Debugging Pods, Containers, and Images with crictl
- Configuring cri Plugins
- Configuring containerd
Communication
For async communication and long running discussions please use issues and pull requests on the github repo. This will be the best place to discuss design and implementation.
For sync communication, catch us in the #containerd and #containerd-dev slack channels on Cloud Native Computing Foundation's (CNCF) slack - cloud-native.slack.com. Everyone is welcome to join and chat. Get Invite to CNCF slack.
Security audit
A third party security audit was performed by Cure53 in 4Q2018; the full report is available in our docs/ directory.
Reporting security issues
If you are reporting a security issue, please reach out discreetly at security@containerd.io.
Licenses
The containerd codebase is released under the Apache 2.0 license. The README.md file, and files in the "docs" folder are licensed under the Creative Commons Attribution 4.0 International License. You may obtain a copy of the license, titled CC-BY-4.0, at http://creativecommons.org/licenses/by/4.0/.
Project details
containerd is the primary open source project within the broader containerd GitHub organization.
However, all projects within the repo have common maintainership, governance, and contributing guidelines, which are stored in a project repository commonly for all containerd projects.
Please find all these core project documents, including the governance, maintainers, and contributing guidelines, in our containerd/project repository.
Adoption
Interested to see who is using containerd? Are you using containerd in a project? Please add yourself via pull request to our ADOPTERS.md file.