Background:

With the current design, the content backend uses a key-lock for long-lived write transactions. If a content reference has been marked for a write transaction, other requests on the same reference fail fast with an unavailable error. Since the metadata plugin is based on boltdb, which only supports a single writer, the content backend can't block or hold a request for too long; clients are expected to retry on their own, for example via the backoff retry helper around OpenWriter. However, the maximum retry interval can be up to 2 seconds. If there are several concurrent requests for the same image, the waiters may wake up at the same time but only one of them can continue. Many waiters go back to sleep, so it takes a long time to finish all the pull jobs, and it gets worse when the image has many layers, as mentioned in issue #4937.

After fetching, the containerd.Pull API allows several handlers to commit the same ChainID snapshot, but only one of them can succeed. Since unpacking a tar.gz layer is a time-consuming job, unpacking the same ChainID snapshot in parallel hurts performance. For instance, Request 2 below doesn't need to prepare and commit at all; it should just wait for Request 1 to finish, as mentioned in pull request #6318.

```text
Request 1           Request 2

Prepare
   |
   |
   |
   |                Prepare
Commit                 |
                       |
                       |
                       |
                    Commit (failed on exist)
```

Both the content backoff retry and the unnecessary unpack hurt performance.

Solution:

Introduce duplicate suppression in the fetch and unpack contexts. The duplicate suppression uses a key-mutex and single-waiter notification to provide singleflight behavior. Callers can use the duplicate suppression across the different PullImage handlers so that both the unnecessary unpack and the spin/backoff retry in OpenWriter are avoided.

Test Result:

Before the enhancement:

```bash
➜  /tmp sudo bash testing.sh "localhost:5000/redis:latest" 20
crictl pull localhost:5000/redis:latest (x20) takes ...

real    1m6.172s
user    0m0.268s
sys     0m0.193s

docker pull localhost:5000/redis:latest (x20) takes ...

real    0m1.324s
user    0m0.441s
sys     0m0.316s

➜  /tmp sudo bash testing.sh "localhost:5000/golang:latest" 20
crictl pull localhost:5000/golang:latest (x20) takes ...

real    1m47.657s
user    0m0.284s
sys     0m0.224s

docker pull localhost:5000/golang:latest (x20) takes ...

real    0m6.381s
user    0m0.488s
sys     0m0.358s
```

With this enhancement:

```bash
➜  /tmp sudo bash testing.sh "localhost:5000/redis:latest" 20
crictl pull localhost:5000/redis:latest (x20) takes ...

real    0m1.140s
user    0m0.243s
sys     0m0.178s

docker pull localhost:5000/redis:latest (x20) takes ...

real    0m1.239s
user    0m0.463s
sys     0m0.275s

➜  /tmp sudo bash testing.sh "localhost:5000/golang:latest" 20
crictl pull localhost:5000/golang:latest (x20) takes ...

real    0m5.546s
user    0m0.217s
sys     0m0.219s

docker pull localhost:5000/golang:latest (x20) takes ...

real    0m6.090s
user    0m0.501s
sys     0m0.331s
```

Test Script:

localhost:5000/{redis|golang}:latest is equivalent to docker.io/library/{redis|golang}:latest. The image is hosted in a local registry service started with `docker run -d -p 5000:5000 --name registry registry:2`.
```bash
image_name="${1}"
pull_times="${2:-10}"

cleanup() {
  ctr image rmi "${image_name}"
  ctr -n k8s.io image rmi "${image_name}"
  crictl rmi "${image_name}"
  docker rmi "${image_name}"
  sleep 2
}

crictl_testing() {
  for idx in $(seq 1 ${pull_times}); do
    crictl pull "${image_name}" > /dev/null 2>&1 &
  done
  wait
}

docker_testing() {
  for idx in $(seq 1 ${pull_times}); do
    docker pull "${image_name}" > /dev/null 2>&1 &
  done
  wait
}

cleanup > /dev/null 2>&1

echo 3 > /proc/sys/vm/drop_caches
sleep 3
echo "crictl pull $image_name (x${pull_times}) takes ..."
time crictl_testing
echo

echo 3 > /proc/sys/vm/drop_caches
sleep 3
echo "docker pull $image_name (x${pull_times}) takes ..."
time docker_testing
```

Fixes: #4937
Close: #4985
Close: #6318

Signed-off-by: Wei Fu <fuweid89@gmail.com>
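For illustration only, the sketch below shows the general shape of such a keyed lock. It is a hypothetical `KeyedLocker`, not containerd's actual code, and this simplified version wakes all waiters on release, whereas the change described above notifies a single waiter at a time.

```go
// Minimal, illustrative sketch of a keyed lock for duplicate suppression.
// All names here (KeyedLocker, New, Lock, Unlock) are invented for this example.
package kmutexsketch

import (
	"context"
	"sync"
)

// KeyedLocker serializes work per key: the first caller for a key proceeds,
// later callers for the same key block until it is released instead of
// failing fast with an "unavailable" error and retrying with backoff.
type KeyedLocker struct {
	mu    sync.Mutex
	locks map[string]chan struct{}
}

func New() *KeyedLocker {
	return &KeyedLocker{locks: map[string]chan struct{}{}}
}

// Lock acquires the lock for key, waiting for the current holder if necessary.
func (l *KeyedLocker) Lock(ctx context.Context, key string) error {
	for {
		l.mu.Lock()
		ch, held := l.locks[key]
		if !held {
			l.locks[key] = make(chan struct{})
			l.mu.Unlock()
			return nil
		}
		l.mu.Unlock()

		select {
		case <-ch: // holder released the key; try to acquire it again
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// Unlock releases the key and wakes the waiters so one of them can continue.
func (l *KeyedLocker) Unlock(key string) {
	l.mu.Lock()
	ch := l.locks[key]
	delete(l.locks, key)
	l.mu.Unlock()
	if ch != nil {
		close(ch)
	}
}
```

A pull path could, for example, take such a lock keyed by layer digest before opening a content writer and keyed by ChainID before unpacking, so concurrent pulls of the same image wait for the first worker instead of spinning in backoff retries or racing to commit the same snapshot.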
containerd is an industry-standard container runtime with an emphasis on simplicity, robustness and portability. It is available as a daemon for Linux and Windows, which can manage the complete container lifecycle of its host system: image transfer and storage, container execution and supervision, low-level storage and network attachments, etc.
containerd is a member of CNCF with 'graduated' status.
containerd is designed to be embedded into a larger system, rather than being used directly by developers or end-users.
Now Recruiting
We are a large, inclusive OSS project that is welcoming help of any kind, shape, or form:
- Documentation help is needed to make the product easier to consume and extend.
- We need OSS community outreach / organizing help to get the word out; manage and create messaging and educational content; and to help with social media, community forums/groups, and google groups.
- We are actively inviting new security advisors to join the team.
- New sub-projects are being created, core and non-core that could use additional development help.
- Each of the containerd projects has a list of issues currently being worked on or that need help resolving.
- If the issue has not already been assigned to someone, or has not made recent progress and you are interested, please inquire.
- If you are interested in starting with a smaller / beginner level issue, look for issues with an `exp/beginner` tag, for example containerd/containerd beginner issues.
Getting Started
See our documentation on containerd.io:
See how to build containerd from source at BUILDING.
If you are interested in trying out containerd see our example at Getting Started.
Nightly builds
There are nightly builds available for download here.
Binaries are generated from the main branch every night for Linux and Windows.
Please be aware: nightly builds might have critical bugs; they are not recommended for use in production, and no support is provided.
Runtime Requirements
Runtime requirements for containerd are very minimal. Most interactions with the Linux and Windows container feature sets are handled via runc and/or OS-specific libraries (e.g. hcsshim for Microsoft).
The current required version of runc is described in RUNC.md.
There are specific features used by containerd core code and snapshotters that will require a minimum kernel version on Linux. With the understood caveat of distro kernel versioning, a reasonable starting point for Linux is a minimum 4.x kernel version.
The overlay filesystem snapshotter, used by default, uses features that were finalized in the 4.x kernel series. If you choose to use btrfs, there may be more flexibility in kernel version (minimum recommended is 3.18), but will require the btrfs kernel module and btrfs tools to be installed on your Linux distribution.
To use Linux checkpoint and restore features, you will need criu installed on your system. See more details in Checkpoint and Restore.
Build requirements for developers are listed in BUILDING.
Supported Registries
Any registry which is compliant with the OCI Distribution Specification is supported by containerd.
For configuring registries, see the registry host configuration documentation.
Features
Client
containerd offers a full client package to help you integrate containerd into your platform.
```go
import (
	"context"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/cio"
	"github.com/containerd/containerd/namespaces"
)

func main() {
	client, err := containerd.New("/run/containerd/containerd.sock")
	defer client.Close()
}
```
Namespaces
Namespaces allow multiple consumers to use the same containerd without conflicting with each other. It has the benefit of sharing content but still having separation with containers and images.
To set a namespace for requests to the API:
```go
context = context.Background()
// create a context for docker
docker = namespaces.WithNamespace(context, "docker")

containerd, err := client.NewContainer(docker, "id")
```
To set a default namespace on the client:
```go
client, err := containerd.New(address, containerd.WithDefaultNamespace("docker"))
```
Distribution
```go
// pull an image
image, err := client.Pull(context, "docker.io/library/redis:latest")

// push an image
err := client.Push(context, "docker.io/library/redis:latest", image.Target())
```
Containers
In containerd, a container is a metadata object. Resources such as an OCI runtime specification, image, root filesystem, and other metadata can be attached to a container.
```go
redis, err := client.NewContainer(context, "redis-master")
defer redis.Delete(context)
```
OCI Runtime Specification
containerd fully supports the OCI runtime specification for running containers. We have built in functions to help you generate runtime specifications based on images as well as custom parameters.
You can specify options when creating a container about how to modify the specification.
redis, err := client.NewContainer(context, "redis-master", containerd.WithNewSpec(oci.WithImageConfig(image)))
Root Filesystems
containerd allows you to use overlay or snapshot filesystems with your containers. It comes with built in support for overlayfs and btrfs.
```go
// pull an image and unpack it into the configured snapshotter
image, err := client.Pull(context, "docker.io/library/redis:latest", containerd.WithPullUnpack)

// allocate a new RW root filesystem for a container based on the image
redis, err := client.NewContainer(context, "redis-master",
	containerd.WithNewSnapshot("redis-rootfs", image),
	containerd.WithNewSpec(oci.WithImageConfig(image)),
)

// use a readonly filesystem with multiple containers
for i := 0; i < 10; i++ {
	id := fmt.Sprintf("id-%d", i)
	container, err := client.NewContainer(ctx, id,
		containerd.WithNewSnapshotView(id, image),
		containerd.WithNewSpec(oci.WithImageConfig(image)),
	)
}
```
Tasks
Taking a container object and turning it into a runnable process on a system is done by creating a new Task from the container. A task represents the runnable object within containerd.
```go
// create a new task
task, err := redis.NewTask(context, cio.NewCreator(cio.WithStdio))
defer task.Delete(context)

// the task is now running and has a pid that can be used to setup networking
// or other runtime settings outside of containerd
pid := task.Pid()

// start the redis-server process inside the container
err := task.Start(context)

// wait for the task to exit and get the exit status
status, err := task.Wait(context)
```
Checkpoint and Restore
If you have criu installed on your machine you can checkpoint and restore containers and their tasks. This allows you to clone and/or live migrate containers to other machines.
```go
// checkpoint the task then push it to a registry
checkpoint, err := task.Checkpoint(context)

err := client.Push(context, "myregistry/checkpoints/redis:master", checkpoint)

// on a new machine pull the checkpoint and restore the redis container
checkpoint, err := client.Pull(context, "myregistry/checkpoints/redis:master")

redis, err = client.NewContainer(context, "redis-master", containerd.WithNewSnapshot("redis-rootfs", checkpoint))
defer redis.Delete(context)

task, err = redis.NewTask(context, cio.NewCreator(cio.WithStdio), containerd.WithTaskCheckpoint(checkpoint))
defer task.Delete(context)

err := task.Start(context)
```
Snapshot Plugins
In addition to the built-in Snapshot plugins in containerd, additional external plugins can be configured using GRPC. An external plugin is made available using the configured name and appears as a plugin alongside the built-in ones.
To add an external snapshot plugin, add the plugin to containerd's config file (by default at /etc/containerd/config.toml). The string following `proxy_plugins.` will be used as the name of the snapshotter, and the address should refer to a socket with a GRPC listener serving containerd's Snapshot GRPC API. Remember to restart containerd for any configuration changes to take effect.
```toml
[proxy_plugins]
  [proxy_plugins.customsnapshot]
    type = "snapshot"
    address = "/var/run/mysnapshotter.sock"
```
See PLUGINS.md for how to create plugins.
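For a rough idea of what such an external plugin looks like, here is a hedged sketch of a gRPC snapshotter server. It follows the pattern documented in PLUGINS.md, but treat the exact import paths and helper signatures (e.g. `snapshotservice.FromSnapshotter`) as assumptions to verify against your containerd version:

```go
package main

import (
	"log"
	"net"
	"os"

	snapshotsapi "github.com/containerd/containerd/api/services/snapshots/v1"
	"github.com/containerd/containerd/contrib/snapshotservice"
	"github.com/containerd/containerd/snapshots/native"
	"google.golang.org/grpc"
)

func main() {
	// Usage: snapshotter <unix-socket-address> <root-dir>
	// The socket address is what goes into the proxy_plugins `address` field.
	if len(os.Args) < 3 {
		log.Fatalf("usage: %s <unix addr> <root>", os.Args[0])
	}

	// Any snapshots.Snapshotter implementation works here; the built-in
	// native snapshotter is used only to keep the example short.
	sn, err := native.NewSnapshotter(os.Args[2])
	if err != nil {
		log.Fatal(err)
	}

	// Expose the snapshotter over containerd's Snapshots gRPC API.
	rpc := grpc.NewServer()
	snapshotsapi.RegisterSnapshotsServer(rpc, snapshotservice.FromSnapshotter(sn))

	l, err := net.Listen("unix", os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	if err := rpc.Serve(l); err != nil {
		log.Fatal(err)
	}
}
```

Run it with the socket path from the configuration above (e.g. /var/run/mysnapshotter.sock) and a root directory, and clients can then select the snapshotter by its configured name, customsnapshot in this example.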
Releases and API Stability
Please see RELEASES.md for details on versioning and stability of containerd components.
Downloadable 64-bit Intel/AMD binaries of all official releases are available on our releases page.
For other architectures and distribution support, you will find that many Linux distributions package their own containerd and provide it across several architectures, such as Canonical's Ubuntu packaging.
Enabling command auto-completion
Starting with containerd 1.4, the urfave client feature for auto-creation of bash and zsh autocompletion data is enabled. To use the autocomplete feature in a bash shell for example, source the autocomplete/ctr file in your .bashrc, or manually like:
$ source ./contrib/autocomplete/ctr
Distribution of ctr autocomplete for bash and zsh
For bash, copy the contrib/autocomplete/ctr script into /etc/bash_completion.d/ and rename it to ctr. The zsh_autocomplete file is also available and can be used similarly for zsh users.
Provide documentation to users to source this file into their shell if you don't place the autocomplete file in a location where it is automatically loaded for the user's shell environment.
CRI
cri is a containerd plugin implementation of the Kubernetes container runtime interface (CRI). With it, you are able to use containerd as the container runtime for a Kubernetes cluster.
CRI Status
cri is a native plugin of containerd. Since containerd 1.1, the cri plugin is built into the release binaries and enabled by default.
Note: As of containerd 1.5, the cri plugin is merged into the containerd/containerd repo. For example, the source code previously stored under containerd/cri/pkg was moved to the containerd/containerd/pkg/cri package.
The cri plugin has reached GA status, representing that it is:
- Feature complete
- Works with Kubernetes 1.10 and above
- Passes all CRI validation tests.
- Passes all node e2e tests.
- Passes all e2e tests.
See results on the containerd k8s test dashboard
Validating Your cri Setup
A Kubernetes incubator project, cri-tools, includes programs for exercising CRI implementations. More importantly, cri-tools includes the program critest which is used for running CRI Validation Testing.
CRI Guides
- Installing with Ansible and Kubeadm
- For Non-Ansible Users, Performing a Custom Installation Using the Release Tarball and Kubeadm
- CRI Plugin Testing Guide
- Debugging Pods, Containers, and Images with crictl
- Configuring cri Plugins
- Configuring containerd
Communication
For async communication and long running discussions please use issues and pull requests on the github repo. This will be the best place to discuss design and implementation.
For sync communication, catch us in the #containerd and #containerd-dev slack channels on Cloud Native Computing Foundation's (CNCF) slack - cloud-native.slack.com. Everyone is welcome to join and chat. Get Invite to CNCF slack.
Security audit
A third party security audit was performed by Cure53 in 4Q2018; the full report is available in our docs/ directory.
Reporting security issues
If you are reporting a security issue, please reach out discreetly at security@containerd.io.
Licenses
The containerd codebase is released under the Apache 2.0 license. The README.md file, and files in the "docs" folder are licensed under the Creative Commons Attribution 4.0 International License. You may obtain a copy of the license, titled CC-BY-4.0, at http://creativecommons.org/licenses/by/4.0/.
Project details
containerd is the primary open source project within the broader containerd GitHub organization.
However, all projects within the repo have common maintainership, governance, and contributing guidelines, which are stored in a project repository commonly for all containerd projects.
Please find all these core project documents, including the governance, maintainers, and contributing guidelines, in our containerd/project repository.
Adoption
Interested to see who is using containerd? Are you using containerd in a project? Please add yourself via pull request to our ADOPTERS.md file.