diff --git a/docs/PLUGINS.md b/docs/PLUGINS.md
index 4aedf1a87..9c088294d 100644
--- a/docs/PLUGINS.md
+++ b/docs/PLUGINS.md
@@ -27,12 +27,24 @@ containerd allows extensions through two methods:
 
 ### V2 Runtimes
 
-The runtime v2 interface allows resolving runtimes to binaries on the system.
-These binaries are used to start the shim process for containerd and allows
-containerd to manage those containers using the runtime shim api returned by
+containerd supports multiple container runtimes. Each container can be
+invoked with a different runtime.
+
+When using the Container Runtime Interface (CRI) plugin, named runtimes can be defined
+in the containerd configuration file. When a container is run without specifying a runtime,
+the configured default runtime is used. Alternatively, a different named runtime can be
+specified explicitly when creating a container via CRI gRPC by selecting the runtime handler to be used.
+
+When a client such as `ctr` or `nerdctl` creates a container, it can optionally specify a runtime and options to use.
+If a runtime is not specified, containerd will use its default runtime.
+
+containerd invokes v2 runtimes as binaries on the system,
+which are used to start the shim process for containerd. This, in turn, allows
+containerd to start and manage those containers using the runtime shim api returned by
 the binary.
 
-See [runtime v2 documentation](../runtime/v2/README.md)
+For more details on runtimes and shims, including how to invoke and configure them,
+see the [runtime v2 documentation](../runtime/v2/README.md)
 
 ### Proxy Plugins
 
diff --git a/docs/man/containerd-config.toml.5.md b/docs/man/containerd-config.toml.5.md
index b3602c043..6293702c5 100644
--- a/docs/man/containerd-config.toml.5.md
+++ b/docs/man/containerd-config.toml.5.md
@@ -110,6 +110,15 @@ documentation.
   for cache and memory bandwidth management.
   See [RDT configuration](https://github.com/intel/goresctrl/blob/main/doc/rdt.md#configuration)
   for details of the file format.
+- **[plugins."io.containerd.grpc.v1.cri".containerd]** contains options for the CRI plugin, and child nodes for CRI options:
+  - **default_runtime_name** (Default: **"runc"**) specifies the default runtime name
+- **[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]** one or more container runtimes, each with a unique name
+- **[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.]** a runtime named ``
+- **[plugins."io.containerd.grpc.v1.cri".containerd.runtimes..options]** options for the named ``, most important:
+  - **BinaryName** specifies the path to the actual runtime to be invoked by the shim, e.g. `"/usr/bin/runc"`
+
+
+
 **oom_score** : The out of memory (OOM) score applied to the containerd daemon process (Default: 0)
 
@@ -151,7 +160,9 @@ the main config.
 - **path** (Default: "") Path or name of the binary
 - **args** (Default: "[]") Args to the binary
 
-## EXAMPLE
+## EXAMPLES
+
+### Complete Configuration
 
 The following is a complete **config.toml** default configuration example:
 
@@ -200,6 +211,52 @@ imports = ["/etc/containerd/runtime_*.toml", "./debug.toml"]
     rdt_config_file = ""
 ```
 
+### Multiple Runtimes
+
+The following is an example partial configuration with two runtimes:
+
+```toml
+[plugins]
+
+  [plugins."io.containerd.grpc.v1.cri"]
+
+    [plugins."io.containerd.grpc.v1.cri".containerd]
+      default_runtime_name = "runc"
+
+      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
+        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
+          privileged_without_host_devices = false
+          runtime_type = "io.containerd.runc.v2"
+
+          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
+            BinaryName = "/usr/bin/runc"
+
+        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.other]
+          privileged_without_host_devices = false
+          runtime_type = "io.containerd.runc.v2"
+
+          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.other.options]
+            BinaryName = "/usr/bin/path-to-runtime"
+```
+
+The above creates two named runtime configurations - named `runc` and `other` - and sets the default runtime to `runc`.
+The above are used _solely_ for runtimes invoked via CRI. To use the non-default "other" runtime in this example,
+a spec will include the runtime handler named "other" to specify the desire to use the named runtime config.
+
+The CRI specification includes a [`runtime_handler` field](https://github.com/kubernetes/cri-api/blob/de5f1318aede866435308f39cb432618a15f104e/pkg/apis/runtime/v1/api.proto#L476), which will reference the named runtime.
+
+It is important to note the naming convention. Runtimes are under `[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]`,
+with each runtime given a unique name, e.g. `[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]`.
+In addition, each runtime can have shim-specific options under `[plugins."io.containerd.grpc.v1.cri".containerd.runtimes..options]`,
+for example, `[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]`.
+
+The `io.containerd.runc.v2` runtime is used to run OCI-compatible runtimes on Linux, such as runc. In the example above, the `runtime_type`
+field specifies the shim to use (`io.containerd.runc.v2`) while the `BinaryName` field is a shim-specific option which specifies the path to the
+OCI runtime.
+
+For the example configuration named "runc", the shim will launch `/usr/bin/runc` as the OCI runtime. For the example configuration named
+"other", the shim will launch `/usr/bin/path-to-runtime` instead.
+
 ## BUGS
 
 Please file any specific issues that you encounter at
diff --git a/runtime/v2/README.md b/runtime/v2/README.md
index 994b06a9d..a81e3b37d 100644
--- a/runtime/v2/README.md
+++ b/runtime/v2/README.md
@@ -1,30 +1,183 @@
 # Runtime v2
 
 Runtime v2 introduces a first class shim API for runtime authors to integrate with containerd.
-The shim API is minimal and scoped to the execution lifecycle of a container.
 
-## Binary Naming
+containerd, the daemon, does not directly launch containers. Instead, it acts as a higher-level manager
+or hub that coordinates containers and content, while lower-level
+programs, called "runtimes", actually start, stop and manage containers,
+either individually or in groups, e.g. Kubernetes pods.
+
+For example, containerd will retrieve container image config and its content as layers, use the snapshotter to lay it out on disk, set up
+the container's rootfs and config, and then launch a runtime that will create/start/stop the container.
+
+This document describes the major components of the v2 runtime integration model, how the components interact
+with containerd and the v2 runtime, and how to use and integrate different v2 runtimes.
+
+To simplify the interaction, runtime v2 introduced a first class v2 API for runtime authors to integrate with containerd,
+replacing the v1 API.
+The v2 API is minimal and scoped to the execution lifecycle of a container.
+
+This document is split into the following sections:
+
+* [architecture](#architecture) - the major components, their purposes and relationships
+* [usage](#usage) - how to invoke specific runtimes, and how to configure them
+* [authoring](#shim-authoring) - how to author a v2 runtime
+
+## Architecture
+
+### containerd-runtime communication
+
+containerd expects a runtime to implement several container control features, such as create, start and stop.
+
+The high-level flow is as follows:
+
+1. a client asks containerd to create a container
+1. containerd lays out the container's filesystem, and creates the necessary config information
+1. containerd invokes the runtime over an API to create/start/stop the container
+
+However, containerd does not directly invoke the runtime to start the container.
+Instead, it launches the runtime binary, which exposes a socket - Unix-domain on Unix-like systems, named pipe on Windows -
+and listens for container commands via [ttRPC](https://github.com/containerd/ttrpc) over that
+socket.
+
+The runtime is expected to process those operations. How it does so is entirely within the scope of the runtime implementation.
+Two common patterns are:
+
+* a single runtime binary that both listens on the socket and creates/starts/stops the container
+* a separate shim binary that listens on the socket, and invokes a separate runtime engine that creates/starts/stops the container
+
+The separate "shim+engine" pattern is used because it makes it easier to integrate distinct runtimes implementing a specific runtime
+engine spec, such as the [OCI runtime spec](https://github.com/opencontainers/runtime-spec).
+The ttRPC protocol can be handled via one runtime shim, while distinct runtime engine implementations can
+be used, as long as they implement the OCI runtime spec.
+
+The most commonly used runtime _engine_ is [runc](https://github.com/opencontainers/runc), which implements the
+[OCI runtime spec](https://github.com/opencontainers/runtime-spec). As this is a runtime _engine_, it is not
+invoked directly by containerd; instead, it is invoked by a shim, which listens on the socket and invokes the runtime engine.
+
+#### shim+engine Architecture
+
+##### runtime shim
+
+The runtime shim is what containerd actually invokes. It has minimal options on start beyond
+being provided the communications port for containerd and some configuration information.
+
+The runtime shim listens on the socket for ttRPC commands from containerd, and then invokes a separate program,
+the runtime engine, via `fork`/`exec` to run the container. For example, the `io.containerd.runc.v2` shim invokes
+an OCI compliant runtime engine such as `runc`.
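
The shim+engine split described above can be illustrated with a minimal sketch. It assumes an engine that accepts the `create`, `start`, `kill` and `delete` subcommands of the runc command line; the helper name `engine_argv` and the paths in the demo are hypothetical, and the real shim is written in Go and passes more options than shown here. The sketch only builds the command line a shim could `fork`/`exec`; it does not run anything.

```python
# Illustrative only: subcommands taken from the runc command line.
_VERBS = ("create", "start", "kill", "delete")


def engine_argv(engine_binary, operation, container_id, bundle=None):
    """Return the argv a shim could fork/exec for one container operation."""
    if operation not in _VERBS:
        raise ValueError(f"unsupported operation: {operation}")
    argv = [engine_binary, operation]
    if operation == "create":
        if bundle is None:
            raise ValueError("create requires a bundle directory")
        # The bundle directory holds config.json and the root filesystem.
        argv += ["--bundle", bundle]
    argv.append(container_id)
    return argv


if __name__ == "__main__":
    # Hypothetical engine path, container id and bundle directory.
    for op in _VERBS:
        print(" ".join(engine_argv("/usr/bin/runc", op, "demo", bundle="/run/demo")))
```

Swapping the first argument for another OCI-compliant engine binary changes nothing else, which is the point of keeping the shim and the engine separate.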
+
+containerd passes options to the shim over the ttRPC connection, which may include the runtime engine binary
+to invoke. These are the `options` for the [`CreateTaskRequest`](https://github.com/containerd/containerd/blob/main/runtime/v2/README.md#container-level-shim-configuration).
+
+For example, the `io.containerd.runc.v2` shim supports including the path to the runtime engine binary.
+
+##### runtime engine
+
+The runtime engine itself is what actually starts and stops the container.
+
+For example, in the case of [runc](https://github.com/opencontainers/runc), the containerd project provides the shim
+as the executable `containerd-shim-runc-v2`. This is invoked by containerd and starts the ttRPC listener.
+
+The shim then invokes the actual `runc` binary, passing it the container configuration, and the `runc` binary
+creates/starts/stops the container, typically via `libcontainer`->system APIs.
+
+#### shim+engine Relationship
+
+Since each shim instance communicates with containerd as a daemon, while parenting containers via invoking independent runtimes,
+it is possible to have one shim for multiple containers and invocations. For example,
+you could have one `containerd-shim-runc-v2` communicating with one containerd, and it can
+invoke ten distinct containers.
+
+It is even possible to have one shim for multiple containers, each with its own actual runtime,
+since, as described above, the runtime binary is passed as one of the options in `CreateTaskRequest`.
+
+containerd does not know or care about whether the shim to container relationship is one-to-one,
+or one-to-many. It is entirely up to the shim to decide. For example, the `io.containerd.runc.v2` shim
+automatically groups based on the presence of
+[labels](https://github.com/containerd/containerd/blob/b30e0163ac36c1a193604e5eca031053d62019c5/runtime/v2/runc/manager/manager_linux.go#L54-L60). In practice, this means that containers launched by Kubernetes, that are part of the same Kubernetes pod, are handled by a single
+shim, grouping on the `io.kubernetes.cri.sandbox-id` label set by the CRI plugin.
+
+The flow, then, is as follows:
+
+1. containerd receives a request to create a container
+1. containerd lays out the container's filesystem, and creates the necessary [container config](https://github.com/opencontainers/image-spec/blob/main/config.md) information
+1. containerd invokes the shim, including container configuration, which uses that information to decide whether to launch a new socket listener (1:1 shim to container) or use an existing one (1:many)
+   * if existing, return the address of the existing socket and exit
+   * if new, the shim:
+      1. creates a new process to listen on a socket for ttRPC commands from containerd
+      1. returns the address to that socket to containerd
+      1. exits
+1. containerd sends the shim a command to start the container
+1. The shim invokes `runc` to create/start/stop the container
+
+An excellent flow diagram is available later in this document under [Flow](#Flow).
+
+## Usage
+
+### Invoking Runtimes
+
+A runtime - single instance or shim+engine - and its options, can be selected when creating a container via one of the exposed
+containerd services (containerd client, CRI API, ...), or via a client that calls into the containerd provided services.
+Examples of containerd clients include `ctr`, `nerdctl`, Kubernetes, Docker/Moby, Rancher and others.
 
-Users specify the runtime they wish to use when creating a container. The runtime can also be changed via a container update.
 
-```bash
-> ctr run --runtime io.containerd.runc.v2
-```
+The runtime name that is passed is a string that is used to identify the runtime to containerd. In the case of separate shim+engine,
+this will be the runtime _shim_. Either way, this is the binary that containerd executes and expects to start the ttRPC listener.
+The runtime name can be either a URI-like string, or, beginning with containerd 1.6.0, the actual path to the executable.
 
-When a user specifies a runtime name, `io.containerd.runc.v2`, they will specify the name and version of the runtime.
-This will be translated by containerd into a binary name for the shim.
+1. If the runtime name is a path, use that as the actual path to the runtime to invoke.
+1. If the runtime name is URI-like, convert it to a binary name using the below logic.
 
-`io.containerd.runc.v2` -> `containerd-shim-runc-v2`
+If the runtime name is URI-like, containerd will convert the passed runtime from the URI-like name to a binary name using the following logic:
 
-Since 1.6 release, it's also possible to specify absolute runtime path:
+1. Takes the last 2 dot-separated components, e.g. `runc.v2`
+1. Replaces all `.` with `-`, e.g. `runc-v2`
+1. Prepends `containerd-shim-`
 
-```bash
-> ctr run --runtime /usr/local/bin/containerd-shim-runc-v2
-```
+For example, if the runtime name is `io.containerd.runc.v2`, containerd will invoke the shim as `containerd-shim-runc-v2`. It expects to
+find the binary in its normal `PATH`.
 
 containerd keeps the `containerd-shim-*` prefix so that users can `ps aux | grep containerd-shim` to see running shims on their system.
 
+For example:
+
+```bash
+$ ctr run --runtime io.containerd.runc.v2 --rm docker.io/library/alpine:latest alpine
+```
+
+Will invoke `containerd-shim-runc-v2`.
+
+You can test this by trying another name:
+
+```bash
+$ ctr run --runtime=io.foo.bar.runc2.v2.baz --rm docker.io/library/hello-world:latest hello-world /hello
+ctr: failed to start shim: failed to resolve runtime path: runtime "io.foo.bar.runc2.v2.baz" binary not installed "containerd-shim-v2-baz": file does not exist: unknown
+```
+
+It received `io.foo.bar.runc2.v2.baz` and looked for `containerd-shim-v2-baz`.
+
+You can also override the default configured runtime for the shim, by passing it the `--runc-binary`
+option. For example:
+
+```bash
+$ ctr run --runtime io.containerd.runc.v2 --runc-binary /usr/local/bin/runc-custom --rm docker.io/library/alpine:latest alpine
+```
+
+### Configuring Runtimes
+
+You can configure one or more runtimes in containerd's `config.toml` configuration file, by modifying the
+section:
+
+```toml
+  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
+```
+
+See [config.toml man page](../../docs/man/containerd-config.toml.5.md) for more details and an example.
+
+These "named runtimes" in the configuration file are used solely when invoked via CRI, which has a
+[`runtime_handler` field](https://github.com/kubernetes/cri-api/blob/de5f1318aede866435308f39cb432618a15f104e/pkg/apis/runtime/v1/api.proto#L476).
+
 ## Shim Authoring
 
 This section is dedicated to runtime authors wishing to build a shim.
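
A shim binary has to carry the name that containerd derives from the runtime string, so the naming rules described under Invoking Runtimes matter to shim authors. The following is a minimal sketch of those rules, not containerd's actual Go code. The function name `shim_binary_name` is hypothetical, and treating any absolute POSIX path as "a path" is an assumption made for the example.

```python
import posixpath

SHIM_PREFIX = "containerd-shim-"


def shim_binary_name(runtime_name):
    """Map a runtime name to the shim binary containerd would look for."""
    # Assumption: a runtime name given as an absolute path is used unchanged.
    if posixpath.isabs(runtime_name):
        return runtime_name
    components = runtime_name.split(".")
    # A URI-like name needs at least two non-empty trailing components.
    if len(components) < 2 or "" in components[-2:]:
        raise ValueError(f"not a valid runtime name: {runtime_name!r}")
    # Keep the last two components, join them with "-", add the prefix.
    return SHIM_PREFIX + "-".join(components[-2:])


if __name__ == "__main__":
    for name in (
        "io.containerd.runc.v2",
        "io.foo.bar.runc2.v2.baz",
        "/usr/local/bin/containerd-shim-runc-v2",
    ):
        print(name, "->", shim_binary_name(name))
```

The second name in the demo is the one from the failing `ctr run` example above; it resolves to `containerd-shim-v2-baz`, the binary named in that error message.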