Merge pull request #9145 from deitch/doc-runtime-shim

document runtime and shim configuration and selection
Samuel Karp 2023-11-09 07:24:31 +00:00 committed by GitHub
commit 669e0786d8
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 240 additions and 18 deletions


@ -27,12 +27,24 @@ containerd allows extensions through two methods:
### V2 Runtimes
containerd supports multiple container runtimes. Each container can be
invoked with a different runtime.
When using the Container Runtime Interface (CRI) plugin, named runtimes can be defined
in the containerd configuration file. When a container is run without specifying a runtime,
the configured default runtime is used. Alternatively, a different named runtime can be
specified explicitly when creating a container via CRI gRPC by selecting the runtime handler to be used.
When a client such as `ctr` or `nerdctl` creates a container, it can optionally specify a runtime and options to use.
If a runtime is not specified, containerd will use its default runtime.
containerd invokes v2 runtimes as binaries on the system,
which are used to start the shim process for containerd. This, in turn, allows
containerd to start and manage those containers using the runtime shim api returned by
the binary.
For more details on runtimes and shims, including how to invoke and configure them,
see the [runtime v2 documentation](../runtime/v2/README.md).
### Proxy Plugins


@ -110,6 +110,15 @@ documentation.
for cache and memory bandwidth management.
See [RDT configuration](https://github.com/intel/goresctrl/blob/main/doc/rdt.md#configuration)
for details of the file format.
- **[plugins."io.containerd.grpc.v1.cri".containerd]** contains options for the CRI plugin, and child nodes for CRI options:
  - **default_runtime_name** (Default: **"runc"**) specifies the default runtime name
  - **[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]** one or more container runtimes, each with a unique name
    - **[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.<runtime>]** a runtime named `<runtime>`
      - **[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.<runtime>.options]** options for the named `<runtime>`, most important:
        - **BinaryName** specifies the path to the actual runtime to be invoked by the shim, e.g. `"/usr/bin/runc"`
**oom_score**
: The out of memory (OOM) score applied to the containerd daemon process (Default: 0)
@ -151,7 +160,9 @@ the main config.
- **path** (Default: "") Path or name of the binary
- **args** (Default: "[]") Args to the binary
## EXAMPLES
### Complete Configuration
The following is a complete **config.toml** default configuration example:
@ -200,6 +211,52 @@ imports = ["/etc/containerd/runtime_*.toml", "./debug.toml"]
rdt_config_file = ""
```
### Multiple Runtimes
The following is an example partial configuration with two runtimes:
```toml
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
privileged_without_host_devices = false
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
BinaryName = "/usr/bin/runc"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.other]
privileged_without_host_devices = false
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.other.options]
BinaryName = "/usr/bin/path-to-runtime"
```
The above creates two named runtime configurations, `runc` and `other`, and sets the default runtime to `runc`.
These named runtime configurations apply _solely_ to containers created via CRI. To use the non-default `other` runtime in this example,
a CRI request sets its runtime handler to `other`, which selects that named runtime configuration.
The CRI specification includes a [`runtime_handler` field](https://github.com/kubernetes/cri-api/blob/de5f1318aede866435308f39cb432618a15f104e/pkg/apis/runtime/v1/api.proto#L476), which will reference the named runtime.
It is important to note the naming convention. Runtimes are under `[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]`,
with each runtime given a unique name, e.g. `[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]`.
In addition, each runtime can have shim-specific options under `[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.<runtime>.options]`,
for example, `[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]`.
The `io.containerd.runc.v2` shim is used to run OCI-compatible runtimes on Linux, such as runc. In the example above, the `runtime_type`
field specifies the shim to use (`io.containerd.runc.v2`), while the `BinaryName` field is a shim-specific option that specifies the path to the
OCI runtime.
For the example configuration named "runc", the shim will launch `/usr/bin/runc` as the OCI runtime. For the example configuration named
"other", the shim will launch `/usr/bin/path-to-runtime` instead.
## BUGS
Please file any specific issues that you encounter at


@ -1,30 +1,183 @@
# Runtime v2
Runtime v2 introduces a first class shim API for runtime authors to integrate with containerd.
containerd, the daemon, does not directly launch containers. Instead, it acts as a higher-level manager,
or hub, coordinating the activities of containers and content, while lower-level
programs, called "runtimes", actually start, stop and manage containers,
either individual containers or groups of containers, e.g. Kubernetes pods.
For example, containerd will retrieve container image config and its content as layers, use the snapshotter to lay it out on disk, set up
the container's rootfs and config, and then launch a runtime that will create/start/stop the container.
This document describes the major components of the v2 runtime integration model, how the components interact
with containerd and the v2 runtime, and how to use and integrate different v2 runtimes.
To simplify the interaction, runtime v2 introduced a first class v2 API for runtime authors to integrate with containerd,
replacing the v1 API.
The v2 API is minimal and scoped to the execution lifecycle of a container.
This document is split into the following sections:
* [architecture](#architecture) - the major components, their purposes and relationships
* [usage](#usage) - how to invoke specific runtimes, and how to configure them
* [authoring](#shim-authoring) - how to author a v2 runtime
## Architecture
### containerd-runtime communication
containerd expects a runtime to implement several container control features, such as create, start and stop.
The high-level flow is as follows:
1. a client requests that containerd create a container
1. containerd lays out the container's filesystem, and creates the necessary config information
1. containerd invokes the runtime over an API to create/start/stop the container
However, containerd does not start the container by calling the runtime directly.
Instead, it launches the runtime, which exposes a socket (a Unix-domain socket on Unix-like systems, a named pipe on Windows)
and listens for container commands via [ttRPC](https://github.com/containerd/ttrpc) over that
socket.
The runtime is expected to process those operations. How it does so is entirely within the scope of the runtime implementation.
Two common patterns are:
* a single runtime binary that both listens on the socket and creates/starts/stops the container
* a separate shim binary that listens on the socket, and invokes a separate runtime engine that creates/starts/stops the container
The separate "shim+engine" pattern is used because it makes it easier to integrate distinct runtimes implementing a specific runtime
engine spec, such as the [OCI runtime spec](https://github.com/opencontainers/runtime-spec).
The ttRPC protocol can be handled via one runtime shim, while distinct runtime engine implementations can
be used, as long as they implement the OCI runtime spec.
The most commonly used runtime _engine_ is [runc](https://github.com/opencontainers/runc), which implements the
[OCI runtime spec](https://github.com/opencontainers/runtime-spec). As this is a runtime _engine_, it is not
invoked directly by containerd; instead, it is invoked by a shim, which listens on the socket and invokes the runtime engine.
#### shim+engine Architecture
##### runtime shim
The runtime shim is what containerd actually invokes. It has minimal options on start beyond
being provided the communications port for containerd and some configuration information.
The runtime shim listens on the socket for ttRPC commands from containerd, and then invokes a separate program,
the runtime engine, via `fork`/`exec` to run the container. For example, the `io.containerd.runc.v2` shim invokes
an OCI compliant runtime engine such as `runc`.
containerd passes options to the shim over the ttRPC connection, which may include the runtime engine binary
to invoke. These are the `options` for the [`CreateTaskRequest`](https://github.com/containerd/containerd/blob/main/runtime/v2/README.md#container-level-shim-configuration).
For example, the `io.containerd.runc.v2` shim supports including the path to the runtime engine binary.
##### runtime engine
The runtime engine itself is what actually starts and stops the container.
For example, in the case of [runc](https://github.com/opencontainers/runc), the containerd project provides the shim
as the executable `containerd-shim-runc-v2`. This is invoked by containerd and starts the ttRPC listener.
The shim then invokes the actual `runc` binary, passing it the container configuration, and the `runc` binary
creates/starts/stops the container, typically via `libcontainer` and, from there, system APIs.
#### shim+engine Relationship
Since each shim instance communicates with containerd as a daemon, while parenting containers via invoking independent runtimes,
it is possible to have one shim for multiple containers and invocations. For example,
you could have one `containerd-shim-runc-v2` communicating with one containerd, and it can
invoke ten distinct containers.
It is even possible to have one shim for multiple containers, each with its own actual runtime,
since, as described above, the runtime binary is passed as one of the options in `CreateTaskRequest`.
containerd does not know or care about whether the shim to container relationship is one-to-one,
or one-to-many. It is entirely up to the shim to decide. For example, the `io.containerd.runc.v2` shim
automatically groups based on the presence of
[labels](https://github.com/containerd/containerd/blob/b30e0163ac36c1a193604e5eca031053d62019c5/runtime/v2/runc/manager/manager_linux.go#L54-L60). In practice, this means that containers launched by Kubernetes that are part of the same Kubernetes pod are handled by a single
shim, grouping on the `io.kubernetes.cri.sandbox-id` label set by the CRI plugin.
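A minimal sketch of this grouping decision in Go, assuming as a simplification that only the `io.kubernetes.cri.sandbox-id` label is consulted; the real shim's label list is in the linked source. `shimGroup` and `sandboxLabel` are hypothetical names.

```go
package main

import "fmt"

// sandboxLabel is the CRI label named above.
const sandboxLabel = "io.kubernetes.cri.sandbox-id"

// shimGroup returns the key that decides which shim serves a container:
// containers sharing a sandbox ID share a key (one shim per pod), and any
// other container falls back to its own ID (one shim per container).
func shimGroup(containerID string, labels map[string]string) string {
	if id, ok := labels[sandboxLabel]; ok && id != "" {
		return id
	}
	return containerID
}

func main() {
	pod := map[string]string{sandboxLabel: "pod-abc"}
	fmt.Println(shimGroup("app", pod))        // pod-abc
	fmt.Println(shimGroup("sidecar", pod))    // pod-abc
	fmt.Println(shimGroup("standalone", nil)) // standalone
}
```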
The flow, then, is as follows:
1. containerd receives a request to create a container
1. containerd lays out the container's filesystem, and creates the necessary [container config](https://github.com/opencontainers/image-spec/blob/main/config.md) information
1. containerd invokes the shim, including container configuration, which uses that information to decide whether to launch a new socket listener (1:1 shim to container) or use an existing one (1:many)
   * if existing, return the address of the existing socket and exit
   * if new, the shim:
     1. creates a new process to listen on a socket for ttRPC commands from containerd
     1. returns the address to that socket to containerd
     1. exits
1. containerd sends the shim a command to start the container
1. The shim invokes `runc` to create/start/stop the container
An excellent flow diagram is available later in this document under [Flow](#flow).
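The decision in step 3 of the flow above (start a new listener or reuse an existing one) can be sketched as follows. This is a hypothetical, in-memory model: `shimRegistry` and `startOrReuse` are illustrative names, and a real shim need not keep such a map in memory.

```go
package main

import (
	"fmt"
	"sync"
)

// shimRegistry is a hypothetical record of which group already has a
// listening socket.
type shimRegistry struct {
	mu      sync.Mutex
	sockets map[string]string // group key -> socket address
}

// startOrReuse returns the socket address serving group. If the group already
// has a shim, its address is returned with reused == true (the 1:many case);
// otherwise newListener starts one and its address is recorded.
func (r *shimRegistry) startOrReuse(group string, newListener func(group string) string) (addr string, reused bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if existing, ok := r.sockets[group]; ok {
		return existing, true
	}
	addr = newListener(group)
	r.sockets[group] = addr
	return addr, false
}

func main() {
	reg := &shimRegistry{sockets: map[string]string{}}
	// Hypothetical listener starter that returns an illustrative address.
	listen := func(group string) string {
		return "unix:///run/example/" + group + ".sock"
	}
	for _, g := range []string{"pod-abc", "pod-abc", "standalone"} {
		addr, reused := reg.startOrReuse(g, listen)
		fmt.Println(g, addr, reused)
	}
}
```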
## Usage
### Invoking Runtimes
A runtime (single binary or shim+engine) and its options can be selected when creating a container via one of the exposed
containerd services (containerd client, CRI API, ...), or via a client that calls into the containerd-provided services.
Examples of containerd clients include `ctr`, `nerdctl`, Kubernetes, Docker/Moby, Rancher and others.
Users specify the runtime they wish to use when creating a container.
The runtime can also be changed via a container update.
The runtime name that is passed is a string that identifies the runtime to containerd. In the case of separate shim+engine,
this will be the runtime _shim_. Either way, this is the binary that containerd executes and expects to start the ttRPC listener.
The runtime name can be either a URI-like string or, beginning with containerd 1.6.0, the absolute path to the executable:

1. If the runtime name is an absolute path, containerd uses it as the path to the runtime to invoke.
1. If the runtime name is URI-like, containerd converts it to a binary name using the logic below.

To convert a URI-like runtime name to a binary name, containerd:

1. Takes the last 2 components, e.g. `runc.v2`
1. Replaces all `.` with `-`, e.g. `runc-v2`
1. Prepends `containerd-shim-`

For example, if the runtime name is `io.containerd.runc.v2`, containerd will invoke the shim as `containerd-shim-runc-v2`. It expects to
find the binary in its normal `PATH`.
containerd keeps the `containerd-shim-*` prefix so that users can `ps aux | grep containerd-shim` to see running shims on their system.
For example:
```bash
$ ctr run --runtime io.containerd.runc.v2 --rm docker.io/library/alpine:latest alpine
```
Will invoke `containerd-shim-runc-v2`.
You can test this by trying another name:
```bash
$ ctr run --runtime=io.foo.bar.runc2.v2.baz --rm docker.io/library/hello-world:latest hello-world /hello
ctr: failed to start shim: failed to resolve runtime path: runtime "io.foo.bar.runc2.v2.baz" binary not installed "containerd-shim-v2-baz": file does not exist: unknown
```
It received `io.foo.bar.runc2.v2.baz` and looked for `containerd-shim-v2-baz`.
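The resolution rules in this section can be sketched in Go. `resolveShimBinary` is a hypothetical name and the sketch is not containerd's code; for instance, containerd also verifies that an absolute path exists and then looks the resulting binary name up on its `PATH`, both of which are omitted here.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveShimBinary maps a runtime name to the shim binary to execute,
// following the rules described in this section.
func resolveShimBinary(runtime string) (string, error) {
	if runtime == "" {
		return "", fmt.Errorf("no runtime name")
	}
	// containerd 1.6.0 and later: an absolute path is used as-is.
	if filepath.IsAbs(runtime) {
		return runtime, nil
	}
	// URI-like name: keep the last two dot-separated components...
	parts := strings.Split(runtime, ".")
	if len(parts) < 2 || parts[0] == "" {
		return "", fmt.Errorf("invalid runtime name %q", runtime)
	}
	last2 := parts[len(parts)-2:]
	// ...join them with "-" and prepend the containerd-shim- prefix.
	return "containerd-shim-" + strings.Join(last2, "-"), nil
}

func main() {
	for _, name := range []string{
		"io.containerd.runc.v2",
		"io.foo.bar.runc2.v2.baz",
		"/usr/local/bin/containerd-shim-runc-v2",
		"runc",
	} {
		bin, err := resolveShimBinary(name)
		fmt.Printf("%-40s -> %q (err: %v)\n", name, bin, err)
	}
}
```

Both `ctr` examples in this section fit this logic: `io.containerd.runc.v2` maps to `containerd-shim-runc-v2`, and `io.foo.bar.runc2.v2.baz` maps to `containerd-shim-v2-baz`.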
You can also override the default runtime engine binary that the shim invokes by passing the `--runc-binary`
option to `ctr run`. For example:
```bash
$ ctr run --runtime io.containerd.runc.v2 --runc-binary /usr/local/bin/runc-custom --rm docker.io/library/alpine:latest alpine
```
### Configuring Runtimes
You can configure one or more runtimes in containerd's `config.toml` configuration file, by modifying the
section:
```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
```
See [config.toml man page](../../docs/man/containerd-config.toml.5.md) for more details and an example.
These "named runtimes" in the configuration file are used solely when invoked via CRI, which has a
[`runtime_handler` field](https://github.com/kubernetes/cri-api/blob/de5f1318aede866435308f39cb432618a15f104e/pkg/apis/runtime/v1/api.proto#L476).
## Shim Authoring
This section is dedicated to runtime authors wishing to build a shim.