Adds a new platform interface for matching and comparing platforms.
This new interface allows both filtering and ordering of platforms
to support running multiple platform and choosing the best platform.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
These opts either inherit the parent cgroup device.list or append the
default unix devices like /dev/null /dev/random so that the container
has access.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This makes it easier for callers to call this function and populate the
config without relying on specific flags across commands.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
A fifo on unix or named pipe on Windows will be provided to the shim.
It can be located inside the `cwd` of the shim named "log".
The shims can use the existing `github.com/containerd/containerd/log` package to log debug messages.
Messages will automatically be output in the containerd's daemon logs with the correct fiels and runtime set.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Some images like `criu` will have extra libs that it requires. This
adds lib support via LD_LIBRARY_PATH and InstallOpts
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This adds a way for users to programatically install containerd binary
dependencies.
With runtime v2 and new shim's being built, it will be a challenge to
get those onto machines. Users would have to find the link, download,
place it in their path, yada yada yada.
With this functionality of a managed `/opt` directory, containerd can
use existing image and distribution infra. to get binarys, shims, etc
onto the system.
Configuration:
*default:* `/opt/containerd`
*containerd config:*
```toml
[plugins.opt]
path = "/opt/mypath"
```
Usage:
*code:*
```go
image, err := client.Pull(ctx, "docker.io/crosbymichael/runc:latest")
client.Install(ctx, image)
```
*ctr:*
```bash
ctr content fetch docker.io/crosbymichael/runc:latest
ctr install docker.io/crosbymichael/runc:latest
```
You can manage versions and see what is running via standard image
commands.
Images:
These images MUST be small and only contain binaries.
```Dockerfile
FROM scratch
Add runc /bin/runc
```
Containerd will only extract files in `/bin` of the image.
Later on, we can add support for `/lib`.
The code adds a service to manage an `/opt/containerd` directory and
provide that path to callers via the introspection service.
How to Test:
Delete runc from your system.
```bash
> sudo ctr run --rm docker.io/library/redis:alpine redis
ctr: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v1.linux/default/redis/log.json: no such file or directory): exec: "runc": executable file not found in $PATH: unknown
> sudo ctr content fetch docker.io/crosbymichael/runc:latest
> sudo ctr install docker.io/crosbymichael/runc:latest
> sudo ctr run --rm docker.io/library/redis:alpine redis
1:C 01 Aug 15:59:52.864 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 01 Aug 15:59:52.864 # Redis version=4.0.10, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 01 Aug 15:59:52.864 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1:M 01 Aug 15:59:52.866 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
1:M 01 Aug 15:59:52.866 # Server can't set maximum open files to 10032 because of OS error: Operation not permitted.
1:M 01 Aug 15:59:52.866 # Current maximum open files is 1024. maxclients has been reduced to 992 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
1:M 01 Aug 15:59:52.870 * Running mode=standalone, port=6379.
1:M 01 Aug 15:59:52.870 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 01 Aug 15:59:52.870 # Server initialized
1:M 01 Aug 15:59:52.870 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 01 Aug 15:59:52.870 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 01 Aug 15:59:52.870 * Ready to accept connections
^C1:signal-handler (1533139193) Received SIGINT scheduling shutdown...
1:M 01 Aug 15:59:53.472 # User requested shutdown...
1:M 01 Aug 15:59:53.472 * Saving the final RDB snapshot before exiting.
1:M 01 Aug 15:59:53.484 * DB saved on disk
1:M 01 Aug 15:59:53.484 # Redis is now ready to exit, bye bye...
```
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Reorders the code so that it doesnt overwrite the previous allocation
when creating a NewTask via ctr.exe
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
We introduce a WithSpecFromFile option combinator to allow creation
simpler creation of OCI specs from a file name. Often used as the first
option in a `SpecOpts` slice, it simplifies choosing between a local
file and the built-in default.
The code in `ctr run` has been updated to use the new option, with out
changing the order of operations or functionality present there.
Signed-off-by: Stephen Day <stephen.day@getcruise.com>
Implements the various requirements for the runtime v2 code to abstract
away the unix/linux code into the appropriate platform level
abstractions to use the runtime v2 on Windows as well.
Adds support in the Makefile.windows to actually build the runtime v2
code for Windows by setting a shell environment BUILD_WINDOWS_V2=1
before calling make. (Note this disables the compilation of the Windows
runtime v1)
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
This patch changes the logs format to use a fixed-width timestamp,
matching the format that's used in dockerd.
Before:
$ containerd
INFO[0000] starting containerd revision=a88b6319614de846458750ff882723479ca7b1a1 version=v1.1.0-202-ga88b6319
INFO[0000] loading plugin "io.containerd.content.v1.content"... type=io.containerd.content.v1
INFO[0000] loading plugin "io.containerd.snapshotter.v1.btrfs"... type=io.containerd.snapshotter.v1
WARN[0000] failed to load plugin io.containerd.snapshotter.v1.btrfs error="path /var/lib/containerd/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
After:
$ containerd
INFO[2018-07-24T08:11:07.397856489Z] starting containerd revision=c3195155cacb361cd3549c4d78901b20aa19579a version=v1.1.0-203-gc3195155
INFO[2018-07-24T08:11:07.399264587Z] loading plugin "io.containerd.content.v1.content"... type=io.containerd.content.v1
INFO[2018-07-24T08:11:07.399343959Z] loading plugin "io.containerd.snapshotter.v1.btrfs"... type=io.containerd.snapshotter.v1
WARN[2018-07-24T08:11:07.399474423Z] failed to load plugin io.containerd.snapshotter.v1.btrfs error="path /var/lib/containerd/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
Or, when running as child-process of dockerd:
Before:
root@9637fcd85ea4:/go/src/github.com/docker/docker# dockerd --debug
DEBU[2018-07-24T08:15:16.946312436Z] Listener created for HTTP on unix (/var/run/docker.sock)
INFO[2018-07-24T08:15:16.947086499Z] libcontainerd: started new docker-containerd process pid=231
INFO[2018-07-24T08:15:16.947137166Z] parsed scheme: "unix" module=grpc
INFO[2018-07-24T08:15:16.947235001Z] scheme "unix" not registered, fallback to default scheme module=grpc
INFO[2018-07-24T08:15:16.947463403Z] ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/docker-containerd.sock 0 <nil>}] module=grpc
INFO[2018-07-24T08:15:16.947505954Z] ClientConn switching balancer to "pick_first" module=grpc
INFO[2018-07-24T08:15:16.947717368Z] pickfirstBalancer: HandleSubConnStateChange: 0xc420507ab0, CONNECTING module=grpc
INFO[0000] starting containerd revision=d64c661f1d51c48782c9cec8fda7604785f93587 version=v1.1.1
DEBU[0000] changing OOM score to -500
INFO[0000] loading plugin "io.containerd.content.v1.content"... type=io.containerd.content.v1
INFO[0000] loading plugin "io.containerd.snapshotter.v1.btrfs"... type=io.containerd.snapshotter.v1
WARN[0000] failed to load plugin io.containerd.snapshotter.v1.btrfs error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
After:
DEBU[2018-07-24T08:21:33.441741970Z] Listener created for HTTP on unix (/var/run/docker.sock)
INFO[2018-07-24T08:21:33.442428017Z] libcontainerd: started new docker-containerd process pid=232
INFO[2018-07-24T08:21:33.442510827Z] parsed scheme: "unix" module=grpc
INFO[2018-07-24T08:21:33.442598812Z] scheme "unix" not registered, fallback to default scheme module=grpc
INFO[2018-07-24T08:21:33.442681006Z] ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/docker-containerd.sock 0 <nil>}] module=grpc
INFO[2018-07-24T08:21:33.442770353Z] ClientConn switching balancer to "pick_first" module=grpc
INFO[2018-07-24T08:21:33.442871502Z] pickfirstBalancer: HandleSubConnStateChange: 0xc42018bc30, CONNECTING module=grpc
INFO[2018-07-24T08:21:33.457963804Z] starting containerd revision=597dd082e37f8bc6b6265ca05839d7a300861911 version=597dd082
DEBU[2018-07-24T08:21:33.458113301Z] changing OOM score to -500
INFO[2018-07-24T08:21:33.458474842Z] loading plugin "io.containerd.content.v1.content"... type=io.containerd.content.v1
INFO[2018-07-24T08:21:33.458911054Z] loading plugin "io.containerd.snapshotter.v1.btrfs"... type=io.containerd.snapshotter.v1
WARN[2018-07-24T08:21:33.459366268Z] failed to load plugin io.containerd.snapshotter.v1.btrfs error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Separate Fetch and Pull commands in client to distinguish
between platform specific and non-platform specific operations.
`ctr images pull` with all platforms will now unpack all platforms.
`ctr content fetch` now supports platform flags.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
`Ctr` interface follows the pattern `ctr <command> <subcommand>` except
for the `plugins` command which does not have subcommands. This feels
unnatural to certain users and they would expect that they can list
containerd plugins via `ctr plugins list`.
This commit implements their expectation so that `plugins` becomes a
command "group" and its `list` subcommand actually lists the plugins.
Signed-off-by: Danail Branekov <danailster@gmail.com>
This adds a `Load` Opt for cio to load a tasks io/fifos without
attaching or starting the copy routines.
It adds the load method in `ctr` by default so that fifos or other IO
are removed from disk on delete methods inbetween command runs. It is
not the default for all task loads for backwards compat. and a user may
want to keep io around to reuse or if log files are used.
Fixes#2421
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Commit 05513284e7 exposed the "rootfs"
and "no-pivot" flags for the "containers" command, but it accidentally
removed them for "run" since package-level variables are initialized
before package-level init functions in golang. Hoisting these flags to
a package imported by both commands solves the problem.
Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
This change allows implementations to resolve the location of the actual data
using OCI descriptor fields such as MediaType.
No OCI descriptor field is written to the store.
No change on gRPC API.
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
This allows many different commands to be used as OCI hooks. It allows
these commands to template out different args and env vars so that
normal commands can accept the OCI spec State payload over stdin.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This allows non-privileged users to use containerd. This is part of a
larger track of work integrating containerd into Cloudfoundry's garden
with support for rootless.
[#156343575]
Signed-off-by: Claudia Beresford <cberesford@pivotal.io>
This adds gc.root label to snapshots created with prepare and commit via
the CLI. WIthout this, created snapshots get immediately garbage
collected. There may be a better solution but this seems to be a solid
stop gap.
We may also need to add more functionality around snapshot labeling for
the CLI but current use cases are unclear.
Signed-off-by: Stephen J Day <stevvooe@gmail.com>
Since Go 1.7, context is a standard package, superceding the
"x/net/context". Since Go 1.9, the latter only provides a few type
aliases from the former. Therefore, it makes sense to switch to the
standard package.
This commit was generated by the following script (with a couple of
minor fixups to remove extra changes done by goimports):
#!/bin/bash
if [ $# -ge 1 ]; then
FILES=$*
else
FILES=$(git ls-files \*.go | grep -vF ".pb.go" | grep -v
^vendor/)
fi
for f in $FILES; do
printf .
sed -i -e 's|"golang.org/x/net/context"$|"context"|' $f
goimports -w $f
awk ' /^$/ {e=1; next;}
/[[:space:]]"context"$/ {e=0;}
{if (e) {print ""; e=0}; print;}' < $f > $f.new && \
mv $f.new $f
goimports -w $f
done
echo
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Since Go 1.7, "context" is a standard package, superceding the
"x/net/context". Since Go 1.9, the latter only provides type aliases
from the former. Therefore, it makes sense to switch to the standard
package, and the change is not disruptive in any sense.
This commit deals with a few cases where both packages happened to be
imported by the same source file. A choice between "context" and
"gocontext" was made for each file in order to minimize the patch.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Show new tag when dependencies don't have a previous version.
Align dependencies into columns.
Sort dependencies by name.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Allows the client to choose the context to finish the lease.
This allows the client to switch contexts when the main context
used to the create the lease may have been cancelled.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Running the density tool will report Pss and Rss total and per container
values for shim memory usage. Values are reported in KB.
```bash
containerd-stress density --count 500
INFO[0000] pulling docker.io/library/alpine:latest
INFO[0000] generating spec from image
{"pss":421188,"rss":2439688,"pssPerContainer":842,"rssPerContainer":4879}
```
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Without this `ctr run` can fail with:
ctr: parent snapshot sha256:70798fd80095f40b41baa5d107fb61532bfe494d96313fea01e8fcbf4e8743ee does not exist: not found
My image was produced by buildkit, which doesn't unpack (I think this makes
sense since buildkit doesn't know if I am going to run the image or export/push
it etc).
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Adds a useful flag to `ctr` to enable joining any existing Linux
namespaces for any namespace types (network, pid, ipc, etc.) using the
existing With helper in the oci package.
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
This uses a simple `IsAbs` check to see if we are using an on disk path
for a unix socket vs an address since we do not prefix addresses with
`unix://` or `tcp://`.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
For missing required parameters adds error return before attempting any
actions to `ctr images` commands.
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
To avoid having the shim hold on to too much memory, we've made a few
adjustments to favor more aggressive reclamation of memory from the
operating system. Typically, this would be negligible, on the order of a
few megabytes, but this is impactful when running several containers.
The first fix is to lower the threshold used to determine when to run
the garbage collector. The second runs `runtime/debug.FreeOSMemory` at a
regular interval.
Under test, this result in a sustained memory usage of around 3.7 MB.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This changes the Windows runtime to use the snapshotter and differ
created layers, and updates the ctr commands to use the snapshotter and differ.
Signed-off-by: Darren Stahl <darst@microsoft.com>
This implements the Windows snapshotter and diff Apply function.
This allows for Windows layers to be created, and layers to be pulled
from the hub.
Signed-off-by: Darren Stahl <darst@microsoft.com>
Given these same exact functions are both now available in
opencontainers/runc (libcontainer/system) package, and we only use the
`SetSubreaper` today from the shim, there seems to be no reason for
duplication.
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
To avoid buffer bloat in long running processes, we try to use buffer
pools where possible. This is meant to address shim memory usage issues,
but may not be the root cause.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This improves the exec support so that they can run along with the
normal stress tests. You don't have to pick exec stres or container
stress. They both run at the same time and report the different values.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
After comtemplation, the complexity of the logging module system
outweighs its usefulness. This changeset removes the system and restores
lighter weight code paths. As a concession, we can always provide more
context when necessary to log messages to understand them without having
to fork the context for a certain set of calls.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This allows other packages and plugins to easily exec things without
racing with the reaper.
The reaper is mostly needed in the shim but can be removed in containerd
in favor of the `exec.Cmd` apis
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This linter checks for unnecessary type convertions.
Some convertions are whitelisted because their type is different
on 32bit platforms
Signed-off-by: Daniel Nephin <dnephin@gmail.com>
Preserves the order of the tree output between each execution. Slightly
refactored the behavior to be more "object oriented".
Signed-off-by: Stephen J Day <stephen.day@docker.com>
- Use lease API (previoisly, GC was not supported)
- Refactored interfaces for ease of future Docker v1 importer support
For usage, please refer to `ctr images import --help`.
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
This subreaper should always be turned on for containerd unless
explicitly needed for it to be off.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Could issues where when exec processes fail the wait block is not
released.
Second, you could not dump stacks if the reaper loop locks up.
Third, the publisher was not waiting on the correct pid.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
The shim doesn't need massive concurrency and a bunch of CPUs to do its
job correctly. We can reduce the number of threads to save memory at
little cost to performance.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
By replacing grpc with ttrpc, we can reduce total memory runtime
requirements and binary size. With minimal code changes, the shim can
now be controlled by the much lightweight protocol, reducing the total
memory required per container.
When reviewing this change, take particular notice of the generated shim
code.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Synchronous image delete provides an option image delete to wait
until the next garbage collection deletes after an image is removed
before returning success to the caller.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Add garbage collection as a background process and policy
configuration for configuring when to run garbage collection.
By default garbage collection will run when deletion occurs
and no more than 20ms out of every second.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
The binary name used for executing "containerd publish" was hard-coded
in the shim code, and hence it did not work with customized daemon
binary name. (e.g. `docker-containerd`)
This commit allows specifying custom daemon binary via `containerd-shim
-containerd-binary ...`.
The daemon invokes this command with `os.Executable()` path.
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Currently the output for a non-existent image reference and a valid
image reference is exactly the same on `ctr images remove`. Instead of
outputting the target ref input, if it is "not found" we should alert
the user in case of a mispelling, but continue not to make it a failure
for the command (given it supports multiple ref entries)
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
This patch changes the output of `ctr version` to align version and revision.
It also changes `.Printf()` to `.Println()`, to make the code slightly easier
to read.
Before this change:
$ ctr version
Client:
Version: v1.0.0-beta.2-132-g564600e.m
Revision: 564600ee79aefb0f24cbcecc90d4388bd0ea59de.m
Server:
Version: v1.0.0-beta.2-132-g564600e.m
Revision: 564600ee79aefb0f24cbcecc90d4388bd0ea59de.m
With this patch applied:
$ ctr version
Client:
Version: v1.0.0-beta.2-132-g564600e.m
Revision: 564600ee79aefb0f24cbcecc90d4388bd0ea59de.m
Server:
Version: v1.0.0-beta.2-132-g564600e.m
Revision: 564600ee79aefb0f24cbcecc90d4388bd0ea59de.m
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
To reduce the binary size of containerd, we no longer import the
`server` package for only a few defaults. This reduces the size of `ctr`
by 2MB. There are probably other gains elsewhere.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This keeps the semantics the same as the other commands to only list
containers, tasks, images by calling the list subcommand.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This allows a project to have a TEMPLATE file in the root of the repo to
be used with the release tool. If they don't have this file and did not
specify a custom file then it will use the compiled in template in the
release tool.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Allow a user provided name for the checkpoint as well as a default
generated name for the checkpoint image.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This allows one to manage the checkpoints by using the `ctr image`
command.
The image is created with label "containerd.io/checkpoint". By
default, it is not included in the output of `ctr images ls`.
We can list the images by using the following command:
$ ctr images ls labels.containerd.\"io/checkpoint\"==true
Fixes#1026
Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>
This tool makes our standard release template easy to generate. It also
adds a few features like marking changed dependnencies for packages and
others to know what updated from the last release.
usage:
`containerd-release -n releases/v1.0.0-beta.2.toml`
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add differ options and package with interface.
Update optional values on diff interface to use options.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
With this change, we integrate all the plugin changes into the
introspection service.
All plugins can be listed with the following command:
```console
$ ctr plugins
TYPE ID PLATFORM STATUS
io.containerd.content.v1 content - ok
io.containerd.metadata.v1 bolt - ok
io.containerd.differ.v1 walking linux/amd64 ok
io.containerd.grpc.v1 containers - ok
io.containerd.grpc.v1 content - ok
io.containerd.grpc.v1 diff - ok
io.containerd.grpc.v1 events - ok
io.containerd.grpc.v1 healthcheck - ok
io.containerd.grpc.v1 images - ok
io.containerd.grpc.v1 namespaces - ok
io.containerd.snapshotter.v1 btrfs linux/amd64 error
io.containerd.snapshotter.v1 overlayfs linux/amd64 ok
io.containerd.grpc.v1 snapshots - ok
io.containerd.monitor.v1 cgroups linux/amd64 ok
io.containerd.runtime.v1 linux linux/amd64 ok
io.containerd.grpc.v1 tasks - ok
io.containerd.grpc.v1 version - ok
```
There are few things to note about this output. The first is that it is
printed in the order in which plugins are initialized. This useful for
debugging plugin initialization problems. Also note that even though the
introspection GPRC api is a itself a plugin, it is not listed. This is
because the plugin takes a snapshot of the initialization state at the
end of the plugin init process. This allows us to see errors from each
plugin, as they happen. If it is required to introspect the existence of
the introspection service, we can make modifications to include it in
the future.
The last thing to note is that the btrfs plugin is in an error state.
This is a common state for containerd because even though we load the
plugin, most installations aren't on top of btrfs and the plugin cannot
be used. We can actually view this error using the detailed view with a
filter:
```console
$ ctr plugins --detailed id==btrfs
Type: io.containerd.snapshotter.v1
ID: btrfs
Platforms: linux/amd64
Exports:
root /var/lib/containerd/io.containerd.snapshotter.v1.btrfs
Error:
Code: Unknown
Message: path /var/lib/containerd/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter
```
Along with several other values, this is a valuable tool for evaluating the
state of components in containerd.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
By default, the generated spec will place containers in cgroups by their
ids, we need to use the namespace as the cgroup root to avoid
containers with the same name being placed in the same cgroup.
```
11:perf_event:/to/redis
10:freezer:/to/redis
9:memory:/to/redis
8:devices:/to/redis
7:net_cls,net_prio:/to/redis
6:pids:/to/redis
5:hugetlb:/to/redis
4:cpuset:/to/redis
3:blkio:/to/redis
2:cpu,cpuacct:/to/redis
1:name=systemd:/to/redis
11:perf_event:/te/redis
10:freezer:/te/redis
9:memory:/te/redis
8:devices:/te/redis
7:net_cls,net_prio:/te/redis
6:pids:/te/redis
5:hugetlb:/te/redis
4:cpuset:/te/redis
3:blkio:/te/redis
2:cpu,cpuacct:/te/redis
1:name=systemd:/te/redis
```
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This makes sure the client is always in sync with the server before
performing any type of operations on the container metadata.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
The `Check` function returns information about an image's content components
over a content provider. From this information, one can tell which content is
required, present or missing to run an image.
The utility can be demonstrated with the `check` command:
```console
$ ctr images check
REF TYPE DIGEST STATUS SIZE
docker.io/library/alpine:latest application/vnd.docker.distribution.manifest.list.v2+json sha256:f006ecbb824d87947d0b51ab8488634bf69fe4094959d935c0c103f4820a417d incomplete (1/2) 1.5 KiB/1.9 MiB
docker.io/library/postgres:latest application/vnd.docker.distribution.manifest.v2+json sha256:2f8080b9910a8b4f38ff5a55a82e77cb43d88bdbb16d723c71d18493590832e9 complete (13/13) 99.3 MiB/99.3 MiB
docker.io/library/redis:alpine application/vnd.docker.distribution.manifest.v2+json sha256:e633cded055a94202e4ccccb8125b7f383cd6ee56527ab890db643383a2647dd incomplete (6/7) 8.1 MiB/10.0 MiB
docker.io/library/ubuntu:latest application/vnd.docker.distribution.manifest.list.v2+json sha256:60f835698ea19e8d9d3a59e68fb96fb35bc43e745941cb2ea9eaf4ba3029ed8a unavailable (0/?) 0.0 B/?
docker.io/trollin/busybox:latest application/vnd.docker.distribution.manifest.list.v2+json sha256:54a6424f7a2d5f4f27b3d69e5f9f2bc25fe9087f0449d3cb4215db349f77feae complete (2/2) 699.9 KiB/699.9 KiB
```
The above shows us that we have two incomplete images and one that is
unavailable. The incomplete images are those that we know the complete
size of all content but some are missing. "Unavailable" means that the
check could not get enough information about the image to get its full
size.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
The SIGUNUSED constant was removed from golang.org/x/sys/unix in
https://go-review.googlesource.com/61771 as it is also removed from the
respective glibc headers.
This means the command
ctr tasks kill SIGUNUSED ...
will no longer work. However, the same effect can be achieved with
ctr tasks kill SIGSYS ...
as SIGSYS has the same value as SIGUNUSED used to have.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Fixes pulling of multi-arch images by limiting the expansion
of the index by filtering to the current default platform.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
This handles signals first thing on boot so that plugins are able to
boot with the reaper enabled.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Content commit is updated to take in a context, allowing
content to be committed within the same context the writer
was in. This is useful when commit may be able to use more
context to complete the action rather than creating its own.
An example of this being useful is for the metadata implementation
of content, having a context allows tests to fully create
content in one database transaction by making use of the context.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
The labels can be very long (e.g. cri-containerd stores a large JSON metadata
blob as `io.cri-containerd.container.metadata`) which renders the output
useless due to all the line wrapping etc.
The information is still available in `ctr containers info «name»`.
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
This also fix the type used for RuncOptions.SystemCgroup, hence introducing
an API break.
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
Fixes#1431
This adds KillOpts so that a client can specify when they want to kill a
single process or all the processes inside a container.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
In order to do more advanced spec generation with images, snapshots,
etc, we need to inject the context and client into the spec generation
code.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
After the rework of server-side defaults, the `ctr snapshot` command
stopped working due to no default snapshotter.
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
Instead of requiring callers to read the struct fields to check for an
error, provide the exit results via a function instead which is more
natural.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
In all of the examples, its recommended to call `Wait()` before starting
a process/task.
Since `Wait()` is a blocking call, this means it must be called from a
goroutine like so:
```go
statusC := make(chan uint32)
go func() {
status, err := task.Wait(ctx)
if err != nil {
// handle async err
}
statusC <- status
}()
task.Start(ctx)
<-statusC
```
This means there is a race here where there is no guarentee when the
goroutine is going to be scheduled, and even a bit more since this
requires an RPC call to be made.
In addition, this code is very messy and a common pattern for any caller
using Wait+Start.
Instead, this changes `Wait()` to use an async model having `Wait()`
return a channel instead of the code itself.
This ensures that when `Wait()` returns that the client has a handle on
the event stream (already made the RPC request) before returning and
reduces any sort of race to how the stream is handled by grpc since we
can't guarentee that we have a goroutine running and blocked on
`Recv()`.
Making `Wait()` async also cleans up the code in the caller drastically:
```go
statusC, err := task.Wait(ctx)
if err != nil {
return err
}
task.Start(ctx)
status := <-statusC
if status.Err != nil {
return err
}
```
No more spinning up goroutines and more natural error
handling for the caller.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This adds null IO option for efficient handling of IO.
It provides a container directly with `/dev/null` and does not require
any io.Copy within the shim whenever a user does not want the IO of the
container.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This adds a `stress` binary to help stress test containerd. It is
different from a benchmarking tool as it only gives a simple summary at
the end.
It is built to run long, multi hour/day stress tests across builds of
containerd.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
The argument order, naming and behavior of the snapshots command didn't
really follow any of the design constraints or conventions of the
`Snapshotter` interface. This brings the command into line with that
interface definition.
The `snapshot archive` command has been removed as it requires more
thought on design to correctly emit diffs.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
After some analysis, it was found that Content.Reader was generally
redudant to an io.ReaderAt. This change removes `Content.Reader` in
favor of a `Content.ReaderAt`. In general, `ReaderAt` can perform better
over interfaces with indeterminant latency because it avoids remote
state for reads. Where a reader is required, a helper is provided to
convert it into an `io.SectionReader`.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
The syscall package is locked down and the comment in [1] advises to
switch code to use the corresponding package from golang.org/x/sys. Do
so and replace usage of package syscall with package
golang.org/x/sys/{unix,windows} where applicable.
[1] https://github.com/golang/go/blob/master/src/syscall/syscall.go#L21-L24
This will also allow to get updates and fixes for syscall wrappers
without having to use a new go version.
Errno, Signal and SysProcAttr aren't changed as they haven't been
implemented in x/sys/. Stat_t from syscall is used if standard library
packages (e.g. os) require it. syscall.ENOTSUP, syscall.SIGKILL and
syscall.SIGTERM are used for cross-platform files.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
This sets the subreaper to true in the default linux config as the
common usecase is to not run containerd as pid 1.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Snapshotters for run must be created with requested snapshotter.
The order of the options is important to ensure that the snapshotter
is set before the snapshots are created.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
This changes Wait() from returning an error whenever you call wait on a
stopped process/task to returning the exit status from the process.
This also adds the exit status to the Status() call on a process/task so
that a user can Wait(), check status, then cancel the wait to avoid
races in event handling.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This splits up the create and start of an exec process in the shim to
have two separate steps like the initial process. This will allow
better state reporting for individual process along with a more robust
wait for execs.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
To ensure consistent fieldpath matching for events, we generate the
fieldpath matching using protobuf definitions. This is done through a
plugin called "fieldpath" that defines a `Field` method for each type
with the plugin enabled. Generated code handles top-level envelope
fields, as well as deferred serialization for matching any types.
In practice, this means that we can cheaply match events on `topic` and
`namespace`. If we want to match on attributes within the event, we can
use the `event` prefix to address these fields. For example, the
following will match all envelopes that have a field named
`container_id` that has the value `testing`:
```
ctr events "event.container_id==testing"
```
The above will decode the underlying event and check that particular
field. Accordingly, if only `topic` or `namespace` is used, the event
will not be decoded and only match on the envelope.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This change further plumbs the components required for implementing
event filters. Specifically, we now have the ability to filter on the
`topic` and `namespace`.
In the course of implementing this functionality, it was found that
there were mismatches in the events API that created extra serialization
round trips. A modification to `typeurl.MarshalAny` and a clear
separation between publishing and forwarding allow us to avoid these
serialization issues.
Unfortunately, this has required a few tweaks to the GRPC API, so this
is a breaking change. `Publish` and `Forward` have been clearly separated in
the GRPC API. `Publish` honors the contextual namespace and performs
timestamping while `Forward` simply validates and forwards. The behavior
of `Subscribe` is to propagate events for all namespaces unless
specifically filtered (and hence the relation to this particular change.
The following is an example of using filters to monitor the task events
generated while running the [bucketbench tool](https://github.com/estesp/bucketbench):
```
$ ctr events 'topic~=/tasks/.+,namespace==bb'
...
2017-07-28 22:19:51.78944874 +0000 UTC bb /tasks/start {"container_id":"bb-ctr-6-8","pid":25889}
2017-07-28 22:19:51.791893688 +0000 UTC bb /tasks/start {"container_id":"bb-ctr-4-8","pid":25882}
2017-07-28 22:19:51.792608389 +0000 UTC bb /tasks/start {"container_id":"bb-ctr-2-9","pid":25860}
2017-07-28 22:19:51.793035217 +0000 UTC bb /tasks/start {"container_id":"bb-ctr-5-6","pid":25869}
2017-07-28 22:19:51.802659622 +0000 UTC bb /tasks/start {"container_id":"bb-ctr-0-7","pid":25877}
2017-07-28 22:19:51.805192898 +0000 UTC bb /tasks/start {"container_id":"bb-ctr-3-6","pid":25856}
2017-07-28 22:19:51.832374931 +0000 UTC bb /tasks/exit {"container_id":"bb-ctr-8-6","id":"bb-ctr-8-6","pid":25864,"exited_at":"2017-07-28T22:19:51.832013043Z"}
2017-07-28 22:19:51.84001249 +0000 UTC bb /tasks/exit {"container_id":"bb-ctr-2-9","id":"bb-ctr-2-9","pid":25860,"exited_at":"2017-07-28T22:19:51.839717714Z"}
2017-07-28 22:19:51.840272635 +0000 UTC bb /tasks/exit {"container_id":"bb-ctr-7-6","id":"bb-ctr-7-6","pid":25855,"exited_at":"2017-07-28T22:19:51.839796335Z"}
...
```
In addition to the events changes, we now display the namespace origin
of the event in the cli tool.
This will be followed by a PR to add individual field filtering for the
events API for each event type.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Export as a tar (Note: "-" can be used for stdout):
$ ctr images export /tmp/oci-busybox.tar docker.io/library/busybox:latest
Import a tar (Note: "-" can be used for stdin):
$ ctr images import foo/new:latest /tmp/oci-busybox.tar
Note: media types are not converted at the moment: e.g.
application/vnd.docker.image.rootfs.diff.tar.gzip
-> application/vnd.oci.image.layer.v1.tar+gzip
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
e.g. ctr run -t --rm --rootfs /tmp/busybox-rootfs foo /bin/sh
(--rm removes the container but does not remove rootfs dir, of course)
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
In the course of setting out to add filters and address some cleanup, it
was found that we had a few problems in the events subsystem that needed
addressing before moving forward.
The biggest change was to move to the more standard terminology of
publish and subscribe. We make this terminology change across the Go
interface and the GRPC API, making the behavior more familier. The
previous system was very context-oriented, which is no longer required.
With this, we've removed a large amount of dead and unneeded code. Event
transactions, context storage and the concept of `Poster` is gone. This
has been replaced in most places with a `Publisher`, which matches the
actual usage throughout the codebase, removing the need for helpers.
There are still some questions around the way events are handled in the
shim. Right now, we've preserved some of the existing bugs which may
require more extensive changes to resolve correctly.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
What started out as a simple PR to remove the "Readonly" column became an
adventure to add a proper type for a "View" snapshot. The short story here is
that we now get the following output:
```
$ sudo ctr snapshot ls
ID PARENT KIND
sha256:08c2295a7fa5c220b0f60c994362d290429ad92f6e0235509db91582809442f3 Committed
testing4 sha256:08c2295a7fa5c220b0f60c994362d290429ad92f6e0235509db91582809442f3 Active
```
In pursuing this output, it was found that the idea of having "readonly" as an
attribute on all snapshots was redundant. For committed, they are always
readonly, as they are not accessible without an active snapshot. For active
snapshots that were views, we'd have to check the type before interpreting
"readonly". With this PR, this is baked fully into the kind of snapshot. When
`Snapshotter.View` is called, the kind of snapshot is `KindView`, and the
storage system reflects this end to end.
Unfortunately, this will break existing users. There is no migration, so they
will have to wipe `/var/lib/containerd` and recreate everything. However, this
is deemed worthwhile at this point, as we won't have to judge validity of the
"Readonly" field when new snapshot types are added.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Debug address in defaultConfig() doesn't have to be a hardcoded string,
instead it can be const var from package server, which is also a
platform dependent const. So it would be better to use
server.DefaultDebugAddress here.
Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
Mounting as MS_SLAVE here breaks use cases which want to use
rootPropagation=shared in order to expose mounts to the host (and other
containers binding the same subtree), mounting as e.g. MS_SHARED is pointless
in this context so just remove.
Having done this we also need to arrange to manually clean up the mounts on
delete, so do so.
Note that runc will also setup root as required by rootPropagation, defaulting
to MS_PRIVATE.
Fixes#1132.
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
This changeset:
- adds `mount` subcommand to `ctr snapshot`
- adds `snapshot-name` flag for specifying target snapshot name in both `mount`
and `prepare` snapshot subcommands
Signed-off-by: Sunny Gogoi <me@darkowlzz.space>
Signed-off-by: rajasec <rajasec79@gmail.com>
Updating the usage and errors for ctr run command
Signed-off-by: rajasec <rajasec79@gmail.com>
Updating the usage of run command
Signed-off-by: rajasec <rajasec79@gmail.com>
Reverting back the imports
Signed-off-by: rajasec <rajasec79@gmail.com>
Rather than make a large PR, we can move parts of the dist commands over
piece by piece. This first step moves over the images command. Others
will follow.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
To make the protobuild tool broadly useful, it has been broken out into
a separate project. This PR replaces the command with a configuration
file.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Rather than using the more verbose `set-labels` command, we are changing
the command to set labels for various objects to `label`, as it can be
used as a verb. This matches changes in the content store labeling.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Update list content command to support filters
Add label subcommand to content in dist tool to update labels
Add uncompressed label on unpack
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
e.g. dist pull --snapshotter btrfs ...; ctr run --snapshotter btrfs ...
(empty string defaults for overlayfs)
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This removes the RuntimeEvent super proto with enums into separate
runtime event protos to be inline with the other events that are output
by containerd.
This also renames the runtime events into Task* events.
Fixes#1071
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Similar to code in the Docker daemon and containerd 0.2.x. Even if we
have a better deployment model in containerd 1.0 seems reasonable to
have this same fix in the rare case that it bites someone using
containerd 1.0.
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
Move content status to list statuses and add single status
to interface.
Updates API to support list statuses and status
Updates snapshot key creation to be generic
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
The primary feature we get with this PR is support for filters and
labels on the image metadata store. In the process of doing this, the
conventions for the API have been converged between containers and
images, providing a model for other services.
With images, `Put` (renamed to `Update` briefly) has been split into a
`Create` and `Update`, allowing one to control the behavior around these
operations. `Update` now includes support for masking fields at the
datastore-level across both the containers and image service. Filters
are now just string values to interpreted directly within the data
store. This should allow for some interesting future use cases in which
the datastore might use the syntax for more efficient query paths.
The containers service has been updated to follow these conventions as
closely as possible.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This changeset adds `prepare` subcommand to `ctr snapshot` and removes
`prepare` from `dist rootfs` to keep the basic snapshot operation commands
together.
Signed-off-by: Sunny Gogoi <me@darkowlzz.space>
Marshaling Container interface resulted in empty json. Use Container proto
struct to get proper container attributes.
Signed-off-by: Sunny Gogoi <me@darkowlzz.space>
Allow plugins to be mapped and returned by their ID.
Add skip plugin to allow plugins to decide whether they should
be loaded.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Now that we have most of the services required for use with containerd,
it was found that common patterns were used throughout services. By
defining a central `errdefs` package, we ensure that services will map
errors to and from grpc consistently and cleanly. One can decorate an
error with as much context as necessary, using `pkg/errors` and still
have the error mapped correctly via grpc.
We make a few sacrifices. At this point, the common errors we use across
the repository all map directly to grpc error codes. While this seems
positively crazy, it actually works out quite well. The error conditions
that were specific weren't super necessary and the ones that were
necessary now simply have better context information. We lose the
ability to add new codes, but this constraint may not be a bad thing.
Effectively, as long as one uses the errors defined in `errdefs`, the
error class will be mapped correctly across the grpc boundary and
everything will be good. If you don't use those definitions, the error
maps to "unknown" and the error message is preserved.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Fix the behavior of removing snapshot on container delete.
Adds a flag to keep the snapshot if desired.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Runtime is not printed while container listing due to typo introduced
in #935.
This fixes the Typo.
Signed-off-by: Kunal Kushwaha <kushwaha_kunal_v7@lab.ntt.co.jp>
This moves the shim's API and protos out of the containerd services
package and into the linux runtime package. This is because the shim is
an implementation detail of the linux runtime that we have and it is not
a containerd user facing api.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
When using events, it was found to be fairly unwieldy with a number of
extra packages. For the most part, when interacting with the events
service, we want types of the same version of the service. This has been
accomplished by moving all events types into the events package.
In addition, several fixes to the way events are marshaled have been
included. Specifically, we defer to the protobuf type registration
system to assemble events and type urls, with a little bit sheen on top
of add a containerd.io oriented namespace.
This has resulted in much cleaner event consumption and has removed the
reliance on error prone type urls, in favor of concrete types.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Move existing snapshot command to archive subcommand of snapshot.
Add list command for listing snapshots.
Add usage command for showing snapshot disk usage.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>