Running the density tool will report Pss and Rss total and per container
values for shim memory usage. Values are reported in KB.
```bash
containerd-stress density --count 500
INFO[0000] pulling docker.io/library/alpine:latest
INFO[0000] generating spec from image
{"pss":421188,"rss":2439688,"pssPerContainer":842,"rssPerContainer":4879}
```
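The per-container figures in the JSON above are consistent with dividing each total by the container count using integer division. A minimal sketch of that arithmetic follows; `perContainer` is a hypothetical helper for illustration, not the tool's actual code.

```go
package main

import "fmt"

// perContainer is a hypothetical helper: it divides a total (in KB) by the
// number of containers using integer division.
func perContainer(totalKB, count int) int {
	if count == 0 {
		return 0
	}
	return totalKB / count
}

func main() {
	// Totals taken from the sample output above, with --count 500.
	fmt.Println(perContainer(421188, 500), perContainer(2439688, 500))
}
```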
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Without this `ctr run` can fail with:
ctr: parent snapshot sha256:70798fd80095f40b41baa5d107fb61532bfe494d96313fea01e8fcbf4e8743ee does not exist: not found
My image was produced by buildkit, which doesn't unpack (I think this makes
sense since buildkit doesn't know if I am going to run the image or export/push
it etc).
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Adds a useful flag to `ctr` to enable joining any existing Linux
namespaces for any namespace types (network, pid, ipc, etc.) using the
existing With helper in the oci package.
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
This uses a simple `IsAbs` check to see if we are using an on disk path
for a unix socket vs an address since we do not prefix addresses with
`unix://` or `tcp://`.
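The check described above might look roughly like this. It is a sketch only: `networkFor` is a hypothetical name, and it assumes a Unix-like host where socket paths start with `/`.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// networkFor is a hypothetical helper: an absolute filesystem path is treated
// as a unix socket, anything else as a TCP address.
func networkFor(address string) string {
	if filepath.IsAbs(address) {
		return "unix"
	}
	return "tcp"
}

func main() {
	// Both addresses are made-up examples.
	fmt.Println(networkFor("/run/containerd/debug.sock"))
	fmt.Println(networkFor("localhost:6060"))
}
```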
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Adds an error return to the `ctr images` commands when required
parameters are missing, before attempting any actions.
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
To avoid having the shim hold on to too much memory, we've made a few
adjustments to favor more aggressive reclamation of memory from the
operating system. Typically, this would be negligible, on the order of a
few megabytes, but this is impactful when running several containers.
The first fix is to lower the threshold used to determine when to run
the garbage collector. The second runs `runtime/debug.FreeOSMemory` at a
regular interval.
Under test, this results in a sustained memory usage of around 3.7 MB.
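The two adjustments can be sketched with the standard `runtime/debug` package. The GC percentage and the interval below are illustrative assumptions, not the values used by the shim, and `tuneMemory` is a hypothetical name.

```go
package main

import (
	"runtime/debug"
	"time"
)

// tuneMemory lowers the GC threshold and starts a goroutine that asks the
// runtime to return freed memory to the OS at a fixed interval, until stop is
// closed. It returns the previous GC percentage.
func tuneMemory(gcPercent int, interval time.Duration, stop <-chan struct{}) int {
	prev := debug.SetGCPercent(gcPercent)
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				debug.FreeOSMemory()
			case <-stop:
				return
			}
		}
	}()
	return prev
}

func main() {
	stop := make(chan struct{})
	// Assumed values: collect when the heap grows 10%, release every 50ms.
	tuneMemory(10, 50*time.Millisecond, stop)
	time.Sleep(120 * time.Millisecond)
	close(stop)
}
```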
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This changes the Windows runtime to use the snapshotter and differ
created layers, and updates the ctr commands to use the snapshotter and differ.
Signed-off-by: Darren Stahl <darst@microsoft.com>
This implements the Windows snapshotter and diff Apply function.
This allows for Windows layers to be created, and layers to be pulled
from the hub.
Signed-off-by: Darren Stahl <darst@microsoft.com>
Given these same exact functions are both now available in
opencontainers/runc (libcontainer/system) package, and we only use the
`SetSubreaper` today from the shim, there seems to be no reason for
duplication.
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
To avoid buffer bloat in long running processes, we try to use buffer
pools where possible. This is meant to address shim memory usage issues,
but may not be the root cause.
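A buffer pool of the kind described can be sketched with `sync.Pool`. The 32 KiB buffer size and the helper names are assumptions for illustration and are not taken from the changeset.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"sync"
)

// bufPool hands out reusable buffers so that each copy does not allocate a
// fresh one. The 32 KiB size is an assumption.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 32*1024)
		return &b
	},
}

// pooledCopy copies src to dst using a buffer borrowed from the pool.
func pooledCopy(dst io.Writer, src io.Reader) (int64, error) {
	bp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bp)
	return io.CopyBuffer(dst, src, *bp)
}

// copyString round-trips a string through pooledCopy. The anonymous struct
// wrappers hide the WriterTo/ReaderFrom fast paths, which would otherwise
// make io.CopyBuffer skip the supplied buffer.
func copyString(s string) (string, error) {
	var out bytes.Buffer
	_, err := pooledCopy(struct{ io.Writer }{&out}, struct{ io.Reader }{strings.NewReader(s)})
	return out.String(), err
}

func main() {
	got, err := copyString("hello from the shim")
	fmt.Println(got, err)
}
```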
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This improves the exec support so that exec tests can run along with the
normal stress tests. You don't have to pick exec stress or container
stress. They both run at the same time and report the different values.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
After contemplation, the complexity of the logging module system
outweighs its usefulness. This changeset removes the system and restores
lighter weight code paths. As a concession, we can always provide more
context when necessary to log messages to understand them without having
to fork the context for a certain set of calls.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This allows other packages and plugins to easily exec things without
racing with the reaper.
The reaper is mostly needed in the shim but can be removed in containerd
in favor of the `exec.Cmd` apis
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This linter checks for unnecessary type conversions.
Some conversions are whitelisted because their type is different
on 32-bit platforms.
Signed-off-by: Daniel Nephin <dnephin@gmail.com>
Preserves the order of the tree output between each execution. Slightly
refactored the behavior to be more "object oriented".
Signed-off-by: Stephen J Day <stephen.day@docker.com>
- Use lease API (previously, GC was not supported)
- Refactored interfaces for ease of future Docker v1 importer support
For usage, please refer to `ctr images import --help`.
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
This subreaper should always be turned on for containerd unless
explicitly needed for it to be off.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
First, there could be issues where, when exec processes fail, the wait
block is not released.
Second, you could not dump stacks if the reaper loop locks up.
Third, the publisher was not waiting on the correct pid.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
The shim doesn't need massive concurrency and a bunch of CPUs to do its
job correctly. We can reduce the number of threads to save memory at
little cost to performance.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
By replacing grpc with ttrpc, we can reduce total memory runtime
requirements and binary size. With minimal code changes, the shim can
now be controlled by the much more lightweight protocol, reducing the total
memory required per container.
When reviewing this change, take particular notice of the generated shim
code.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Synchronous image delete provides an option for image delete to wait
until the next garbage collection completes after an image is removed
before returning success to the caller.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Add garbage collection as a background process and policy
configuration for configuring when to run garbage collection.
By default garbage collection will run when deletion occurs
and no more than 20ms out of every second.
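One way to read the "20ms out of every second" policy is as a 2% duty cycle: after a collection that took some time, the scheduler waits long enough that collection stays within that share of wall-clock time. The sketch below illustrates that interpretation only; `pauseAfter` and its parameters are hypothetical and do not describe the actual scheduler.

```go
package main

import (
	"fmt"
	"time"
)

// pauseAfter returns how long to wait after a collection that took gcTime so
// that collection occupies at most num/den of wall-clock time. Integer
// arithmetic avoids floating-point rounding.
func pauseAfter(gcTime time.Duration, num, den int64) time.Duration {
	if num <= 0 || num >= den {
		return 0
	}
	return gcTime * time.Duration(den-num) / time.Duration(num)
}

func main() {
	// 20 out of every 1000 milliseconds, as in the default described above.
	fmt.Println(pauseAfter(20*time.Millisecond, 20, 1000))
	fmt.Println(pauseAfter(5*time.Millisecond, 20, 1000))
}
```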
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
The binary name used for executing "containerd publish" was hard-coded
in the shim code, and hence it did not work with customized daemon
binary name. (e.g. `docker-containerd`)
This commit allows specifying custom daemon binary via `containerd-shim
-containerd-binary ...`.
The daemon invokes this command with `os.Executable()` path.
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Currently the output for a non-existent image reference and a valid
image reference is exactly the same on `ctr images remove`. Instead of
outputting the target ref input, if it is "not found" we should alert
the user in case of a misspelling, but continue not to make it a failure
for the command (given it supports multiple ref entries)
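The intended behavior can be sketched as follows. The in-memory store, the helper name, and the message format are all assumptions for illustration and do not reflect the real `ctr` output.

```go
package main

import "fmt"

// removeRefs deletes each ref from a hypothetical in-memory store. A missing
// ref is reported to the user but does not abort the remaining removals.
func removeRefs(store map[string]bool, refs []string) (removed, missing []string) {
	for _, ref := range refs {
		if !store[ref] {
			fmt.Printf("%s: not found\n", ref)
			missing = append(missing, ref)
			continue
		}
		delete(store, ref)
		fmt.Println(ref)
		removed = append(removed, ref)
	}
	return removed, missing
}

func main() {
	store := map[string]bool{"docker.io/library/alpine:latest": true}
	// The second ref is deliberately misspelled.
	removeRefs(store, []string{
		"docker.io/library/alpine:latest",
		"docker.io/library/alpnie:latest",
	})
}
```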
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
This patch changes the output of `ctr version` to align version and revision.
It also changes `.Printf()` to `.Println()`, to make the code slightly easier
to read.
Before this change:
$ ctr version
Client:
Version: v1.0.0-beta.2-132-g564600e.m
Revision: 564600ee79aefb0f24cbcecc90d4388bd0ea59de.m
Server:
Version: v1.0.0-beta.2-132-g564600e.m
Revision: 564600ee79aefb0f24cbcecc90d4388bd0ea59de.m
With this patch applied:
$ ctr version
Client:
Version:  v1.0.0-beta.2-132-g564600e.m
Revision: 564600ee79aefb0f24cbcecc90d4388bd0ea59de.m
Server:
Version:  v1.0.0-beta.2-132-g564600e.m
Revision: 564600ee79aefb0f24cbcecc90d4388bd0ea59de.m
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
To reduce the binary size of containerd, we no longer import the
`server` package for only a few defaults. This reduces the size of `ctr`
by 2MB. There are probably other gains elsewhere.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This keeps the semantics the same as the other commands to only list
containers, tasks, images by calling the list subcommand.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This allows a project to have a TEMPLATE file in the root of the repo to
be used with the release tool. If they don't have this file and did not
specify a custom file then it will use the compiled in template in the
release tool.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Allow a user provided name for the checkpoint as well as a default
generated name for the checkpoint image.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This allows one to manage the checkpoints by using the `ctr image`
command.
The image is created with label "containerd.io/checkpoint". By
default, it is not included in the output of `ctr images ls`.
We can list the images by using the following command:
$ ctr images ls labels.containerd.\"io/checkpoint\"==true
Fixes #1026
Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>
This tool makes our standard release template easy to generate. It also
adds a few features like marking changed dependencies for packages and
others to know what updated from the last release.
usage:
`containerd-release -n releases/v1.0.0-beta.2.toml`
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add differ options and package with interface.
Update optional values on diff interface to use options.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
With this change, we integrate all the plugin changes into the
introspection service.
All plugins can be listed with the following command:
```console
$ ctr plugins
TYPE                            ID            PLATFORM      STATUS
io.containerd.content.v1        content       -             ok
io.containerd.metadata.v1       bolt          -             ok
io.containerd.differ.v1         walking       linux/amd64   ok
io.containerd.grpc.v1           containers    -             ok
io.containerd.grpc.v1           content       -             ok
io.containerd.grpc.v1           diff          -             ok
io.containerd.grpc.v1           events        -             ok
io.containerd.grpc.v1           healthcheck   -             ok
io.containerd.grpc.v1           images        -             ok
io.containerd.grpc.v1           namespaces    -             ok
io.containerd.snapshotter.v1    btrfs         linux/amd64   error
io.containerd.snapshotter.v1    overlayfs     linux/amd64   ok
io.containerd.grpc.v1           snapshots     -             ok
io.containerd.monitor.v1        cgroups       linux/amd64   ok
io.containerd.runtime.v1        linux         linux/amd64   ok
io.containerd.grpc.v1           tasks         -             ok
io.containerd.grpc.v1           version       -             ok
```
There are a few things to note about this output. The first is that it is
printed in the order in which plugins are initialized. This is useful for
debugging plugin initialization problems. Also note that even though the
introspection GRPC API is itself a plugin, it is not listed. This is
because the plugin takes a snapshot of the initialization state at the
end of the plugin init process. This allows us to see errors from each
plugin, as they happen. If it is required to introspect the existence of
the introspection service, we can make modifications to include it in
the future.
The last thing to note is that the btrfs plugin is in an error state.
This is a common state for containerd because even though we load the
plugin, most installations aren't on top of btrfs and the plugin cannot
be used. We can actually view this error using the detailed view with a
filter:
```console
$ ctr plugins --detailed id==btrfs
Type: io.containerd.snapshotter.v1
ID: btrfs
Platforms: linux/amd64
Exports:
root /var/lib/containerd/io.containerd.snapshotter.v1.btrfs
Error:
Code: Unknown
Message: path /var/lib/containerd/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter
```
Along with several other values, this is a valuable tool for evaluating the
state of components in containerd.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
By default, the generated spec will place containers in cgroups by their
ids. We need to use the namespace as the cgroup root to avoid
containers with the same name being placed in the same cgroup.
```
11:perf_event:/to/redis
10:freezer:/to/redis
9:memory:/to/redis
8:devices:/to/redis
7:net_cls,net_prio:/to/redis
6:pids:/to/redis
5:hugetlb:/to/redis
4:cpuset:/to/redis
3:blkio:/to/redis
2:cpu,cpuacct:/to/redis
1:name=systemd:/to/redis
11:perf_event:/te/redis
10:freezer:/te/redis
9:memory:/te/redis
8:devices:/te/redis
7:net_cls,net_prio:/te/redis
6:pids:/te/redis
5:hugetlb:/te/redis
4:cpuset:/te/redis
3:blkio:/te/redis
2:cpu,cpuacct:/te/redis
1:name=systemd:/te/redis
```
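The paths listed above follow the pattern `/<namespace>/<id>`. A minimal sketch of building such a path is shown below; `cgroupPath` is a hypothetical helper, not the function used in the spec generation code.

```go
package main

import (
	"fmt"
	"path"
)

// cgroupPath is a hypothetical helper that roots a container's cgroup under
// its namespace, so equal ids in different namespaces do not collide.
func cgroupPath(namespace, id string) string {
	return path.Join("/", namespace, id)
}

func main() {
	// Namespaces and id taken from the listing above.
	fmt.Println(cgroupPath("to", "redis"))
	fmt.Println(cgroupPath("te", "redis"))
}
```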
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This makes sure the client is always in sync with the server before
performing any type of operations on the container metadata.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
The `Check` function returns information about an image's content components
over a content provider. From this information, one can tell which content is
required, present or missing to run an image.
The utility can be demonstrated with the `check` command:
```console
$ ctr images check
REF TYPE DIGEST STATUS SIZE
docker.io/library/alpine:latest application/vnd.docker.distribution.manifest.list.v2+json sha256:f006ecbb824d87947d0b51ab8488634bf69fe4094959d935c0c103f4820a417d incomplete (1/2) 1.5 KiB/1.9 MiB
docker.io/library/postgres:latest application/vnd.docker.distribution.manifest.v2+json sha256:2f8080b9910a8b4f38ff5a55a82e77cb43d88bdbb16d723c71d18493590832e9 complete (13/13) 99.3 MiB/99.3 MiB
docker.io/library/redis:alpine application/vnd.docker.distribution.manifest.v2+json sha256:e633cded055a94202e4ccccb8125b7f383cd6ee56527ab890db643383a2647dd incomplete (6/7) 8.1 MiB/10.0 MiB
docker.io/library/ubuntu:latest application/vnd.docker.distribution.manifest.list.v2+json sha256:60f835698ea19e8d9d3a59e68fb96fb35bc43e745941cb2ea9eaf4ba3029ed8a unavailable (0/?) 0.0 B/?
docker.io/trollin/busybox:latest application/vnd.docker.distribution.manifest.list.v2+json sha256:54a6424f7a2d5f4f27b3d69e5f9f2bc25fe9087f0449d3cb4215db349f77feae complete (2/2) 699.9 KiB/699.9 KiB
```
The above shows us that we have two incomplete images and one that is
unavailable. Incomplete images are those for which we know the complete
size of all content, but some of that content is missing. "Unavailable"
means that the check could not get enough information about the image to
get its full size.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
The SIGUNUSED constant was removed from golang.org/x/sys/unix in
https://go-review.googlesource.com/61771 as it is also removed from the
respective glibc headers.
This means the command
ctr tasks kill SIGUNUSED ...
will no longer work. However, the same effect can be achieved with
ctr tasks kill SIGSYS ...
as SIGSYS has the same value as SIGUNUSED used to have.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Fixes pulling of multi-arch images by limiting the expansion
of the index by filtering to the current default platform.
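Filtering an index down to one platform can be sketched as below. The `platform` and `descriptor` types and the placeholder digests are hypothetical simplifications, not the types used by containerd.

```go
package main

import (
	"fmt"
	"runtime"
)

// platform and descriptor are hypothetical, simplified stand-ins for the
// entries of a multi-arch image index.
type platform struct{ OS, Architecture string }

type descriptor struct {
	Digest   string
	Platform platform
}

// filterByPlatform keeps only the index entries matching the wanted platform.
func filterByPlatform(index []descriptor, want platform) []descriptor {
	var out []descriptor
	for _, d := range index {
		if d.Platform == want {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	// Digests are placeholders, not real content addresses.
	index := []descriptor{
		{Digest: "manifest-amd64", Platform: platform{"linux", "amd64"}},
		{Digest: "manifest-arm64", Platform: platform{"linux", "arm64"}},
	}
	host := platform{runtime.GOOS, runtime.GOARCH}
	fmt.Println(filterByPlatform(index, host))
}
```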
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
This handles signals first thing on boot so that plugins are able to
boot with the reaper enabled.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Content commit is updated to take in a context, allowing
content to be committed within the same context the writer
was in. This is useful when commit may be able to use more
context to complete the action rather than creating its own.
An example of this being useful is for the metadata implementation
of content, having a context allows tests to fully create
content in one database transaction by making use of the context.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
The labels can be very long (e.g. cri-containerd stores a large JSON metadata
blob as `io.cri-containerd.container.metadata`) which renders the output
useless due to all the line wrapping etc.
The information is still available in `ctr containers info «name»`.
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
This also fixes the type used for RuncOptions.SystemCgroup, hence introducing
an API break.
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
Fixes #1431
This adds KillOpts so that a client can specify when they want to kill a
single process or all the processes inside a container.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
In order to do more advanced spec generation with images, snapshots,
etc, we need to inject the context and client into the spec generation
code.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
After the rework of server-side defaults, the `ctr snapshot` command
stopped working due to no default snapshotter.
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
Instead of requiring callers to read the struct fields to check for an
error, provide the exit results via a function instead which is more
natural.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
In all of the examples, it's recommended to call `Wait()` before starting
a process/task.
Since `Wait()` is a blocking call, this means it must be called from a
goroutine like so:
```go
statusC := make(chan uint32)
go func() {
	status, err := task.Wait(ctx)
	if err != nil {
		// handle async err
	}
	statusC <- status
}()
task.Start(ctx)
<-statusC
```
This means there is a race here where there is no guarantee when the
goroutine is going to be scheduled, and even a bit more since this
requires an RPC call to be made.
In addition, this code is very messy and a common pattern for any caller
using Wait+Start.
Instead, this changes `Wait()` to use an async model having `Wait()`
return a channel instead of the code itself.
This ensures that when `Wait()` returns that the client has a handle on
the event stream (already made the RPC request) before returning and
reduces any sort of race to how the stream is handled by grpc since we
can't guarantee that we have a goroutine running and blocked on
`Recv()`.
Making `Wait()` async also cleans up the code in the caller drastically:
```go
statusC, err := task.Wait(ctx)
if err != nil {
	return err
}
task.Start(ctx)
status := <-statusC
if status.Err != nil {
	// return the error carried by the status, not the earlier (nil) err
	return status.Err
}
```
No more spinning up goroutines and more natural error
handling for the caller.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This adds null IO option for efficient handling of IO.
It provides a container directly with `/dev/null` and does not require
any io.Copy within the shim whenever a user does not want the IO of the
container.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This adds a `stress` binary to help stress test containerd. It is
different from a benchmarking tool as it only gives a simple summary at
the end.
It is built to run long, multi hour/day stress tests across builds of
containerd.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
The argument order, naming and behavior of the snapshots command didn't
really follow any of the design constraints or conventions of the
`Snapshotter` interface. This brings the command into line with that
interface definition.
The `snapshot archive` command has been removed as it requires more
thought on design to correctly emit diffs.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
After some analysis, it was found that `Content.Reader` was generally
redundant to an `io.ReaderAt`. This change removes `Content.Reader` in
favor of a `Content.ReaderAt`. In general, `ReaderAt` can perform better
over interfaces with indeterminate latency because it avoids remote
state for reads. Where a reader is required, a helper is provided to
convert it into an `io.SectionReader`.
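The conversion can be sketched with the standard library alone. `asReader` is a hypothetical stand-in for the helper mentioned above, and it assumes the total size of the content is known.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// asReader adapts an io.ReaderAt of known size to a sequential io.Reader by
// wrapping it in an io.SectionReader that tracks the offset locally.
func asReader(ra io.ReaderAt, size int64) io.Reader {
	return io.NewSectionReader(ra, 0, size)
}

// readAll demonstrates the adapter using a strings.Reader as the ReaderAt.
func readAll(s string) (string, error) {
	src := strings.NewReader(s)
	b, err := io.ReadAll(asReader(src, src.Size()))
	return string(b), err
}

func main() {
	got, err := readAll("layer contents")
	fmt.Println(got, err)
}
```

Because the offset lives in the `SectionReader` on the caller's side, each read is an independent `ReadAt` call and no cursor has to be kept by the provider.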
Signed-off-by: Stephen J Day <stephen.day@docker.com>
The syscall package is locked down and the comment in [1] advises to
switch code to use the corresponding package from golang.org/x/sys. Do
so and replace usage of package syscall with package
golang.org/x/sys/{unix,windows} where applicable.
[1] https://github.com/golang/go/blob/master/src/syscall/syscall.go#L21-L24
This will also allow getting updates and fixes for syscall wrappers
without having to use a new go version.
Errno, Signal and SysProcAttr aren't changed as they haven't been
implemented in x/sys/. Stat_t from syscall is used if standard library
packages (e.g. os) require it. syscall.ENOTSUP, syscall.SIGKILL and
syscall.SIGTERM are used for cross-platform files.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
This sets the subreaper to true in the default linux config as the
common usecase is to not run containerd as pid 1.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Snapshotters for run must be created with requested snapshotter.
The order of the options is important to ensure that the snapshotter
is set before the snapshots are created.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
This changes Wait() from returning an error whenever you call wait on a
stopped process/task to returning the exit status from the process.
This also adds the exit status to the Status() call on a process/task so
that a user can Wait(), check status, then cancel the wait to avoid
races in event handling.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This splits up the create and start of an exec process in the shim to
have two separate steps like the initial process. This will allow
better state reporting for individual processes along with a more robust
wait for execs.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
To ensure consistent fieldpath matching for events, we generate the
fieldpath matching using protobuf definitions. This is done through a
plugin called "fieldpath" that defines a `Field` method for each type
with the plugin enabled. Generated code handles top-level envelope
fields, as well as deferred serialization for matching any types.
In practice, this means that we can cheaply match events on `topic` and
`namespace`. If we want to match on attributes within the event, we can
use the `event` prefix to address these fields. For example, the
following will match all envelopes that have a field named
`container_id` that has the value `testing`:
```
ctr events "event.container_id==testing"
```
The above will decode the underlying event and check that particular
field. Accordingly, if only `topic` or `namespace` is used, the event
will not be decoded and only match on the envelope.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This change further plumbs the components required for implementing
event filters. Specifically, we now have the ability to filter on the
`topic` and `namespace`.
In the course of implementing this functionality, it was found that
there were mismatches in the events API that created extra serialization
round trips. A modification to `typeurl.MarshalAny` and a clear
separation between publishing and forwarding allow us to avoid these
serialization issues.
Unfortunately, this has required a few tweaks to the GRPC API, so this
is a breaking change. `Publish` and `Forward` have been clearly separated in
the GRPC API. `Publish` honors the contextual namespace and performs
timestamping while `Forward` simply validates and forwards. The behavior
of `Subscribe` is to propagate events for all namespaces unless
specifically filtered (and hence the relation to this particular change).
The following is an example of using filters to monitor the task events
generated while running the [bucketbench tool](https://github.com/estesp/bucketbench):
```
$ ctr events 'topic~=/tasks/.+,namespace==bb'
...
2017-07-28 22:19:51.78944874 +0000 UTC bb /tasks/start {"container_id":"bb-ctr-6-8","pid":25889}
2017-07-28 22:19:51.791893688 +0000 UTC bb /tasks/start {"container_id":"bb-ctr-4-8","pid":25882}
2017-07-28 22:19:51.792608389 +0000 UTC bb /tasks/start {"container_id":"bb-ctr-2-9","pid":25860}
2017-07-28 22:19:51.793035217 +0000 UTC bb /tasks/start {"container_id":"bb-ctr-5-6","pid":25869}
2017-07-28 22:19:51.802659622 +0000 UTC bb /tasks/start {"container_id":"bb-ctr-0-7","pid":25877}
2017-07-28 22:19:51.805192898 +0000 UTC bb /tasks/start {"container_id":"bb-ctr-3-6","pid":25856}
2017-07-28 22:19:51.832374931 +0000 UTC bb /tasks/exit {"container_id":"bb-ctr-8-6","id":"bb-ctr-8-6","pid":25864,"exited_at":"2017-07-28T22:19:51.832013043Z"}
2017-07-28 22:19:51.84001249 +0000 UTC bb /tasks/exit {"container_id":"bb-ctr-2-9","id":"bb-ctr-2-9","pid":25860,"exited_at":"2017-07-28T22:19:51.839717714Z"}
2017-07-28 22:19:51.840272635 +0000 UTC bb /tasks/exit {"container_id":"bb-ctr-7-6","id":"bb-ctr-7-6","pid":25855,"exited_at":"2017-07-28T22:19:51.839796335Z"}
...
```
In addition to the events changes, we now display the namespace origin
of the event in the cli tool.
This will be followed by a PR to add individual field filtering for the
events API for each event type.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Export as a tar (Note: "-" can be used for stdout):
$ ctr images export /tmp/oci-busybox.tar docker.io/library/busybox:latest
Import a tar (Note: "-" can be used for stdin):
$ ctr images import foo/new:latest /tmp/oci-busybox.tar
Note: media types are not converted at the moment: e.g.
application/vnd.docker.image.rootfs.diff.tar.gzip
-> application/vnd.oci.image.layer.v1.tar+gzip
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
e.g. ctr run -t --rm --rootfs /tmp/busybox-rootfs foo /bin/sh
(--rm removes the container but does not remove rootfs dir, of course)
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
In the course of setting out to add filters and address some cleanup, it
was found that we had a few problems in the events subsystem that needed
addressing before moving forward.
The biggest change was to move to the more standard terminology of
publish and subscribe. We make this terminology change across the Go
interface and the GRPC API, making the behavior more familiar. The
previous system was very context-oriented, which is no longer required.
With this, we've removed a large amount of dead and unneeded code. Event
transactions, context storage and the concept of `Poster` are gone. This
has been replaced in most places with a `Publisher`, which matches the
actual usage throughout the codebase, removing the need for helpers.
There are still some questions around the way events are handled in the
shim. Right now, we've preserved some of the existing bugs which may
require more extensive changes to resolve correctly.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
What started out as a simple PR to remove the "Readonly" column became an
adventure to add a proper type for a "View" snapshot. The short story here is
that we now get the following output:
```
$ sudo ctr snapshot ls
ID                                                                         PARENT                                                                     KIND
sha256:08c2295a7fa5c220b0f60c994362d290429ad92f6e0235509db91582809442f3                                                                               Committed
testing4                                                                   sha256:08c2295a7fa5c220b0f60c994362d290429ad92f6e0235509db91582809442f3    Active
```
In pursuing this output, it was found that the idea of having "readonly" as an
attribute on all snapshots was redundant. For committed, they are always
readonly, as they are not accessible without an active snapshot. For active
snapshots that were views, we'd have to check the type before interpreting
"readonly". With this PR, this is baked fully into the kind of snapshot. When
`Snapshotter.View` is called, the kind of snapshot is `KindView`, and the
storage system reflects this end to end.
Unfortunately, this will break existing users. There is no migration, so they
will have to wipe `/var/lib/containerd` and recreate everything. However, this
is deemed worthwhile at this point, as we won't have to judge validity of the
"Readonly" field when new snapshot types are added.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Debug address in defaultConfig() doesn't have to be a hardcoded string,
instead it can be const var from package server, which is also a
platform dependent const. So it would be better to use
server.DefaultDebugAddress here.
Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
Mounting as MS_SLAVE here breaks use cases which want to use
rootPropagation=shared in order to expose mounts to the host (and other
containers binding the same subtree). Mounting as e.g. MS_SHARED is pointless
in this context, so just remove it.
Having done this we also need to arrange to manually clean up the mounts on
delete, so do so.
Note that runc will also setup root as required by rootPropagation, defaulting
to MS_PRIVATE.
Fixes #1132.
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
This changeset:
- adds `mount` subcommand to `ctr snapshot`
- adds `snapshot-name` flag for specifying target snapshot name in both `mount`
and `prepare` snapshot subcommands
Signed-off-by: Sunny Gogoi <me@darkowlzz.space>
Signed-off-by: rajasec <rajasec79@gmail.com>
Updating the usage and errors for ctr run command
Signed-off-by: rajasec <rajasec79@gmail.com>
Updating the usage of run command
Signed-off-by: rajasec <rajasec79@gmail.com>
Reverting back the imports
Signed-off-by: rajasec <rajasec79@gmail.com>
Rather than make a large PR, we can move parts of the dist commands over
piece by piece. This first step moves over the images command. Others
will follow.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
To make the protobuild tool broadly useful, it has been broken out into
a separate project. This PR replaces the command with a configuration
file.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Rather than using the more verbose `set-labels` command, we are changing
the command to set labels for various objects to `label`, as it can be
used as a verb. This matches changes in the content store labeling.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Update list content command to support filters
Add label subcommand to content in dist tool to update labels
Add uncompressed label on unpack
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
e.g. dist pull --snapshotter btrfs ...; ctr run --snapshotter btrfs ...
(empty string defaults for overlayfs)
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This removes the RuntimeEvent super proto with enums into separate
runtime event protos to be inline with the other events that are output
by containerd.
This also renames the runtime events into Task* events.
Fixes #1071
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Similar to code in the Docker daemon and containerd 0.2.x. Even if we
have a better deployment model in containerd 1.0, it seems reasonable to
have this same fix in the rare case that it bites someone using
containerd 1.0.
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
Move content status to list statuses and add single status
to interface.
Updates API to support list statuses and status
Updates snapshot key creation to be generic
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
The primary feature we get with this PR is support for filters and
labels on the image metadata store. In the process of doing this, the
conventions for the API have been converged between containers and
images, providing a model for other services.
With images, `Put` (renamed to `Update` briefly) has been split into a
`Create` and `Update`, allowing one to control the behavior around these
operations. `Update` now includes support for masking fields at the
datastore-level across both the containers and image service. Filters
are now just string values to be interpreted directly within the data
store. This should allow for some interesting future use cases in which
the datastore might use the syntax for more efficient query paths.
The containers service has been updated to follow these conventions as
closely as possible.
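Masked updates of this kind can be sketched as follows. The `image` type, the field path names, and the rule that an empty mask replaces everything except the name are assumptions made for illustration, not the service's actual schema.

```go
package main

import "fmt"

// image is a hypothetical, simplified record.
type image struct {
	Name   string
	Target string
	Labels map[string]string
}

// update copies onto current only the fields named in fieldpaths. With no
// field paths, every mutable field is replaced (an assumed convention here).
func update(current, updated image, fieldpaths ...string) (image, error) {
	if len(fieldpaths) == 0 {
		updated.Name = current.Name
		return updated, nil
	}
	for _, fp := range fieldpaths {
		switch fp {
		case "target":
			current.Target = updated.Target
		case "labels":
			current.Labels = updated.Labels
		default:
			return image{}, fmt.Errorf("cannot update field %q", fp)
		}
	}
	return current, nil
}

func main() {
	// Target values are placeholders, not real digests.
	current := image{Name: "docker.io/library/alpine:latest", Target: "sha256:old"}
	updated := image{Target: "sha256:new", Labels: map[string]string{"tier": "base"}}
	got, err := update(current, updated, "labels")
	fmt.Println(got.Target, got.Labels, err)
}
```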
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This changeset adds `prepare` subcommand to `ctr snapshot` and removes
`prepare` from `dist rootfs` to keep the basic snapshot operation commands
together.
Signed-off-by: Sunny Gogoi <me@darkowlzz.space>
Marshaling Container interface resulted in empty json. Use Container proto
struct to get proper container attributes.
Signed-off-by: Sunny Gogoi <me@darkowlzz.space>
Allow plugins to be mapped and returned by their ID.
Add skip plugin to allow plugins to decide whether they should
be loaded.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Now that we have most of the services required for use with containerd,
it was found that common patterns were used throughout services. By
defining a central `errdefs` package, we ensure that services will map
errors to and from grpc consistently and cleanly. One can decorate an
error with as much context as necessary, using `pkg/errors` and still
have the error mapped correctly via grpc.
We make a few sacrifices. At this point, the common errors we use across
the repository all map directly to grpc error codes. While this seems
positively crazy, it actually works out quite well. The error conditions
that were specific weren't super necessary and the ones that were
necessary now simply have better context information. We lose the
ability to add new codes, but this constraint may not be a bad thing.
Effectively, as long as one uses the errors defined in `errdefs`, the
error class will be mapped correctly across the grpc boundary and
everything will be good. If you don't use those definitions, the error
maps to "unknown" and the error message is preserved.
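The mapping can be sketched as below. To keep the example self-contained it
uses a local `Code` type in place of the grpc status codes and standard
library `%w` wrapping in place of `pkg/errors`; the function names `ToCode`
and `FromCode` are assumptions for this sketch, not the package's API.

```go
package main

import (
	"errors"
	"fmt"
)

// Code stands in for a grpc status code in this sketch.
type Code int

const (
	Unknown Code = iota
	NotFound
	AlreadyExists
	FailedPrecondition
)

// The shared error classes. Callers wrap these with extra context.
var (
	ErrNotFound           = errors.New("not found")
	ErrAlreadyExists      = errors.New("already exists")
	ErrFailedPrecondition = errors.New("failed precondition")
)

// ToCode maps an error, however deeply wrapped, to its class. Anything
// not built on the shared definitions maps to Unknown.
func ToCode(err error) Code {
	switch {
	case errors.Is(err, ErrNotFound):
		return NotFound
	case errors.Is(err, ErrAlreadyExists):
		return AlreadyExists
	case errors.Is(err, ErrFailedPrecondition):
		return FailedPrecondition
	default:
		return Unknown
	}
}

// classified carries the original message and the error class.
type classified struct {
	msg   string
	class error
}

func (e classified) Error() string { return e.msg }
func (e classified) Unwrap() error { return e.class }

// FromCode rebuilds an error on the receiving side. The message is
// preserved and the class can be tested with errors.Is.
func FromCode(code Code, msg string) error {
	switch code {
	case NotFound:
		return classified{msg, ErrNotFound}
	case AlreadyExists:
		return classified{msg, ErrAlreadyExists}
	case FailedPrecondition:
		return classified{msg, ErrFailedPrecondition}
	default:
		return errors.New(msg)
	}
}

func main() {
	err := fmt.Errorf("image %q: %w", "redis", ErrNotFound)
	back := FromCode(ToCode(err), err.Error())
	fmt.Println(back, errors.Is(back, ErrNotFound))
}
```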
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Fix the behavior of removing snapshot on container delete.
Adds a flag to keep the snapshot if desired.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Runtime was not printed when listing containers due to a typo introduced
in #935.
This fixes the typo.
Signed-off-by: Kunal Kushwaha <kushwaha_kunal_v7@lab.ntt.co.jp>
This moves the shim's API and protos out of the containerd services
package and into the linux runtime package. This is because the shim is
an implementation detail of the linux runtime that we have and it is not
a containerd user facing api.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
When using events, it was found to be fairly unwieldy with a number of
extra packages. For the most part, when interacting with the events
service, we want types of the same version of the service. This has been
accomplished by moving all events types into the events package.
In addition, several fixes to the way events are marshaled have been
included. Specifically, we defer to the protobuf type registration
system to assemble events and type urls, with a little bit of sheen on top
to add a containerd.io-oriented namespace.
This has resulted in much cleaner event consumption and has removed the
reliance on error prone type urls, in favor of concrete types.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Move existing snapshot command to archive subcommand of snapshot.
Add list command for listing snapshots.
Add usage command for showing snapshot disk usage.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: update events package to include emitter and use envelope proto
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: add events service
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: enable events service and update ctr events to use events service
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
event listeners
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: helper func for emitting in services
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: improved cli for containers and tasks
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
create event envelope with poster
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: introspect event data to use for type url
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: use pb encoding; add event types
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: instrument content and snapshot services with events
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: instrument image service with events
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: instrument namespace service with events
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: add namespace support
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: only send events from namespace requested from client
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: switch to go-events for broadcasting
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
It is unused since 4c1af8fdd8 ("Port ctr to use client") and leaving it
around will just tempt people into writing code with security holes.
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
We need a separate API for handling the exit status and deletion of
Exec'd processes to make sure they are properly cleaned up within the
shim and daemon.
Fixes #973
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
When using WithBlock() on the dialer, the connection timeout must fully
expire before any status is provided to the user about whether they can
even connect to the socket. For example, if the containerd socket is
root-owned and the user tries `dist images ls` without `sudo`, the
default is 30 sec. of "hang" before the command returns.
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
Replaced pull unpacker with boolean to call unpack.
Added unpack and target to image type.
Updated progress logic for pull.
Added list images to client.
Updated rootfs unpacker to use client.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
To support multi-tenancy, containerd allows the collection of metadata
and runtime objects within a hierarchical storage primitive known as
namespaces. Data cannot be shared across these namespaces, unless
allowed by the service. This allows multiple sets of containers to be
managed without interaction between the clients that manage them. This
means that different users, such as SwarmKit, K8s, Docker and others can
use containerd without coordination. Through labels, one may use
namespaces as a tool for cleanly organizing the use of containerd
containers, including the metadata storage for higher level features,
such as ACLs.
Namespaces
Namespaces cross-cut all containerd operations and are communicated via
context, either within the Go context or via GRPC headers. As a general
rule, no features are tied to namespace, other than organization. This
will be maintained into the future. They are created as a side-effect of
operating on them or may be created manually. Namespaces can be labeled
for organization. They cannot be deleted unless the namespace is empty,
although we may want to make it so one can clean up the entirety of
containerd by deleting a namespace.
Most users will interface with namespaces by setting them in the
context or via the `CONTAINERD_NAMESPACE` environment variable, but the
experience is mostly left to the client. For `ctr` and `dist`, we have
defined a "default" namespace that will be created upon use, but there
is nothing special about it. As part of this PR we have plumbed this
behavior through all commands, cleaning up context management along the
way.
Namespaces in Action
Namespaces can be managed with the `ctr namespaces` subcommand. They
can be created, labeled and destroyed.
A few commands can demonstrate the power of namespaces for use with
images. First, let's create a namespace:
```
$ ctr namespaces create foo mylabel=bar
$ ctr namespaces ls
NAME LABELS
foo mylabel=bar
```
We can see that we have a namespace `foo` and it has a label. Let's pull
an image:
```
$ dist pull docker.io/library/redis:latest
docker.io/library/redis:latest: resolved |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:548a75066f3f280eb017a6ccda34c561ccf4f25459ef8e36d6ea582b6af1decf: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:d45bc46b48e45e8c72c41aedd2a173bcc7f1ea4084a8fcfc5251b1da2a09c0b6: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:5b690bc4eaa6434456ceaccf9b3e42229bd2691869ba439e515b28fe1a66c009: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:a858478874d144f6bfc03ae2d4598e2942fc9994159f2872e39fae88d45bd847: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4cdd94354d2a873333a205a02dbb853dd763c73600e0cf64f60b4bd7ab694875: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:10a267c67f423630f3afe5e04bbbc93d578861ddcc54283526222f3ad5e895b9: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:c54584150374aa94b9f7c3fbd743adcff5adead7a3cf7207b0e51551ac4a5517: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:d1f9221193a65eaf1b0afc4f1d4fbb7f0f209369d2696e1c07671668e150ed2b: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:71c1f30d820f0457df186531dc4478967d075ba449bd3168a3e82137a47daf03: done |++++++++++++++++++++++++++++++++++++++|
elapsed: 0.9 s total: 0.0 B (0.0 B/s)
INFO[0000] unpacking rootfs
INFO[0000] Unpacked chain id: sha256:41719840acf0f89e761f4a97c6074b6e2c6c25e3830fcb39301496b5d36f9b51
```
Now, let's list the image:
```
$ dist images ls
REF TYPE DIGEST SIZE
docker.io/library/redis:latest application/vnd.docker.distribution.manifest.v2+json sha256:548a75066f3f280eb017a6ccda34c561ccf4f25459ef8e36d6ea582b6af1decf 72.7 MiB
```
That looks normal. Let's list the images for the `foo` namespace and see
this in action:
```
$ CONTAINERD_NAMESPACE=foo dist images ls
REF TYPE DIGEST SIZE
```
Look at that! Nothing was pulled in the namespace `foo`. Let's do the
same pull:
```
$ CONTAINERD_NAMESPACE=foo dist pull docker.io/library/redis:latest
docker.io/library/redis:latest: resolved |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:548a75066f3f280eb017a6ccda34c561ccf4f25459ef8e36d6ea582b6af1decf: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:d45bc46b48e45e8c72c41aedd2a173bcc7f1ea4084a8fcfc5251b1da2a09c0b6: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:a858478874d144f6bfc03ae2d4598e2942fc9994159f2872e39fae88d45bd847: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4cdd94354d2a873333a205a02dbb853dd763c73600e0cf64f60b4bd7ab694875: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:c54584150374aa94b9f7c3fbd743adcff5adead7a3cf7207b0e51551ac4a5517: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:71c1f30d820f0457df186531dc4478967d075ba449bd3168a3e82137a47daf03: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:d1f9221193a65eaf1b0afc4f1d4fbb7f0f209369d2696e1c07671668e150ed2b: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:10a267c67f423630f3afe5e04bbbc93d578861ddcc54283526222f3ad5e895b9: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:5b690bc4eaa6434456ceaccf9b3e42229bd2691869ba439e515b28fe1a66c009: done |++++++++++++++++++++++++++++++++++++++|
elapsed: 0.8 s total: 0.0 B (0.0 B/s)
INFO[0000] unpacking rootfs
INFO[0000] Unpacked chain id: sha256:41719840acf0f89e761f4a97c6074b6e2c6c25e3830fcb39301496b5d36f9b51
```
Wow, that was very snappy! Looks like we pulled that image into our
namespace but didn't have to download any new data because we are
sharing storage. Let's take a peek at the images we have in `foo`:
```
$ CONTAINERD_NAMESPACE=foo dist images ls
REF TYPE DIGEST SIZE
docker.io/library/redis:latest application/vnd.docker.distribution.manifest.v2+json sha256:548a75066f3f280eb017a6ccda34c561ccf4f25459ef8e36d6ea582b6af1decf 72.7 MiB
```
Now, let's remove that image from `foo`:
```
$ CONTAINERD_NAMESPACE=foo dist images rm
docker.io/library/redis:latest
```
Looks like it is gone:
```
$ CONTAINERD_NAMESPACE=foo dist images ls
REF TYPE DIGEST SIZE
```
But, as we can see, it is present in the `default` namespace:
```
$ dist images ls
REF TYPE DIGEST SIZE
docker.io/library/redis:latest application/vnd.docker.distribution.manifest.v2+json sha256:548a75066f3f280eb017a6ccda34c561ccf4f25459ef8e36d6ea582b6af1decf 72.7 MiB
```
What happened here? We can tell by listing the namespaces to get a
better understanding:
```
$ ctr namespaces ls
NAME LABELS
default
foo mylabel=bar
```
From the above, we can see that the `default` namespace was created with
the standard commands without the environment variable set. This isolates
the set of images in each namespace while sharing the data that matters.
Since we removed the images for namespace `foo`, we can remove it now:
```
$ ctr namespaces rm foo
foo
```
However, when we try to remove the `default` namespace, we get an error:
```
$ ctr namespaces rm default
ctr: unable to delete default: rpc error: code = FailedPrecondition desc = namespace default must be empty
```
This is because we require that namespaces be empty when removed.
Caveats
- While most metadata objects are namespaced, containers and tasks may
exhibit some issues. We still need to move runtimes to namespaces and
the container metadata storage may not be fully worked out.
- Still need to migrate content store to metadata storage and namespace
the content store such that some data storage (ie images).
- Specifics of the snapshot driver's relation to namespaces need to be
worked out in detail.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
The implementations for the storage of metadata have been merged into a
single metadata package where they can share storage primitives and
techniques. This is a prerequisite for the addition of namespaces, which
will require a coordinated layout for records to be organized by
namespace.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This avoids issues with the various deferred error handlers in the event that
`err` is shadowed or named differently, which this function currently avoids
but which is an easy trap to fall into.
Since named return values are all or nothing we need to name the waitGroup too
and adjust the code to suit.
Thanks to Aaron Lehmann for the suggestion, see also
https://github.com/docker/swarmkit/pull/1965#discussion_r118137410
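The trap can be illustrated with a small standalone example. The functions
below are hypothetical and unrelated to the code touched by this change; they
only show why a named `err` result makes deferred handlers robust against
shadowing.

```go
package main

import (
	"errors"
	"fmt"
)

// step is a hypothetical operation that may fail.
func step(fail bool) error {
	if fail {
		return errors.New("step failed")
	}
	return nil
}

// withLocal uses an unnamed result. The deferred handler inspects the
// outer err, but the failure is assigned to a shadowed err inside the
// if statement, so cleanup never runs.
func withLocal(fail bool, cleanup func()) error {
	var err error
	defer func() {
		if err != nil {
			cleanup()
		}
	}()
	if err := step(fail); err != nil { // shadows the outer err
		return err
	}
	return nil
}

// withNamed names the result. A return statement assigns to the named
// result before deferred functions run, so the handler sees the error
// no matter which variable carried it.
func withNamed(fail bool, cleanup func()) (err error) {
	defer func() {
		if err != nil {
			cleanup()
		}
	}()
	if err := step(fail); err != nil { // still shadowed, but harmless
		return err
	}
	return nil
}

func main() {
	cleanups := 0
	_ = withLocal(true, func() { cleanups++ })
	fmt.Println("cleanups after withLocal:", cleanups)
	_ = withNamed(true, func() { cleanups++ })
	fmt.Println("cleanups after withNamed:", cleanups)
}
```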
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Split resolver to only return a name with separate methods
for getting a fetcher and pusher. Add implementation for
push.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Working from feedback on the existing implementation, we have now
introduced a central metadata object to represent the lifecycle and pin
the resources required to implement what people today know as
containers. This includes the runtime specification and the root
filesystem snapshots. We also allow arbitrary labeling of the container.
Such provisions will bring the containerd definition of container closer
to what is expected by users.
The objects that encompass today's ContainerService, centered around the
runtime, will be known as tasks. These tasks take on the existing
lifecycle behavior of containerd's containers, which means that they are
deleted when they exit. Largely, there are no other changes except for
naming.
The `Container` object will operate purely as a metadata object. No
runtime state will be held on `Container`. It only informs the execution
service on what is required for creating tasks and the resources in use
by that container. The resources referenced by that container will be
deleted when the container is deleted, if not in use. In this sense,
users can create, list, label and delete containers in a similar way as
they do with docker today, without the complexity of runtime locks that
plagues current implementations.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This moves both the Mount type and mountinfo into a single mount
package.
This also opens up the root of the repo to hold the containerd client
implementation.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Update go-runc to 49b2a02ec1ed3e4ae52d30b54a291b75
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add shim to restore creation
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Keep checkpoint path in service
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add C/R to non-shim build
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Checkpoint rw and image
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Pause container on bind checkpoints
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Return dump.log in error on checkpoint failure
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Pause container for checkpoint
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Update runc to 639454475cb9c8b861cc599f8bcd5c8c790ae402
For checkpoint to work you need runc version
639454475cb9c8b861cc599f8bcd5c8c790ae402 or later and criu 3.0, as this is what
I have been testing with.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Move restore behind create calls
This removes the restore RPCs in favor of providing the checkpoint
information to the `Create` calls of a container. If provided, the
container will be created/restored from the checkpoint instead of an
existing container.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Regen protos after rebase
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This adds support for signalling a container process by pid.
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
make Ps more extensible
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
ps: windows support
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
Update go-runc to master with portability fixes.
Subreaper only exists on Linux, and only Linux runs the shim in a
mount namespace.
With these changes the shim compiles on Darwin, which means the
whole build compiles without errors now.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Remove rootfs service in place of snapshot service. Adds
diff service for extracting and creating diffs. Diff
creation is not yet implemented. This service allows
pulling or creating images without needing root access to
mount. Additionally in the future this will allow containerd
to ensure extractions happen safely in a chroot if needed.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
The split between provider and ingester was a long standing division
reflecting the client-side use cases. For the most part, we were
differentiating these for the algorithms that operate on them, but it made
instantiation and use of the types challenging. On the server-side, this
distinction is generally less important. This change unifies these types
and in the process we get a few benefits.
The first is that we now completely access the content store over GRPC.
This was the initial intent and we have now satisfied this goal
completely. There are a few issues around listing content and getting
status, but we resolve these with simple streaming and regexp filters.
More can probably be done to polish this but the result is clean.
Several other content-oriented methods were polished in the process of
unification. We have now properly separated out the `Abort` method to
cancel ongoing or stalled ingest processes. We have also replaced the
`Active` method with a single status method.
The transition went extremely smoothly. Once the clients were updated to
use the new methods, everything worked as expected on the first
compile.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This updates containerd to use the latest versions of cgroups, fifo,
console, and go-runc from the containerd org.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This adds pause and unpause to containerd's execution service and the
same commands to the `ctr` client.
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
Leave in btrfs by default, but add go build tags to exclude it.
`go build -tags containerd_no_btrfs` will leave that driver out.
This is needed because the current containerd/btrfs code needs to link
against libbtrfs*.so, which not all distros provide.
Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
This mainly fixes Linux vs generic Unix differences, with some
differences between Darwin and FreeBSD (which are close but not
identical). Should make fixing for other Unix platforms easier.
Note there are not yet `runc` equivalents for these platforms;
my current use case is image manipulation for the `moby` tool.
However there is interest in OCI runtime ports for both platforms.
Current status is that MacOS can build and run `ctr`, `dist`
and `containerd` and some operations are supported. FreeBSD 11
still needs some more fixes to continuity for extended attributes.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
This allows one to edit content in the content store with their favorite
editor. It is as simple as this:
```console
$ dist content edit sha256:58e1a1bb75db1b5a24a462dd5e2915277ea06438c3f105138f97eb53149673c4
```
The above will pop up your $EDITOR, where you can make changes to the content.
When you are done, save and the new version will be added to the content store.
The digest of the new content will be printed to stdout:
```console
sha256:247f30ac320db65f3314b63b908a3aeaac5813eade6cabc9198b5883b22807bc
```
We can then retrieve the content quite easily:
```console
$ dist content get sha256:247f30ac320db65f3314b63b908a3aeaac5813eade6cabc9198b5883b22807bc
{
"schemaVersion": 2,
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"config": {
"mediaType": "application/vnd.docker.container.image.v1+json",
"size": 1278,
"digest": "sha256:4a415e3663882fbc554ee830889c68a33b3585503892cc718a4698e91ef2a526"
},
"annotations": {},
"layers": [
{
"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
"size": 1905270,
"digest": "sha256:627beaf3eaaff1c0bc3311d60fb933c17ad04fe377e1043d9593646d8ae3bfe1"
}
]
}
```
In this case, an annotations field was added to the original manifest.
While this implementation is very simple, we can add all sorts of validation
and tooling to allow one to edit images inline. Coupled with declaring the
mediatype, we could return specific errors that can allow a user to craft
valid, working modifications to images for testing and profit.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Updates the filemode on the grpc socket to have group write
permission which is needed to perform GRPC. Additionally, ensure
the run directory has the specified group ownership and has group
read and enter permission.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
This adds a config option to set the oom score for the containerd daemon
as well as automatically setting the oom score for the shims launched so
that they are not killed until the very end of an out of memory
condition.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
rather than automagically doing this, it is the user's responsibility to
review the output of `containerd config default` and create the config
themselves.
Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
when wanting to craft a custom config, but based on the default config,
add a route to output the containerd config to a tempfile.
Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
With this changeset, image store access has moved to being completely
accessible over GRPC. No clients manipulate the image store database
directly and the GRPC client is fully featured. The metadata database is
now managed by the daemon and access coordinated via services.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This adds very simple deletion of images by name. We still need to
consider the approach to handling image name, so this may change. For
the time being, it allows one to delete an image entry in the metadata
database.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
We need to set +x on the overlay dirs or after dropping from root to a
non-root user an eperm will happen on exec or other file access
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Allow usage of the experimental docker resolver as a package. There are
very few changes to the consuming code, demonstrating the effectiveness
of the abstraction. This move will allow future contributions to a more
featured resolver implementation.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
As a demonstration of the power of the visitor implementation, we now
report the image size in the `dist images` command. This is the size of
the packed resources as would be pushed into a remote. A similar method
could be added to calculate the unpacked size.
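The size calculation amounts to summing descriptor sizes while walking the
image's resource tree. A simplified sketch, with a hypothetical `Descriptor`
type and `ChildrenFunc` callback standing in for the real visitor (the sizes
in `main` are made up, and blobs referenced more than once would be counted
more than once here):

```go
package main

import "fmt"

// Descriptor is a simplified stand-in for an image resource descriptor.
type Descriptor struct {
	MediaType string
	Digest    string
	Size      int64
}

// ChildrenFunc returns the descriptors referenced by desc.
type ChildrenFunc func(desc Descriptor) ([]Descriptor, error)

// PackedSize sums the size of root and everything reachable from it.
func PackedSize(root Descriptor, children ChildrenFunc) (int64, error) {
	total := root.Size
	kids, err := children(root)
	if err != nil {
		return 0, err
	}
	for _, k := range kids {
		n, err := PackedSize(k, children)
		if err != nil {
			return 0, err
		}
		total += n
	}
	return total, nil
}

func main() {
	tree := map[string][]Descriptor{
		"manifest": {
			{Digest: "config", Size: 10},
			{Digest: "layer", Size: 1000},
		},
	}
	children := func(d Descriptor) ([]Descriptor, error) { return tree[d.Digest], nil }

	size, err := PackedSize(Descriptor{Digest: "manifest", Size: 100}, children)
	fmt.Println(size, err)
}
```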
Signed-off-by: Stephen J Day <stephen.day@docker.com>
With this changeset, we now have a proof of concept of end to end pull.
Up to this point, the relationship between subsystems has been somewhat
theoretical. We now leverage fetching, the snapshot drivers, the rootfs
service, image metadata and the execution service, validating the proposed
model for containerd. There are a few caveats, including the need to move some
of the access into GRPC services, but the basic components are there.
The first command we will cover here is `dist pull`. This is the analog
of `docker pull` and `git pull`. It performs a full resource fetch for
an image and unpacks the root filesystem into the snapshot drivers. An
example follows:
```console
$ sudo ./bin/dist pull docker.io/library/redis:latest
docker.io/library/redis:latest: resolved |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:4c8fb09e8d634ab823b1c125e64f0e1ceaf216025aa38283ea1b42997f1e8059: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:3b281f2bcae3b25c701d53a219924fffe79bdb74385340b73a539ed4020999c4: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:e4a35914679d05d25e2fccfd310fde1aa59ffbbf1b0b9d36f7b03db5ca0311b0: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4b7726832aec75f0a742266c7190c4d2217492722dfd603406208eaa902648d8: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:338a7133395941c85087522582af182d2f6477dbf54ba769cb24ec4fd91d728f: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:83f12ff60ff1132d1e59845e26c41968406b4176c1a85a50506c954696b21570: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:693502eb7dfbc6b94964ae66ebc72d3e32facd981c72995b09794f1e87bac184: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:622732cddc347afc9360b4b04b46c6f758191a1dc73d007f95548658847ee67e: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:19a7e34366a6f558336c364693df538c38307484b729a36fede76432789f084f: done |++++++++++++++++++++++++++++++++++++++|
elapsed: 1.6 s total: 0.0 B (0.0 B/s)
INFO[0001] unpacking rootfs
```
Note that we haven't integrated rootfs unpacking into the status output, but we
pretty much have what is in docker today (:P). We can see the result of our pull
with the following:
```console
$ sudo ./bin/dist images
REF TYPE DIGEST SIZE
docker.io/library/redis:latest application/vnd.docker.distribution.manifest.v2+json sha256:4c8fb09e8d634ab823b1c125e64f0e1ceaf216025aa38283ea1b42997f1e8059 1.8 kB
```
The above shows that we have an image called "docker.io/library/redis:latest"
mapped to the given digest marked with a specific format. We get the size of
the manifest right now, not the full image, but we can add more as we need it.
For the most part, this is all that is needed, but a few tweaks to the model
for naming may need to be added. Specifically, we may want to index under a few
different names, including those qualified by hash or matched by tag versions.
We can do more work in this area as we develop the metadata store.
The name shown above can then be used to run the actual container image. We can
do this with the following command:
```console
$ sudo ./bin/ctr run --id foo docker.io/library/redis:latest /usr/local/bin/redis-server
1:C 17 Mar 17:20:25.316 # Warning: no config file specified, using the default config. In order to specify a config file use /usr/local/bin/redis-server /path/to/redis.conf
1:M 17 Mar 17:20:25.317 * Increased maximum number of open files to 10032 (it was originally set to 1024).
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 3.2.8 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in standalone mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 6379
| `-._ `._ / _.-' | PID: 1
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
1:M 17 Mar 17:20:25.326 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 17 Mar 17:20:25.326 # Server started, Redis version 3.2.8
1:M 17 Mar 17:20:25.326 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 17 Mar 17:20:25.326 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 17 Mar 17:20:25.326 * The server is now ready to accept connections on port 6379
```
Wow! So, now we are running `redis`!
There are still a few things to work out. Notice that we have to specify the
command as part of the arguments to `ctr run`. This is because we are not yet
reading the image config and converting it to an OCI runtime config. With the
base laid in this PR, adding such functionality should be straightforward.
While this is a _little_ messy, this is great progress. It should be easy
sailing from here.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
With this PR, we introduce the concept of image handlers. They support
walking a tree of image resource descriptors for doing various tasks
related to processing them. Handlers can be dispatched sequentially or
in parallel and can be stacked for various effects.
The main functionality we introduce here is parameterized fetch without
coupling format resolution to the process itself. Two important
handlers, `remotes.FetchHandler` and `image.ChildrenHandler` can be
composed to implement recursive fetch with full status reporting. The
approach can also be modified to filter based on platform or other
constraints, unlocking a lot of possibilities.
This also includes some light refactoring in the fetch command, in
preparation for submission of end to end pull.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
The service can use the snapshotter directly to get the rootfs.
Removed debug line for mount response.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
This reuses the existing shim code and services to let containerd run as
the reaper for all container processes without the use of a shim.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
After receiving feedback during containerd summit walk through of the
pull POC, we found that the resolution flow for names was out of place.
We could see this present in awkward places where we were trying to
re-resolve whether something was a digest or a tag and extra retries to
various endpoints.
By centering this problem around, "what do we write in the metadata
store?", the following interface comes about:
```
Resolve(ctx context.Context, ref string) (name string, desc ocispec.Descriptor, fetcher Fetcher, err error)
```
The above takes an "opaque" reference (we'll get to this later) and
returns the canonical name for the object, a content description of the
object and a `Fetcher` that can be used to retrieve the object and its
child resources. We can write `name` into the metadata store, pointing
at the descriptor. Decisions about discovery, trust, provenance,
distribution are completely abstracted away from the pulling code.
A first response to such a monstrosity is "that is a lot of return
arguments". When we look at the actual usage, we can see that in practice, the
usage pattern works well, albeit we don't quite demonstrate the utility
of `name`, which will be more apparent later. Designs that allowed
separate resolution of the `Fetcher` and the return of a collected
object were considered. Let's give this a chance before we go
refactoring this further.
With this change, we introduce a reference package with helpers for
remotes to decompose "docker-esque" references into constituent
components, without arbitrarily enforcing those opinions on the backend.
Ultimately, the name and the reference used to qualify that name are
completely opaque to containerd. Obviously, implementors will need to
show some candor in following some conventions, but the possibilities
are fairly wide. Structurally, we still maintain the concept of the
locator and object but the interpretation is up to the resolver.
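As an illustration of the locator and object split, a simplified parser might
look like the following. The `Spec` type and `Parse` function are assumptions
for this sketch and do not claim to match the package's actual grammar, which
is left to the resolver.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Spec holds the two parts of a reference: where to look (locator)
// and what to look for (object, such as a tag or digest).
type Spec struct {
	Locator string
	Object  string
}

// Parse splits a "docker-esque" reference. A digest is introduced by
// "@"; a tag by the last ":" that appears after the last "/", so that
// a port in the host name is not mistaken for a tag.
func Parse(ref string) (Spec, error) {
	if ref == "" {
		return Spec{}, errors.New("empty reference")
	}
	spec := Spec{Locator: ref}
	if i := strings.Index(ref, "@"); i >= 0 {
		spec = Spec{Locator: ref[:i], Object: ref[i+1:]}
	} else if i := strings.LastIndex(ref, ":"); i > strings.LastIndex(ref, "/") {
		spec = Spec{Locator: ref[:i], Object: ref[i+1:]}
	}
	if spec.Locator == "" {
		return Spec{}, fmt.Errorf("reference %q has no locator", ref)
	}
	return spec, nil
}

func main() {
	for _, ref := range []string{
		"docker.io/library/redis:latest",
		"localhost:5000/redis",
		"docker.io/library/redis@sha256:abc",
	} {
		spec, err := Parse(ref)
		fmt.Printf("%q -> locator=%q object=%q err=%v\n", ref, spec.Locator, spec.Object, err)
	}
}
```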
For the most part, the `dist` tool operates exactly the same, except
objects can be fetched with a reference:
```
dist fetch docker.io/library/redis:latest
```
The above should work well with a running containerd instance. I
recommend giving this a try with `fetch-object`, as well. With
`fetch-object`, it is easy for one to better understand the intricacies
of the OCI/Docker image formats.
Ultimately, this serves the main purpose of the elusive "metadata
store".
Signed-off-by: Stephen J Day <stephen.day@docker.com>
With the rename of fetch to fetch-object, we now introduce the `fetch`
command. It will fetch all of the resources required for an image into
the content store. We'll still need to follow this up with metadata
registration but this is a good start.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
To make `fetch-object` much easier to use for demonstrations, the
mediatypes are defaulted when a non-digest object identifier is
provided. We also add support for OCI mediatypes, although they are
mostly unavailable.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
To allow us to differentiate between fetching an image, fetching a part
of an image and pulling an image, we now call the `fetch` command the
`fetch-object` command. We can now introduce a command that does the
complete image fetch without creating snapshots, allowing `pull` to
perform the entire process.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Allow deletion of content over the gRPC interface. For now, we are going
with a model that conducts reference management outside of the content
store, in the metadata store, but this design is valid either way.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
When using the fetcher concurrently, the loop modifying the closed-over
`base` parameter was causing URLs from different digests to be returned
randomly. We copy the value and then modify it to make it work
correctly.
Luckily, we are using content addressable storage or this would have
been undetectable.
Signed-off-by: Stephen J Day <stephen.day@docker.com>