Extend the Applier interface's Apply method with an optional
options parameter.
For the container image encryption we intend to use the options
parameter to pass image decryption parameters ('dcparameters'),
which are primarily (privatte) keys, in form of a JSON document
under the map key '_dcparameters', and pass them to the Applier's
Apply() method. This helps us to access decryption keys and start
the pipeline with the layer decryption before the layer data are
unzipped and untarred.
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Harshal Patil <harshal.patil@in.ibm.com>
megacheck, gosimple and unused has been deprecated and subsumed by
staticcheck. And staticcheck also has been upgraded. we need to update
code for the linter issue.
close: #2945
Signed-off-by: Wei Fu <fuweid89@gmail.com>
This changeset modifies the metadata store to allow one to set a
"content sharing policy" that defines how blobs are shared between
namespaces in the content store.
The default mode "shared" will make blobs available in all namespaces
once it is pulled into any namespace. The blob will be pulled into
the namespace if a writer is opened with the "Expected" digest that
is already present in the backend.
The alternative mode, "isolated" requires that clients prove they have
access to the content by providing all of the content to the ingest
before the blob is added to the namespace.
Both modes share backing data, while "shared" will reduce total
bandwidth across namespaces, at the cost of allowing access to any
blob just by knowing its digest.
Note: Most functional codes and changelog of this commit originate from
Stephen J Day <stephen.day@docker.com>, see
40455aade8Fixes#1713Fixes#2865
Signed-off-by: Eric Lin <linxiulei@gmail.com>
support checkpoint without committing a checkpoint dir into a
checkpoint image and restore without untar image into checkpoint
directory. support for both v1 and v2 runtime
Signed-off-by: Ace-Tang <aceapril@126.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
Allows containerd.exe to run as a Windows service. eg
Register: `.\containerd.exe --register-service`
Start: `net start containerd`
...
Stop: `net stop containerd`
Unregister: `.\containerd.exe --unregister-service`
When running as a service, logs will go to the Windows application
event log.
Since there is no real action the user can do, these can safely be
informative that the underlying filesystem does not support a snapshot
plugin at boot.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
The github.com/containerd/containerd/services/server has a lot of
dependencies, like content, snapshots services implementation and
docker-metrics.
For the client side, it uses the config struct from server package
to start up the containerd in background. It will import a lot of
useless packages which might be conflict with existing vendor's package.
It makes integration easier with single config package.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
The local store could end up in a state where the writer is
closed but the reference is locked after a commit on an
existing object.
Cleans up Commit logic to always close the writer even after
an error occurs, guaranteeing the reference is unlocked after commit.
Adds a test to the content test suite to verify this behavior.
Updates the content store interface definitions to clarify the behavior.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Implements the Windows lcow differ/snapshotter responsible for managing
the creation and lifetime of lcow containers on Windows.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Some images like `criu` will have extra libs that it requires. This
adds lib support via LD_LIBRARY_PATH and InstallOpts
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This adds a way for users to programatically install containerd binary
dependencies.
With runtime v2 and new shim's being built, it will be a challenge to
get those onto machines. Users would have to find the link, download,
place it in their path, yada yada yada.
With this functionality of a managed `/opt` directory, containerd can
use existing image and distribution infra. to get binarys, shims, etc
onto the system.
Configuration:
*default:* `/opt/containerd`
*containerd config:*
```toml
[plugins.opt]
path = "/opt/mypath"
```
Usage:
*code:*
```go
image, err := client.Pull(ctx, "docker.io/crosbymichael/runc:latest")
client.Install(ctx, image)
```
*ctr:*
```bash
ctr content fetch docker.io/crosbymichael/runc:latest
ctr install docker.io/crosbymichael/runc:latest
```
You can manage versions and see what is running via standard image
commands.
Images:
These images MUST be small and only contain binaries.
```Dockerfile
FROM scratch
Add runc /bin/runc
```
Containerd will only extract files in `/bin` of the image.
Later on, we can add support for `/lib`.
The code adds a service to manage an `/opt/containerd` directory and
provide that path to callers via the introspection service.
How to Test:
Delete runc from your system.
```bash
> sudo ctr run --rm docker.io/library/redis:alpine redis
ctr: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v1.linux/default/redis/log.json: no such file or directory): exec: "runc": executable file not found in $PATH: unknown
> sudo ctr content fetch docker.io/crosbymichael/runc:latest
> sudo ctr install docker.io/crosbymichael/runc:latest
> sudo ctr run --rm docker.io/library/redis:alpine redis
1:C 01 Aug 15:59:52.864 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 01 Aug 15:59:52.864 # Redis version=4.0.10, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 01 Aug 15:59:52.864 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1:M 01 Aug 15:59:52.866 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
1:M 01 Aug 15:59:52.866 # Server can't set maximum open files to 10032 because of OS error: Operation not permitted.
1:M 01 Aug 15:59:52.866 # Current maximum open files is 1024. maxclients has been reduced to 992 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
1:M 01 Aug 15:59:52.870 * Running mode=standalone, port=6379.
1:M 01 Aug 15:59:52.870 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 01 Aug 15:59:52.870 # Server initialized
1:M 01 Aug 15:59:52.870 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 01 Aug 15:59:52.870 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 01 Aug 15:59:52.870 * Ready to accept connections
^C1:signal-handler (1533139193) Received SIGINT scheduling shutdown...
1:M 01 Aug 15:59:53.472 # User requested shutdown...
1:M 01 Aug 15:59:53.472 * Saving the final RDB snapshot before exiting.
1:M 01 Aug 15:59:53.484 * DB saved on disk
1:M 01 Aug 15:59:53.484 # Redis is now ready to exit, bye bye...
```
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Removes the start dependency on V1 runtimes in the TasksService for:
// +build windows_v2. For unix and windows (v1) this code remains to load all
v1 runtimes as expected.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
This change allows implementations to resolve the location of the actual data
using OCI descriptor fields such as MediaType.
No OCI descriptor field is written to the store.
No change on gRPC API.
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
This renames the runtime interface to PlatformRuntime to denote the
layer at which the runtime is being abstracted. This should be used to
abstract different platforms that vary greatly and do not have full
compat with OCI based binary runtimes.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This allows non-privileged users to use containerd. This is part of a
larger track of work integrating containerd into Cloudfoundry's garden
with support for rootless.
[#156343575]
Signed-off-by: Claudia Beresford <cberesford@pivotal.io>
Since Go 1.7, context is a standard package, superceding the
"x/net/context". Since Go 1.9, the latter only provides a few type
aliases from the former. Therefore, it makes sense to switch to the
standard package.
This commit was generated by the following script (with a couple of
minor fixups to remove extra changes done by goimports):
#!/bin/bash
if [ $# -ge 1 ]; then
FILES=$*
else
FILES=$(git ls-files \*.go | grep -vF ".pb.go" | grep -v
^vendor/)
fi
for f in $FILES; do
printf .
sed -i -e 's|"golang.org/x/net/context"$|"context"|' $f
goimports -w $f
awk ' /^$/ {e=1; next;}
/[[:space:]]"context"$/ {e=0;}
{if (e) {print ""; e=0}; print;}' < $f > $f.new && \
mv $f.new $f
goimports -w $f
done
echo
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Since Go 1.7, "context" is a standard package, superceding the
"x/net/context". Since Go 1.9, the latter only provides type aliases
from the former. Therefore, it makes sense to switch to the standard
package, and the change is not disruptive in any sense.
This commit deals with a few cases where both packages happened to be
imported by the same source file. A choice between "context" and
"gocontext" was made for each file in order to minimize the patch.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
We move from having our own generated version of the googleapis files to
an upstream version that is present in gogo. As part of this, we update
the protobuf package to 1.0 and make some corrections for slight
differences in the generated code.
The impact of this change is very low.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Update content ingests to use content from another namespace.
Ingests must be committed to make content available and the
client will see the sharing as an ingest which has already
been fully written to, but not completed.
Updated the database version to change the ingest record in
the database from a link key to an object with a link and
expected value. This expected value is used to indicate that
the content already exists and an underlying writer may
not yet exist.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
This implements the Windows snapshotter and diff Apply function.
This allows for Windows layers to be created, and layers to be pulled
from the hub.
Signed-off-by: Darren Stahl <darst@microsoft.com>
Because tasks may be deleted while listing containers, we need to ignore
errors from state requests that are due to a closed error. All of these
get mapped to ErrNotFound, which can be used to filter the entries.
There may be a better fix that does a better job of keeping track of the
intended state of a backend task. The current condition of assuming that
a closed client is a shutdown task may be too naive.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This linter checks for unnecessary type convertions.
Some convertions are whitelisted because their type is different
on 32bit platforms
Signed-off-by: Daniel Nephin <dnephin@gmail.com>
The boltdb image store now manages its own transactions when
one is not provided, but allows the caller to pass in a
transaction through the context. This makes the image store
more similar to the content and snapshot stores. Additionally,
use the reference to the metadata database to mark the content
store as dirty after an image has been deleted. The deletion
of an image means a reference to a piece of content is gone
and therefore garbage collection should be run to check if
any resources can be cleaned up as a result.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Synchronous image delete provides an option image delete to wait
until the next garbage collection deletes after an image is removed
before returning success to the caller.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Add garbage collection as a background process and policy
configuration for configuring when to run garbage collection.
By default garbage collection will run when deletion occurs
and no more than 20ms out of every second.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
By defining a concrete, non-protobuf type for the events interface, we
can completely decouple it from the grpc packages that are expensive at
runtime. This does requires some allocation cost for converting between
types, but the saving for the size of the shim are worth it.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
To avoid importing all of grpc when consuming events, the types of
events have been split in to a separate package. This should allow a
reduction in memory usage in cases where a package is consuming events
but not using the gprc service directly.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
The locks now retry on the backend side to prevent clients from having
to round trip on locks that might be momentarily held. This exposed some
timing errors in the updated_at fields for content ingest, so we've had
to move that to a separate file to export the monotonic go runtime
timestamps.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Since these are registered and the interface is what matters, these
Service types do not need to be exported.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Prevent checkpoints from getting garbage collected by
adding root labels to unreferenced checkpoint objects.
Mark checkpoints as gc roots.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Add differ options and package with interface.
Update optional values on diff interface to use options.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
With this change, we integrate all the plugin changes into the
introspection service.
All plugins can be listed with the following command:
```console
$ ctr plugins
TYPE ID PLATFORM STATUS
io.containerd.content.v1 content - ok
io.containerd.metadata.v1 bolt - ok
io.containerd.differ.v1 walking linux/amd64 ok
io.containerd.grpc.v1 containers - ok
io.containerd.grpc.v1 content - ok
io.containerd.grpc.v1 diff - ok
io.containerd.grpc.v1 events - ok
io.containerd.grpc.v1 healthcheck - ok
io.containerd.grpc.v1 images - ok
io.containerd.grpc.v1 namespaces - ok
io.containerd.snapshotter.v1 btrfs linux/amd64 error
io.containerd.snapshotter.v1 overlayfs linux/amd64 ok
io.containerd.grpc.v1 snapshots - ok
io.containerd.monitor.v1 cgroups linux/amd64 ok
io.containerd.runtime.v1 linux linux/amd64 ok
io.containerd.grpc.v1 tasks - ok
io.containerd.grpc.v1 version - ok
```
There are few things to note about this output. The first is that it is
printed in the order in which plugins are initialized. This useful for
debugging plugin initialization problems. Also note that even though the
introspection GPRC api is a itself a plugin, it is not listed. This is
because the plugin takes a snapshot of the initialization state at the
end of the plugin init process. This allows us to see errors from each
plugin, as they happen. If it is required to introspect the existence of
the introspection service, we can make modifications to include it in
the future.
The last thing to note is that the btrfs plugin is in an error state.
This is a common state for containerd because even though we load the
plugin, most installations aren't on top of btrfs and the plugin cannot
be used. We can actually view this error using the detailed view with a
filter:
```console
$ ctr plugins --detailed id==btrfs
Type: io.containerd.snapshotter.v1
ID: btrfs
Platforms: linux/amd64
Exports:
root /var/lib/containerd/io.containerd.snapshotter.v1.btrfs
Error:
Code: Unknown
Message: path /var/lib/containerd/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter
```
Along with several other values, this is a valuable tool for evaluating the
state of components in containerd.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Updates metadata plugin to require content and
snapshotter plugins be loaded and initializes with
those plugins, keeping the metadata database structure
static after initialization. Service plugins now only
require metadata plugin access snapshotter or content
stores through metadata, which was already required
behavior of the services.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Fixes#1497
This sets the namespace on the context when deleting a namespace so that
the publish event does not fail. Use the same namespace as you are
deleting for the context.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Matching support is now implemented in the platforms package. The
`Parse` function now returns a matcher object that can be used to
match OCI platform specifications. We define this as an interface to
allow the creation of helpers oriented around platform selection.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This field allows a client to store specialized information in the
container metadata rather than having to store this itself and keep
the data in sync with containerd.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
ref: #1464
This tries to solve issues with races around process state. First it
adds the process mutex around the state call so that any state changes,
deletions, etc will be handled in order.
Second, for IsNoExist errors from the runtime, return a stopped state if
a process has been removed from the underlying OCI runtime but not from
the shim yet. This shouldn't happen with the lock from above but its
hare to verify this issue.
Third, handle shim disconnections and return an ErrNotFound.
Forth, don't abort returning all tasks if one task is unable to return
its state.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Content commit is updated to take in a context, allowing
content to be committed within the same context the writer
was in. This is useful when commit may be able to use more
context to complete the action rather than creating its own.
An example of this being useful is for the metadata implementation
of content, having a context allows tests to fully create
content in one database transaction by making use of the context.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
In order to enforce strict handling of snapshotter values on the
container object, the defaults have been moved to the client side. This
ensures that we correctly qualify the snapshotter under use when from
the container at the time it was created, rather than possibly losing
the metadata on a change of default.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Updates the differ service to support calling and configuring
multiple differs. The differs are configured as an ordered list
of differs which will each be attempting until a supported differ
is called.
Additionally a not supported error type was added to allow differs
to be selective of whether the differ arguments are supported by
the differ. This error type corresponds to the GRPC unimplemented error.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Add commit options which allow for setting labels on commit.
Prevents potential race between garbage collector reading labels
after commit and labels getting set.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
After some analysis, it was found that Content.Reader was generally
redudant to an io.ReaderAt. This change removes `Content.Reader` in
favor of a `Content.ReaderAt`. In general, `ReaderAt` can perform better
over interfaces with indeterminant latency because it avoids remote
state for reads. Where a reader is required, a helper is provided to
convert it into an `io.SectionReader`.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This changes Wait() from returning an error whenever you call wait on a
stopped process/task to returning the exit status from the process.
This also adds the exit status to the Status() call on a process/task so
that a user can Wait(), check status, then cancel the wait to avoid
races in event handling.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>