Use single instance of content store instead of
creating new one for each collection. Using new
instance and wrapping causes failures.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Ensures that all callers and the garbage collector are using
the same lock instances to prevent cleanup of objects
during creation.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Marks and sweeps unreferenced objects.
Add snapshot cleanup to metadata.
Add content garbage collection
Add dirty flags for snapshotters and content store which
are set on deletion and used during the next garbage collection.
Cleanup content store backend when content metadata is removed.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Updates metadata plugin to require content and
snapshotter plugins be loaded and initializes with
those plugins, keeping the metadata database structure
static after initialization. Service plugins now only
require metadata plugin access snapshotter or content
stores through metadata, which was already required
behavior of the services.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Adds back links from parent to children in order to prevent
deletion of a referenced snapshot in a namespace.
Avoid removing snapshot during metadata delete to
prevent shared namespaces from being mistakenly deleted.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
To allow for updating extensions without collisions, we have moved to
using a map type that can be explicitly selected via the field path for
updates. This ensures that multiple parties can operate on their
extensions without stepping on each other's toes or incurring an
inordinate number of round trips.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This field allows a client to store specialized information in the
container metadata rather than having to store this itself and keep
the data in sync with containerd.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Content commit is updated to take in a context, allowing
content to be committed within the same context the writer
was in. This is useful when commit may be able to use more
context to complete the action rather than creating its own.
An example of this being useful is for the metadata implementation
of content, having a context allows tests to fully create
content in one database transaction by making use of the context.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
The testsuite alters global state by setting the umask, avoid
running the testsuite in parallel and move umask manipulation
to the test suite level to individual tests may run in parallel.
Added better error messaging and handling.
Removed reliance on testing object for handling cleanup failure.
When a cleanup error occurred, it would fail the test but the log
would get skipped. With this change the failure will show up for
the running test.
Update test unmounting to set the detach flag, avoiding races with
btrfs which may see the device as busy when attempting to unmount.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
This fixes a few bugs in the container store related to reading and
writing fields. Specifically, on update, the full field set wasn't being
returned to the caller, making it appear that the store was corrupted.
We now return the correctly updated field and store the missing field
that was omitted in the original implementation. In course, we also have
defined the update semantics of each field, as well as whether or not
they are required.
The big addition here is really the container metadata testsuite. It
covers listing, filtering, creates, updates and deletes in a vareity of
scenarios.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This could render tasks for a container unresolvalbe. If there is a use
case for changing the runtime of a container, we should think it through
carefully.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Add commit options which allow for setting labels on commit.
Prevents potential race between garbage collector reading labels
after commit and labels getting set.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
After some analysis, it was found that Content.Reader was generally
redudant to an io.ReaderAt. This change removes `Content.Reader` in
favor of a `Content.ReaderAt`. In general, `ReaderAt` can perform better
over interfaces with indeterminant latency because it avoids remote
state for reads. Where a reader is required, a helper is provided to
convert it into an `io.SectionReader`.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Moves label and timestamp bolt functions to subpackage
for use outside the metadata package without importing
metadata package.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
The snapshot test suite is designed to run against the snapshotter
interface, run the test suite for metadata and client implementations
of the snapshotter interface.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Because the lock on an ingest ref being held regardless of whether a
writer was in use, resuming an existing ingest proved impossible. We now
defer writer locking to the content store backend, where the lock will
be released automatically on closing the writer or on restarting
containerd.
There are still cases where a writer can be abandoned but not closed,
leaving an active ingest, but this is extremely rare and can be resolved
with a daemon restart.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Add content test suite with test for writer.
Update fs and metadata implementations to use test suite.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
After review, there are cases where having common requirements for
namespaces and identifiers creates contention between applications. One
example is that it is nice to have namespaces comply with domain name
requirement, but that does not allow underscores, which are required for
certain identifiers.
The namespaces validation has been reverted to be in line with RFC 1035.
Existing identifiers has been modified to allow simply alpha-numeric
identifiers, while limiting adjacent separators.
We may follow up tweaks for the identifier charset but this split should
remove the hard decisions.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Update list content command to support filters
Add label subcommand to content in dist tool to update labels
Add uncompressed label on unpack
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Move content status to list statuses and add single status
to interface.
Updates API to support list statuses and status
Updates snapshot key creation to be generic
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
The primary feature we get with this PR is support for filters and
labels on the image metadata store. In the process of doing this, the
conventions for the API have been converged between containers and
images, providing a model for other services.
With images, `Put` (renamed to `Update` briefly) has been split into a
`Create` and `Update`, allowing one to control the behavior around these
operations. `Update` now includes support for masking fields at the
datastore-level across both the containers and image service. Filters
are now just string values to interpreted directly within the data
store. This should allow for some interesting future use cases in which
the datastore might use the syntax for more efficient query paths.
The containers service has been updated to follow these conventions as
closely as possible.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Now that we have most of the services required for use with containerd,
it was found that common patterns were used throughout services. By
defining a central `errdefs` package, we ensure that services will map
errors to and from grpc consistently and cleanly. One can decorate an
error with as much context as necessary, using `pkg/errors` and still
have the error mapped correctly via grpc.
We make a few sacrifices. At this point, the common errors we use across
the repository all map directly to grpc error codes. While this seems
positively crazy, it actually works out quite well. The error conditions
that were specific weren't super necessary and the ones that were
necessary now simply have better context information. We lose the
ability to add new codes, but this constraint may not be a bad thing.
Effectively, as long as one uses the errors defined in `errdefs`, the
error class will be mapped correctly across the grpc boundary and
everything will be good. If you don't use those definitions, the error
maps to "unknown" and the error message is preserved.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
A few days ago, we added validation for namespaces. We've decided to
expand these naming rules to include containers. To facilitate this, a
common package `identifiers` now provides a common validation area.
These rules will be extended to apply to task identifiers, snapshot keys
and other areas where user-provided identifiers may be used.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
These interfaces allow us to preserve both the checking of error "cause"
as well as messages returned from the gRPC API so that the client gets
full error reason instead of a default "metadata: not found" in the case
of a missing image.
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
To support multi-tenancy, containerd allows the collection of metadata
and runtime objects within a heirarchical storage primitive known as
namespaces. Data cannot be shared across these namespaces, unless
allowed by the service. This allows multiple sets of containers to
managed without interaction between the clients that management. This
means that different users, such as SwarmKit, K8s, Docker and others can
use containerd without coordination. Through labels, one may use
namespaces as a tool for cleanly organizing the use of containerd
containers, including the metadata storage for higher level features,
such as ACLs.
Namespaces
Namespaces cross-cut all containerd operations and are communicated via
context, either within the Go context or via GRPC headers. As a general
rule, no features are tied to namespace, other than organization. This
will be maintained into the future. They are created as a side-effect of
operating on them or may be created manually. Namespaces can be labeled
for organization. They cannot be deleted unless the namespace is empty,
although we may want to make it so one can clean up the entirety of
containerd by deleting a namespace.
Most users will interface with namespaces by setting in the
context or via the `CONTAINERD_NAMESPACE` environment variable, but the
experience is mostly left to the client. For `ctr` and `dist`, we have
defined a "default" namespace that will be created up on use, but there
is nothing special about it. As part of this PR we have plumbed this
behavior through all commands, cleaning up context management along the
way.
Namespaces in Action
Namespaces can be managed with the `ctr namespaces` subcommand. They
can be created, labeled and destroyed.
A few commands can demonstrate the power of namespaces for use with
images. First, lets create a namespace:
```
$ ctr namespaces create foo mylabel=bar
$ ctr namespaces ls
NAME LABELS
foo mylabel=bar
```
We can see that we have a namespace `foo` and it has a label. Let's pull
an image:
```
$ dist pull docker.io/library/redis:latest
docker.io/library/redis:latest: resolved |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:548a75066f3f280eb017a6ccda34c561ccf4f25459ef8e36d6ea582b6af1decf: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:d45bc46b48e45e8c72c41aedd2a173bcc7f1ea4084a8fcfc5251b1da2a09c0b6: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:5b690bc4eaa6434456ceaccf9b3e42229bd2691869ba439e515b28fe1a66c009: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:a858478874d144f6bfc03ae2d4598e2942fc9994159f2872e39fae88d45bd847: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4cdd94354d2a873333a205a02dbb853dd763c73600e0cf64f60b4bd7ab694875: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:10a267c67f423630f3afe5e04bbbc93d578861ddcc54283526222f3ad5e895b9: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:c54584150374aa94b9f7c3fbd743adcff5adead7a3cf7207b0e51551ac4a5517: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:d1f9221193a65eaf1b0afc4f1d4fbb7f0f209369d2696e1c07671668e150ed2b: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:71c1f30d820f0457df186531dc4478967d075ba449bd3168a3e82137a47daf03: done |++++++++++++++++++++++++++++++++++++++|
elapsed: 0.9 s total: 0.0 B (0.0 B/s)
INFO[0000] unpacking rootfs
INFO[0000] Unpacked chain id: sha256:41719840acf0f89e761f4a97c6074b6e2c6c25e3830fcb39301496b5d36f9b51
```
Now, let's list the image:
```
$ dist images ls
REF TYPE DIGEST SIZE
docker.io/library/redis:latest application/vnd.docker.distribution.manifest.v2+json sha256:548a75066f3f280eb017a6ccda34c561ccf4f25459ef8e36d6ea582b6af1decf 72.7 MiB
```
That looks normal. Let's list the images for the `foo` namespace and see
this in action:
```
$ CONTAINERD_NAMESPACE=foo dist images ls
REF TYPE DIGEST SIZE
```
Look at that! Nothing was pulled in the namespace `foo`. Let's do the
same pull:
```
$ CONTAINERD_NAMESPACE=foo dist pull docker.io/library/redis:latest
docker.io/library/redis:latest: resolved |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:548a75066f3f280eb017a6ccda34c561ccf4f25459ef8e36d6ea582b6af1decf: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:d45bc46b48e45e8c72c41aedd2a173bcc7f1ea4084a8fcfc5251b1da2a09c0b6: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:a858478874d144f6bfc03ae2d4598e2942fc9994159f2872e39fae88d45bd847: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4cdd94354d2a873333a205a02dbb853dd763c73600e0cf64f60b4bd7ab694875: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:c54584150374aa94b9f7c3fbd743adcff5adead7a3cf7207b0e51551ac4a5517: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:71c1f30d820f0457df186531dc4478967d075ba449bd3168a3e82137a47daf03: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:d1f9221193a65eaf1b0afc4f1d4fbb7f0f209369d2696e1c07671668e150ed2b: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:10a267c67f423630f3afe5e04bbbc93d578861ddcc54283526222f3ad5e895b9: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:5b690bc4eaa6434456ceaccf9b3e42229bd2691869ba439e515b28fe1a66c009: done |++++++++++++++++++++++++++++++++++++++|
elapsed: 0.8 s total: 0.0 B (0.0 B/s)
INFO[0000] unpacking rootfs
INFO[0000] Unpacked chain id: sha256:41719840acf0f89e761f4a97c6074b6e2c6c25e3830fcb39301496b5d36f9b51
```
Wow, that was very snappy! Looks like we pulled that image into out
namespace but didn't have to download any new data because we are
sharing storage. Let's take a peak at the images we have in `foo`:
```
$ CONTAINERD_NAMESPACE=foo dist images ls
REF TYPE DIGEST SIZE
docker.io/library/redis:latest application/vnd.docker.distribution.manifest.v2+json sha256:548a75066f3f280eb017a6ccda34c561ccf4f25459ef8e36d6ea582b6af1decf 72.7 MiB
```
Now, let's remove that image from `foo`:
```
$ CONTAINERD_NAMESPACE=foo dist images rm
docker.io/library/redis:latest
```
Looks like it is gone:
```
$ CONTAINERD_NAMESPACE=foo dist images ls
REF TYPE DIGEST SIZE
```
But, as we can see, it is present in the `default` namespace:
```
$ dist images ls
REF TYPE DIGEST SIZE
docker.io/library/redis:latest application/vnd.docker.distribution.manifest.v2+json sha256:548a75066f3f280eb017a6ccda34c561ccf4f25459ef8e36d6ea582b6af1decf 72.7 MiB
```
What happened here? We can tell by listing the namespaces to get a
better understanding:
```
$ ctr namespaces ls
NAME LABELS
default
foo mylabel=bar
```
From the above, we can see that the `default` namespace was created with
the standard commands without the environment variable set. Isolating
the set of shared images while sharing the data that matters.
Since we removed the images for namespace `foo`, we can remove it now:
```
$ ctr namespaces rm foo
foo
```
However, when we try to remove the `default` namespace, we get an error:
```
$ ctr namespaces rm default
ctr: unable to delete default: rpc error: code = FailedPrecondition desc = namespace default must be empty
```
This is because we require that namespaces be empty when removed.
Caveats
- While most metadata objects are namespaced, containers and tasks may
exhibit some issues. We still need to move runtimes to namespaces and
the container metadata storage may not be fully worked out.
- Still need to migrate content store to metadata storage and namespace
the content store such that some data storage (ie images).
- Specifics of snapshot driver's relation to namespace needs to be
worked out in detail.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
In `execution.Create()` I was seeing `opts.Spec` unexpectedly becoming a slice
full of nulls instead of the expected data (often this occurred at the
`s.mu.Lock()`)
https://github.com/boltdb/bolt#caveats--limitations says:
> Byte slices returned from Bolt are only valid during a transaction. Once the
> transaction has been committed or rolled back then the memory they point to
> can be reused by a new page or can be unmapped from virtual memory and you'll
> see an unexpected fault address panic when accessing it.
Since `opts.Spec` = `container.Spec` where the latter is a byte slice returned
from Bolt we must copy it. The best place to do this is when reading, so that
callers need not worry about this.
I also checked metadata/*.go for similar issues and found no others.
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
The implementations for the storage of metadata have been merged into a
single metadata package where they can share storage primitives and
techniques. The is a requisite for the addition of namespaces, which
will require a coordinated layout for records to be organized by
namespace.
Signed-off-by: Stephen J Day <stephen.day@docker.com>