namespaces: support within containerd

To support multi-tenancy, containerd allows the collection of metadata
and runtime objects within a heirarchical storage primitive known as
namespaces. Data cannot be shared across these namespaces, unless
allowed by the service. This allows multiple sets of containers to
managed without interaction between the clients that management. This
means that different users, such as SwarmKit, K8s, Docker and others can
use containerd without coordination. Through labels, one may use
namespaces as a tool for cleanly organizing the use of containerd
containers, including the metadata storage for higher level features,
such as ACLs.

Namespaces

Namespaces cross-cut all containerd operations and are communicated via
context, either within the Go context or via GRPC headers. As a general
rule, no features are tied to namespace, other than organization. This
will be maintained into the future. They are created as a side-effect of
operating on them or may be created manually. Namespaces can be labeled
for organization. They cannot be deleted unless the namespace is empty,
although we may want to make it so one can clean up the entirety of
containerd by deleting a namespace.

Most users will interface with namespaces by setting in the
context or via the `CONTAINERD_NAMESPACE` environment variable, but the
experience is mostly left to the client. For `ctr` and `dist`, we have
defined a "default" namespace that will be created up on use, but there
is nothing special about it. As part of this PR we have plumbed this
behavior through all commands, cleaning up context management along the
way.

Namespaces in Action

Namespaces can be managed with the `ctr namespaces` subcommand. They
can be created, labeled and destroyed.

A few commands can demonstrate the power of namespaces for use with
images. First, lets create a namespace:

```
$ ctr namespaces create foo mylabel=bar
$ ctr namespaces ls
NAME LABELS
foo  mylabel=bar
```

We can see that we have a namespace `foo` and it has a label. Let's pull
an image:

```
$ dist pull docker.io/library/redis:latest
docker.io/library/redis:latest: resolved       |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:548a75066f3f280eb017a6ccda34c561ccf4f25459ef8e36d6ea582b6af1decf: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:d45bc46b48e45e8c72c41aedd2a173bcc7f1ea4084a8fcfc5251b1da2a09c0b6: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:5b690bc4eaa6434456ceaccf9b3e42229bd2691869ba439e515b28fe1a66c009: done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:a858478874d144f6bfc03ae2d4598e2942fc9994159f2872e39fae88d45bd847: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4cdd94354d2a873333a205a02dbb853dd763c73600e0cf64f60b4bd7ab694875: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:10a267c67f423630f3afe5e04bbbc93d578861ddcc54283526222f3ad5e895b9: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:c54584150374aa94b9f7c3fbd743adcff5adead7a3cf7207b0e51551ac4a5517: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:d1f9221193a65eaf1b0afc4f1d4fbb7f0f209369d2696e1c07671668e150ed2b: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:71c1f30d820f0457df186531dc4478967d075ba449bd3168a3e82137a47daf03: done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 0.9 s total:   0.0 B (0.0 B/s)
INFO[0000] unpacking rootfs
INFO[0000] Unpacked chain id: sha256:41719840acf0f89e761f4a97c6074b6e2c6c25e3830fcb39301496b5d36f9b51
```

Now, let's list the image:

```
$ dist images ls
REF                            TYPE  DIGEST SIZE
docker.io/library/redis:latest application/vnd.docker.distribution.manifest.v2+json sha256:548a75066f3f280eb017a6ccda34c561ccf4f25459ef8e36d6ea582b6af1decf 72.7 MiB
```

That looks normal. Let's list the images for the `foo` namespace and see
this in action:

```
$ CONTAINERD_NAMESPACE=foo dist images ls
REF TYPE DIGEST SIZE
```

Look at that! Nothing was pulled in the namespace `foo`. Let's do the
same pull:

```
$ CONTAINERD_NAMESPACE=foo dist pull docker.io/library/redis:latest
docker.io/library/redis:latest: resolved       |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:548a75066f3f280eb017a6ccda34c561ccf4f25459ef8e36d6ea582b6af1decf: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:d45bc46b48e45e8c72c41aedd2a173bcc7f1ea4084a8fcfc5251b1da2a09c0b6: done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:a858478874d144f6bfc03ae2d4598e2942fc9994159f2872e39fae88d45bd847: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4cdd94354d2a873333a205a02dbb853dd763c73600e0cf64f60b4bd7ab694875: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:c54584150374aa94b9f7c3fbd743adcff5adead7a3cf7207b0e51551ac4a5517: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:71c1f30d820f0457df186531dc4478967d075ba449bd3168a3e82137a47daf03: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:d1f9221193a65eaf1b0afc4f1d4fbb7f0f209369d2696e1c07671668e150ed2b: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:10a267c67f423630f3afe5e04bbbc93d578861ddcc54283526222f3ad5e895b9: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:5b690bc4eaa6434456ceaccf9b3e42229bd2691869ba439e515b28fe1a66c009: done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 0.8 s total:   0.0 B (0.0 B/s)
INFO[0000] unpacking rootfs
INFO[0000] Unpacked chain id: sha256:41719840acf0f89e761f4a97c6074b6e2c6c25e3830fcb39301496b5d36f9b51
```

Wow, that was very snappy! Looks like we pulled that image into out
namespace but didn't have to download any new data because we are
sharing storage. Let's take a peak at the images we have in `foo`:

```
$ CONTAINERD_NAMESPACE=foo dist images ls
REF                            TYPE DIGEST SIZE
docker.io/library/redis:latest application/vnd.docker.distribution.manifest.v2+json sha256:548a75066f3f280eb017a6ccda34c561ccf4f25459ef8e36d6ea582b6af1decf 72.7 MiB
```

Now, let's remove that image from `foo`:

```
$ CONTAINERD_NAMESPACE=foo dist images rm
docker.io/library/redis:latest
```

Looks like it is gone:

```
$ CONTAINERD_NAMESPACE=foo dist images ls
REF TYPE DIGEST SIZE
```

But, as we can see, it is present in the `default` namespace:

```
$ dist images ls
REF                            TYPE DIGEST SIZE
docker.io/library/redis:latest application/vnd.docker.distribution.manifest.v2+json sha256:548a75066f3f280eb017a6ccda34c561ccf4f25459ef8e36d6ea582b6af1decf 72.7 MiB
```

What happened here? We can tell by listing the namespaces to get a
better understanding:

```
$ ctr namespaces ls
NAME    LABELS
default
foo     mylabel=bar
```

From the above, we can see that the `default` namespace was created with
the standard commands without the environment variable set. Isolating
the set of shared images while sharing the data that matters.

Since we removed the images for namespace `foo`, we can remove it now:

```
$ ctr namespaces rm foo
foo
```

However, when we try to remove the `default` namespace, we get an error:

```
$ ctr namespaces rm default
ctr: unable to delete default: rpc error: code = FailedPrecondition desc = namespace default must be empty
```

This is because we require that namespaces be empty when removed.

Caveats

- While most metadata objects are namespaced, containers and tasks may
exhibit some issues. We still need to move runtimes to namespaces and
the container metadata storage may not be fully worked out.
- Still need to migrate content store to metadata storage and namespace
the content store such that some data storage (ie images).
- Specifics of snapshot driver's relation to namespace needs to be
worked out in detail.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
This commit is contained in:
Stephen J Day
2017-06-05 17:45:50 -07:00
parent 25cc7614ae
commit af2718b01f
52 changed files with 1184 additions and 223 deletions

View File

@@ -2,13 +2,36 @@ package metadata
import (
"github.com/boltdb/bolt"
"github.com/containerd/containerd/log"
)
// The layout where a "/" delineates a bucket is desribed in the following
// section. Please try to follow this as closely as possible when adding
// functionality. We can bolster this with helpers and more structure if that
// becomes an issue.
//
// Generically, we try to do the following:
//
// <version>/<namespace>/<object>/<key> -> <field>
//
// version: Currently, this is "v1". Additions can be made to v1 in a backwards
// compatible way. If the layout changes, a new version must be made, along
// with a migration.
//
// namespace: the namespace to which this object belongs.
//
// object: defines which object set is stored in the bucket. There are two
// special objects, "labels" and "indexes". The "labels" bucket stores the
// labels for the parent namespace. The "indexes" object is reserved for
// indexing objects, if we require in the future.
//
// key: object-specific key identifying the storage bucket for the objects
// contents.
var (
bucketKeyStorageVersion = []byte("v1")
bucketKeyImages = []byte("images")
bucketKeyContainers = []byte("containers")
bucketKeyVersion = []byte("v1")
bucketKeyObjectLabels = []byte("labels") // stores the labels for a namespace.
bucketKeyObjectIndexes = []byte("indexes") // reserved
bucketKeyObjectImages = []byte("images") // stores image objects
bucketKeyObjectContainers = []byte("containers") // stores container objects
bucketKeyDigest = []byte("digest")
bucketKeyMediaType = []byte("mediatype")
@@ -22,21 +45,6 @@ var (
bucketKeyUpdatedAt = []byte("updatedat")
)
// InitDB will initialize the database for use. The database must be opened for
// write and the caller must not be holding an open transaction.
func InitDB(db *bolt.DB) error {
log.L.Debug("init db")
return db.Update(func(tx *bolt.Tx) error {
if _, err := createBucketIfNotExists(tx, bucketKeyStorageVersion, bucketKeyImages); err != nil {
return err
}
if _, err := createBucketIfNotExists(tx, bucketKeyStorageVersion, bucketKeyContainers); err != nil {
return err
}
return nil
})
}
func getBucket(tx *bolt.Tx, keys ...[]byte) *bolt.Bucket {
bkt := tx.Bucket(keys[0])
@@ -66,44 +74,52 @@ func createBucketIfNotExists(tx *bolt.Tx, keys ...[]byte) (*bolt.Bucket, error)
return bkt, nil
}
func withImagesBucket(tx *bolt.Tx, fn func(bkt *bolt.Bucket) error) error {
bkt := getImagesBucket(tx)
if bkt == nil {
return ErrNotFound
func namespaceLabelsBucketPath(namespace string) [][]byte {
return [][]byte{bucketKeyVersion, []byte(namespace), bucketKeyObjectLabels}
}
func withNamespacesLabelsBucket(tx *bolt.Tx, namespace string, fn func(bkt *bolt.Bucket) error) error {
bkt, err := createBucketIfNotExists(tx, namespaceLabelsBucketPath(namespace)...)
if err != nil {
return err
}
return fn(bkt)
}
func withImageBucket(tx *bolt.Tx, name string, fn func(bkt *bolt.Bucket) error) error {
bkt := getImageBucket(tx, name)
if bkt == nil {
return ErrNotFound
func getNamespaceLabelsBucket(tx *bolt.Tx, namespace string) *bolt.Bucket {
return getBucket(tx, namespaceLabelsBucketPath(namespace)...)
}
func imagesBucketPath(namespace string) [][]byte {
return [][]byte{bucketKeyVersion, []byte(namespace), bucketKeyObjectImages}
}
func withImagesBucket(tx *bolt.Tx, namespace string, fn func(bkt *bolt.Bucket) error) error {
bkt, err := createBucketIfNotExists(tx, imagesBucketPath(namespace)...)
if err != nil {
return err
}
return fn(bkt)
}
func getImagesBucket(tx *bolt.Tx) *bolt.Bucket {
return getBucket(tx, bucketKeyStorageVersion, bucketKeyImages)
}
func getImageBucket(tx *bolt.Tx, name string) *bolt.Bucket {
return getBucket(tx, bucketKeyStorageVersion, bucketKeyImages, []byte(name))
func getImagesBucket(tx *bolt.Tx, namespace string) *bolt.Bucket {
return getBucket(tx, imagesBucketPath(namespace)...)
}
func createContainersBucket(tx *bolt.Tx) (*bolt.Bucket, error) {
bkt, err := tx.CreateBucketIfNotExists(bucketKeyStorageVersion)
bkt, err := createBucketIfNotExists(tx, bucketKeyVersion, bucketKeyObjectContainers)
if err != nil {
return nil, err
}
return bkt.CreateBucketIfNotExists(bucketKeyContainers)
return bkt, nil
}
func getContainersBucket(tx *bolt.Tx) *bolt.Bucket {
return getBucket(tx, bucketKeyStorageVersion, bucketKeyContainers)
return getBucket(tx, bucketKeyVersion, bucketKeyObjectContainers)
}
func getContainerBucket(tx *bolt.Tx, id string) *bolt.Bucket {
return getBucket(tx, bucketKeyStorageVersion, bucketKeyContainers, []byte(id))
return getBucket(tx, bucketKeyVersion, bucketKeyObjectContainers, []byte(id))
}

View File

@@ -35,13 +35,13 @@ func (s *containerStore) Get(ctx context.Context, id string) (containers.Contain
func (s *containerStore) List(ctx context.Context, filter string) ([]containers.Container, error) {
var (
m = []containers.Container{}
m []containers.Container
bkt = getContainersBucket(s.tx)
)
if bkt == nil {
return m, nil
}
err := bkt.ForEach(func(k, v []byte) error {
if err := bkt.ForEach(func(k, v []byte) error {
cbkt := bkt.Bucket(k)
if cbkt == nil {
return nil
@@ -53,8 +53,7 @@ func (s *containerStore) List(ctx context.Context, filter string) ([]containers.
}
m = append(m, container)
return nil
})
if err != nil {
}); err != nil {
return nil, err
}

View File

@@ -5,6 +5,7 @@ import "github.com/pkg/errors"
var (
ErrExists = errors.New("metadata: exists")
ErrNotFound = errors.New("metadata: not found")
ErrNotEmpty = errors.New("metadata: namespace not empty")
)
// IsNotFound returns true if the error is due to a missing image.
@@ -15,3 +16,7 @@ func IsNotFound(err error) bool {
func IsExists(err error) bool {
return errors.Cause(err) == ErrExists
}
func IsNotEmpty(err error) bool {
return errors.Cause(err) == ErrNotEmpty
}

View File

@@ -7,6 +7,7 @@ import (
"github.com/boltdb/bolt"
"github.com/containerd/containerd/images"
"github.com/containerd/containerd/namespaces"
digest "github.com/opencontainers/go-digest"
ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)
@@ -21,10 +22,24 @@ func NewImageStore(tx *bolt.Tx) images.Store {
func (s *imageStore) Get(ctx context.Context, name string) (images.Image, error) {
var image images.Image
if err := withImageBucket(s.tx, name, func(bkt *bolt.Bucket) error {
image.Name = name
return readImage(&image, bkt)
}); err != nil {
namespace, err := namespaces.NamespaceRequired(ctx)
if err != nil {
return images.Image{}, err
}
bkt := getImagesBucket(s.tx, namespace)
if bkt == nil {
return images.Image{}, ErrNotFound
}
ibkt := bkt.Bucket([]byte(name))
if ibkt == nil {
return images.Image{}, ErrNotFound
}
image.Name = name
if err := readImage(&image, ibkt); err != nil {
return images.Image{}, err
}
@@ -32,7 +47,12 @@ func (s *imageStore) Get(ctx context.Context, name string) (images.Image, error)
}
func (s *imageStore) Put(ctx context.Context, name string, desc ocispec.Descriptor) error {
return withImagesBucket(s.tx, func(bkt *bolt.Bucket) error {
namespace, err := namespaces.NamespaceRequired(ctx)
if err != nil {
return err
}
return withImagesBucket(s.tx, namespace, func(bkt *bolt.Bucket) error {
ibkt, err := bkt.CreateBucketIfNotExists([]byte(name))
if err != nil {
return err
@@ -64,23 +84,30 @@ func (s *imageStore) Put(ctx context.Context, name string, desc ocispec.Descript
func (s *imageStore) List(ctx context.Context) ([]images.Image, error) {
var m []images.Image
namespace, err := namespaces.NamespaceRequired(ctx)
if err != nil {
return nil, err
}
if err := withImagesBucket(s.tx, func(bkt *bolt.Bucket) error {
return bkt.ForEach(func(k, v []byte) error {
var (
image = images.Image{
Name: string(k),
}
kbkt = bkt.Bucket(k)
)
bkt := getImagesBucket(s.tx, namespace)
if bkt == nil {
return nil, nil // empty store
}
if err := readImage(&image, kbkt); err != nil {
return err
if err := bkt.ForEach(func(k, v []byte) error {
var (
image = images.Image{
Name: string(k),
}
kbkt = bkt.Bucket(k)
)
m = append(m, image)
return nil
})
if err := readImage(&image, kbkt); err != nil {
return err
}
m = append(m, image)
return nil
}); err != nil {
return nil, err
}
@@ -89,7 +116,12 @@ func (s *imageStore) List(ctx context.Context) ([]images.Image, error) {
}
func (s *imageStore) Delete(ctx context.Context, name string) error {
return withImagesBucket(s.tx, func(bkt *bolt.Bucket) error {
namespace, err := namespaces.NamespaceRequired(ctx)
if err != nil {
return err
}
return withImagesBucket(s.tx, namespace, func(bkt *bolt.Bucket) error {
err := bkt.DeleteBucket([]byte(name))
if err == bolt.ErrBucketNotFound {
return ErrNotFound

145
metadata/namespaces.go Normal file
View File

@@ -0,0 +1,145 @@
package metadata
import (
"context"
"github.com/boltdb/bolt"
"github.com/containerd/containerd/namespaces"
)
type namespaceStore struct {
tx *bolt.Tx
}
func NewNamespaceStore(tx *bolt.Tx) namespaces.Store {
return &namespaceStore{tx: tx}
}
func (s *namespaceStore) Create(ctx context.Context, namespace string, labels map[string]string) error {
topbkt, err := createBucketIfNotExists(s.tx, bucketKeyVersion)
if err != nil {
return err
}
// provides the already exists error.
bkt, err := topbkt.CreateBucket([]byte(namespace))
if err != nil {
if err == bolt.ErrBucketExists {
return ErrExists
}
return err
}
lbkt, err := bkt.CreateBucketIfNotExists(bucketKeyObjectLabels)
if err != nil {
return err
}
for k, v := range labels {
if err := lbkt.Put([]byte(k), []byte(v)); err != nil {
return err
}
}
return nil
}
func (s *namespaceStore) Labels(ctx context.Context, namespace string) (map[string]string, error) {
labels := map[string]string{}
bkt := getNamespaceLabelsBucket(s.tx, namespace)
if bkt == nil {
return labels, nil
}
if err := bkt.ForEach(func(k, v []byte) error {
labels[string(k)] = string(v)
return nil
}); err != nil {
return nil, err
}
return labels, nil
}
func (s *namespaceStore) SetLabel(ctx context.Context, namespace, key, value string) error {
return withNamespacesLabelsBucket(s.tx, namespace, func(bkt *bolt.Bucket) error {
if value == "" {
return bkt.Delete([]byte(key))
}
return bkt.Put([]byte(key), []byte(value))
})
}
func (s *namespaceStore) List(ctx context.Context) ([]string, error) {
bkt := getBucket(s.tx, bucketKeyVersion)
if bkt == nil {
return nil, nil // no namespaces!
}
var namespaces []string
if err := bkt.ForEach(func(k, v []byte) error {
if v != nil {
return nil // not a bucket
}
namespaces = append(namespaces, string(k))
return nil
}); err != nil {
return nil, err
}
return namespaces, nil
}
func (s *namespaceStore) Delete(ctx context.Context, namespace string) error {
bkt := getBucket(s.tx, bucketKeyVersion)
if empty, err := s.namespaceEmpty(ctx, namespace); err != nil {
return err
} else if !empty {
return ErrNotEmpty
}
if err := bkt.DeleteBucket([]byte(namespace)); err != nil {
if err == bolt.ErrBucketNotFound {
return ErrNotFound
}
return err
}
return nil
}
func (s *namespaceStore) namespaceEmpty(ctx context.Context, namespace string) (bool, error) {
ctx = namespaces.WithNamespace(ctx, namespace)
// need to check the various object stores.
imageStore := NewImageStore(s.tx)
images, err := imageStore.List(ctx)
if err != nil {
return false, err
}
if len(images) > 0 {
return false, nil
}
containerStore := NewContainerStore(s.tx)
containers, err := containerStore.List(ctx, "")
if err != nil {
return false, err
}
if len(containers) > 0 {
return false, nil
}
// TODO(stevvooe): Need to add check for content store, as well. Still need
// to make content store namespace aware.
return true, nil
}