Working from feedback on the existing implementation, we have now
introduced a central metadata object to represent the lifecycle and pin
the resources required to implement what people today know as
containers. This includes the runtime specification and the root
filesystem snapshots. We also allow arbitrary labeling of the container.
Such provisions will bring the containerd definition of container closer
to what is expected by users.
The objects that encompass today's ContainerService, centered around the
runtime, will be known as tasks. These tasks take on the existing
lifecycle behavior of containerd's containers, which means that they are
deleted when they exit. Largely, there are no other changes except for
naming.
The `Container` object will operate purely as a metadata object. No
runtime state will be held on `Container`. It only informs the execution
service on what is required for creating tasks and the resources in use
by that container. The resources referenced by that container will be
deleted when the container is deleted, if not in use. In this sense,
users can create, list, label and delete containers in a similar way as
they do with docker today, without the complexity of runtime locks that
plagues current implementations.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Update go-runc to 49b2a02ec1ed3e4ae52d30b54a291b75
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add shim to restore creation
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Keep checkpoint path in service
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add C/R to non-shim build
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Checkpoint rw and image
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Pause container on bind checkpoints
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Return dump.log in error on checkpoint failure
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Pause container for checkpoint
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Update runc to 639454475cb9c8b861cc599f8bcd5c8c790ae402
For checkpoint into to work you need runc version
639454475cb9c8b861cc599f8bcd5c8c790ae402 + and criu 3.0 as this is what
I have been testing with.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Move restore behind create calls
This remove the restore RPCs in favor of providing the checkpoint
information to the `Create` calls of a container. If provided, the
container will be created/restored from the checkpoint instead of an
existing container.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Regen protos after rebase
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This adds support for signalling a container process by pid.
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
make Ps more extensible
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
ps: windows support
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
Remove rootfs service in place of snapshot service. Adds
diff service for extracting and creating diffs. Diff
creation is not yet implemented. This service allows
pulling or creating images without needing root access to
mount. Additionally in the future this will allow containerd
to ensure extractions happen safely in a chroot if needed.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
The split between provider and ingester was a long standing division
reflecting the client-side use cases. For the most part, we were
differentiating these for the algorithms that operate them, but it made
instantation and use of the types challenging. On the server-side, this
distinction is generally less important. This change unifies these types
and in the process we get a few benefits.
The first is that we now completely access the content store over GRPC.
This was the initial intent and we have now satisfied this goal
completely. There are a few issues around listing content and getting
status, but we resolve these with simple streaming and regexp filters.
More can probably be done to polish this but the result is clean.
Several other content-oriented methods were polished in the process of
unification. We have now properly seperated out the `Abort` method to
cancel ongoing or stalled ingest processes. We have also replaced the
`Active` method with a single status method.
The transition went extremely smoothly. Once the clients were updated to
use the new methods, every thing worked as expected on the first
compile.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This adds pause and unpause to containerd's execution service and the
same commands to the `ctr` client.
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
This is a first pass at the metadata required for supporting an image
store. We use a shallow approach to the problem, allowing this
component to centralize the naming. Resources for this image can then be
"snowballed" in for actual implementations. This is better understood
through example.
Let's take pull. One could register the name "docker.io/stevvooe/foo" as
pointing at a particular digest. When instructed to pull or fetch, the
system will notice that no components of that image are present locally.
It can then recursively resolve the resources for that image and fetch
them into the content store. Next time the instruction is issued, the
content will be present so no action will be taken.
Another example is preparing the rootfs. The requirements for a rootfs
can be resolved from a name. These "diff ids" will then be compared with
what is available in the snapshot manager. Any parts of the rootfs, such
as a layer, that isn't available in the snapshotter can be unpacked.
Once this process is satisified, the image will be runnable as a
container.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
The message was defined but the method was returning empty, plumb through the
result from the shim layer.
Compile tested only.
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
To make restarting after failed pull less racy, we define `Truncate(size
int64) error` on `content.Writer` for the zero offset. Truncating a
writer will dump any existing data and digest state and start from the
beginning. All subsequent writes will start from the zero offset.
For the service, we support this by defining the behavior for a write
that changes the offset. To keep this narrow, we only support writes out
of order at the offset 0, which causes the writer to dump existing data
and reset the local hash.
This makes restarting failed pulls much smoother when there was a
previously encountered error and the source doesn't support arbitrary
seeks or reads at arbitrary offsets. By allowing this to be done while
holding the write lock on a ref, we can restart the full download
without causing a race condition.
Once we implement seeking on the `io.Reader` returned by the fetcher,
this will be less useful, but it is good to ensure that our protocol
properly supports this use case for when streaming is the only option.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Allow deletion of content over the GRPC interface. For now, we are going
with a model that conducts reference management outside of the content
store, in the metadata store but this design is valid either way.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
For clients which only want to know about one container this is simpler than
searching the result of execution.List.
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
After implementing pull, a few changes are required to the content store
interface to make sure that the implementation works smoothly.
Specifically, we work to make sure the predeclaration path for digests
works the same between remote and local writers. Before, we were
hesitent to require the the size and digest up front, but it became
clear that having this provided significant benefit.
There are also several cleanups related to naming. We now call the
expected digest `Expected` consistently across the board and `Total` is
used to mark the expected size.
This whole effort comes together to provide a very smooth status
reporting workflow for image pull and push. This will be more obvious
when the bulk of pull code lands.
There are a few other changes to make `content.WriteBlob` more broadly
useful. In accordance with addition for predeclaring expected size when
getting a `Writer`, `WriteBlob` now supports this fully. It will also
resume downloads if provided an `io.Seeker` or `io.ReaderAt`. Coupled
with the `httpReadSeeker` from `docker/distribution`, we should only be
a lines of code away from resumable downloads.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Bring the content service into the containerd API. This allows the
content store to be coordinated in the containerd daemon with minimal
effort. For the most part, this API follows the conventions and behavior
of the existing content store implementation with a few caveats.
Specifically, we remove the object oriented transaction mechanism in
favor of a very rich `Write` call.
Pains are taken to reduce race conditions around when having multiple
writers to a single piece of content. Clients should be able to race
towards getting a write lock on a reference, then wait on each other.
For the most part, this should be generically pluggable to allow
implementations of the content store to be swapped out.
We'll follow this up with an implementation to validate the model.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Updates to the gogo/protobuf dependency are required to correctly
generate time types. We also remove an unused windows dependency.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This is not really a service like the other rpcs that we expose so lets
change the import paths for it.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
After trying to explain the complexities of developing with protobuf, I
have now created a command that correctly calculates the import paths
for each package and runs the protobuf command.
The Makefile has been updated accordingly, expect we now no longer use
`go generate`. A new target `protos` has been defined. We alias the two,
for the lazy. We leave `go generate` in place for cases where we will
actually use `go generate`.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
The mount type is used across common GRPC services to express a deferred
access of a filesystem. Right now, they are generated by snapshotters,
but eventually, they can be passed for containers for creation at
runtime. With this flow, we can separate the generation and use of a
root container filesystem.
Signed-off-by: Stephen J Day <stephen.day@docker.com>