This greatly reduce the risk that we will hit the unix socket maximum path
length.
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
This splits up the create and start of an exec process in the shim to
have two separate steps like the initial process. This will allow
better state reporting for individual process along with a more robust
wait for execs.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This change further plumbs the components required for implementing
event filters. Specifically, we now have the ability to filter on the
`topic` and `namespace`.
In the course of implementing this functionality, it was found that
there were mismatches in the events API that created extra serialization
round trips. A modification to `typeurl.MarshalAny` and a clear
separation between publishing and forwarding allow us to avoid these
serialization issues.
Unfortunately, this has required a few tweaks to the GRPC API, so this
is a breaking change. `Publish` and `Forward` have been clearly separated in
the GRPC API. `Publish` honors the contextual namespace and performs
timestamping while `Forward` simply validates and forwards. The behavior
of `Subscribe` is to propagate events for all namespaces unless
specifically filtered (and hence the relation to this particular change.
The following is an example of using filters to monitor the task events
generated while running the [bucketbench tool](https://github.com/estesp/bucketbench):
```
$ ctr events 'topic~=/tasks/.+,namespace==bb'
...
2017-07-28 22:19:51.78944874 +0000 UTC bb /tasks/start {"container_id":"bb-ctr-6-8","pid":25889}
2017-07-28 22:19:51.791893688 +0000 UTC bb /tasks/start {"container_id":"bb-ctr-4-8","pid":25882}
2017-07-28 22:19:51.792608389 +0000 UTC bb /tasks/start {"container_id":"bb-ctr-2-9","pid":25860}
2017-07-28 22:19:51.793035217 +0000 UTC bb /tasks/start {"container_id":"bb-ctr-5-6","pid":25869}
2017-07-28 22:19:51.802659622 +0000 UTC bb /tasks/start {"container_id":"bb-ctr-0-7","pid":25877}
2017-07-28 22:19:51.805192898 +0000 UTC bb /tasks/start {"container_id":"bb-ctr-3-6","pid":25856}
2017-07-28 22:19:51.832374931 +0000 UTC bb /tasks/exit {"container_id":"bb-ctr-8-6","id":"bb-ctr-8-6","pid":25864,"exited_at":"2017-07-28T22:19:51.832013043Z"}
2017-07-28 22:19:51.84001249 +0000 UTC bb /tasks/exit {"container_id":"bb-ctr-2-9","id":"bb-ctr-2-9","pid":25860,"exited_at":"2017-07-28T22:19:51.839717714Z"}
2017-07-28 22:19:51.840272635 +0000 UTC bb /tasks/exit {"container_id":"bb-ctr-7-6","id":"bb-ctr-7-6","pid":25855,"exited_at":"2017-07-28T22:19:51.839796335Z"}
...
```
In addition to the events changes, we now display the namespace origin
of the event in the cli tool.
This will be followed by a PR to add individual field filtering for the
events API for each event type.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
this adds a `platform` interface for shim service to manage platform-specific
behaviors such as I/O (which uses epoll in linux to work around bugs with applications
that closes all consoles i.e. https://github.com/opencontainers/runc/pull/1434
and https://github.com/moby/moby/issues/27202)
Its expected that we only have 1 epollfd per containerd_shim to manage all processes.
Since all the work are done outside of the container runtime, upgrading of runc
is not required and should be done separately.
Signed-off-by: Daniel Dao <dqminh89@gmail.com>
In the course of setting out to add filters and address some cleanup, it
was found that we had a few problems in the events subsystem that needed
addressing before moving forward.
The biggest change was to move to the more standard terminology of
publish and subscribe. We make this terminology change across the Go
interface and the GRPC API, making the behavior more familier. The
previous system was very context-oriented, which is no longer required.
With this, we've removed a large amount of dead and unneeded code. Event
transactions, context storage and the concept of `Poster` is gone. This
has been replaced in most places with a `Publisher`, which matches the
actual usage throughout the codebase, removing the need for helpers.
There are still some questions around the way events are handled in the
shim. Right now, we've preserved some of the existing bugs which may
require more extensive changes to resolve correctly.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This avoids someone adding a new error path and forgetting to call the cleanup
function.
We prefer to use an explicit flag to gate the clean rather than relying on `err
!= nil` so we don't have to rely on people never accidentally shadowing the
`err` as seen by the closure.
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Mounting as MS_SLAVE here breaks use cases which want to use
rootPropagation=shared in order to expose mounts to the host (and other
containers binding the same subtree), mounting as e.g. MS_SHARED is pointless
in this context so just remove.
Having done this we also need to arrange to manually clean up the mounts on
delete, so do so.
Note that runc will also setup root as required by rootPropagation, defaulting
to MS_PRIVATE.
Fixes#1132.
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Since we now have a common set of error definitions, mapped to existing
error codes, we no longer need the specialized error codes used for
interaction with linux processes. The main issue was that string
matching was being used to map these to useful error codes. With this
change, we use errors defined in the `errdefs` package, which map
cleanly to GRPC error codes and are recoverable on either side of the
request.
The main focus of this PR was in removin these from the shim. We may
need follow ups to ensure error codes are preserved by the `Tasks`
service.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
After review, there are cases where having common requirements for
namespaces and identifiers creates contention between applications. One
example is that it is nice to have namespaces comply with domain name
requirement, but that does not allow underscores, which are required for
certain identifiers.
The namespaces validation has been reverted to be in line with RFC 1035.
Existing identifiers has been modified to allow simply alpha-numeric
identifiers, while limiting adjacent separators.
We may follow up tweaks for the identifier charset but this split should
remove the hard decisions.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This removes the RuntimeEvent super proto with enums into separate
runtime event protos to be inline with the other events that are output
by containerd.
This also renames the runtime events into Task* events.
Fixes#1071
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This makes it possible to enable shim debug by adding the following to
`config.toml`:
[plugins.linux]
shim_debug = true
I moved the debug setting from the `client.Config struct` to an argument to
`client.WithStart` since this is the only place it would be used.
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
The compiler doesn't spot this, but guru does.
This seems to have become unused in 79e6a93624 ("Fix incorrect reference to
the gRPC runtime name as a binary").
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
We hope that containerd supports any OCI compliant runtime, and not only
runc.
This patch fixes all the error messages to not be completely runc
specific and change the initProcess structure to have its runtime
pointer be called 'runtime' and not 'runc'
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
These rpcs only return pids []uint32 so should be named that way in
order to have other rpcs that list Processes such as Exec'd processes.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Until we have a way to preserve the initial command used to start the
container, we have to default to the default `runc` found on the $PATH.
This code after the last refactor of shim/API is incorrectly using the
gRPC object reference of the v1 runtime as a binary name which causes
os.Exec() errors.
Signed-off-by: Phil Estes <estesp@gmail.com>
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
This moves the shim's API and protos out of the containerd services
package and into the linux runtime package. This is because the shim is
an implementation detail of the linux runtime that we have and it is not
a containerd user facing api.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
To simplify use of types, we have consolidate the packages for the mount
and descriptor protobuf types into a single Go package. We also drop the
versioning from the type packages, as these types will remain the same
between versions.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
When using events, it was found to be fairly unwieldy with a number of
extra packages. For the most part, when interacting with the events
service, we want types of the same version of the service. This has been
accomplished by moving all events types into the events package.
In addition, several fixes to the way events are marshaled have been
included. Specifically, we defer to the protobuf type registration
system to assemble events and type urls, with a little bit sheen on top
of add a containerd.io oriented namespace.
This has resulted in much cleaner event consumption and has removed the
reliance on error prone type urls, in favor of concrete types.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: update events package to include emitter and use envelope proto
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: add events service
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: enable events service and update ctr events to use events service
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
event listeners
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: helper func for emitting in services
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: improved cli for containers and tasks
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
create event envelope with poster
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: introspect event data to use for type url
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: use pb encoding; add event types
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: instrument content and snapshot services with events
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: instrument image service with events
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: instrument namespace service with events
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: add namespace support
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: only send events from namespace requested from client
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
events: switch to go-events for broadcasting
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
We need a separate API for handing the exit status and deletion of
Exec'd processes to make sure they are properly cleaned up within the
shim and daemon.
Fixes#973
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This allows attach of existing fifos to be done without any information
stored on the client side.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This also fixes a deadlock in the shim's reaper where execs would lockup
and/or miss a quick exiting exec process's exit status.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
There are three aspects which need to be covered:
- the runtime needs to restart its event pump when it reconnects (in
loadContainer).
- on the server side shim needs to monitor the stream context so it knows when
the connection goes away.
- if the shim's stream.Send() fails (because the stream died between taking
the event off the channel and calling stream.Send()) then to avoid losing
that event the shim should remember it and send it out first on the next
stream.
The shim's event production machinery only handles producing a single event
stream, so add an interlock to ensure there is only one reader of the
`s.events` channel at a time. Subsequent attempts to use Events will block
until the existing owner is done.
Fixes#921.
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Working from feedback on the existing implementation, we have now
introduced a central metadata object to represent the lifecycle and pin
the resources required to implement what people today know as
containers. This includes the runtime specification and the root
filesystem snapshots. We also allow arbitrary labeling of the container.
Such provisions will bring the containerd definition of container closer
to what is expected by users.
The objects that encompass today's ContainerService, centered around the
runtime, will be known as tasks. These tasks take on the existing
lifecycle behavior of containerd's containers, which means that they are
deleted when they exit. Largely, there are no other changes except for
naming.
The `Container` object will operate purely as a metadata object. No
runtime state will be held on `Container`. It only informs the execution
service on what is required for creating tasks and the resources in use
by that container. The resources referenced by that container will be
deleted when the container is deleted, if not in use. In this sense,
users can create, list, label and delete containers in a similar way as
they do with docker today, without the complexity of runtime locks that
plagues current implementations.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This moves both the Mount type and mountinfo into a single mount
package.
This also opens up the root of the repo to hold the containerd client
implementation.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Update go-runc to 49b2a02ec1ed3e4ae52d30b54a291b75
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add shim to restore creation
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Keep checkpoint path in service
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add C/R to non-shim build
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Checkpoint rw and image
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Pause container on bind checkpoints
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Return dump.log in error on checkpoint failure
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Pause container for checkpoint
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Update runc to 639454475cb9c8b861cc599f8bcd5c8c790ae402
For checkpoint into to work you need runc version
639454475cb9c8b861cc599f8bcd5c8c790ae402 + and criu 3.0 as this is what
I have been testing with.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Move restore behind create calls
This remove the restore RPCs in favor of providing the checkpoint
information to the `Create` calls of a container. If provided, the
container will be created/restored from the checkpoint instead of an
existing container.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Regen protos after rebase
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This adds support for signalling a container process by pid.
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
make Ps more extensible
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
ps: windows support
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>