To avoid buffer bloat in long running processes, we try to use buffer
pools where possible. This is meant to address shim memory usage issues,
but may not be the root cause.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Because tasks may be deleted while listing containers, we need to ignore
errors from state requests that are due to a closed error. All of these
get mapped to ErrNotFound, which can be used to filter the entries.
There may be a better fix that does a better job of keeping track of the
intended state of a backend task. The current condition of assuming that
a closed client is a shutdown task may be too naive.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This allows other packages and plugins to easily exec things without
racing with the reaper.
The reaper is mostly needed in the shim but can be removed in containerd
in favor of the `exec.Cmd` apis
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Helps with #1935
This hold the shim lock during the state call to make sure that the task
does not get deleted during a state call.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Could issues where when exec processes fail the wait block is not
released.
Second, you could not dump stacks if the reaper loop locks up.
Third, the publisher was not waiting on the correct pid.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
By replacing grpc with ttrpc, we can reduce total memory runtime
requirements and binary size. With minimal code changes, the shim can
now be controlled by the much lightweight protocol, reducing the total
memory required per container.
When reviewing this change, take particular notice of the generated shim
code.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
The binary name used for executing "containerd publish" was hard-coded
in the shim code, and hence it did not work with customized daemon
binary name. (e.g. `docker-containerd`)
This commit allows specifying custom daemon binary via `containerd-shim
-containerd-binary ...`.
The daemon invokes this command with `os.Executable()` path.
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Because of a side-effect import, we have the possibility of pulling in
several unnecessary packages that are used by the plugin and not at
runtime to implement protobuf structures. Setting these imports to
`weak` prevents this from happening, reducing the total import set,
reducing memory usage and binary size.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
To avoid importing all of grpc when consuming events, the types of
events have been split in to a separate package. This should allow a
reduction in memory usage in cases where a package is consuming events
but not using the gprc service directly.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
This is needed for users on kernel older than 3.18 so they can avoid EBUSY
errors when trying to unlink, rename or remove a mountpoint that is present in
a shim namespace.
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
ref: #1464
This tries to solve issues with races around process state. First it
adds the process mutex around the state call so that any state changes,
deletions, etc will be handled in order.
Second, for IsNoExist errors from the runtime, return a stopped state if
a process has been removed from the underlying OCI runtime but not from
the shim yet. This shouldn't happen with the lock from above but its
hare to verify this issue.
Third, handle shim disconnections and return an ErrNotFound.
Forth, don't abort returning all tasks if one task is unable to return
its state.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This ensure that when using the host pid, we don't let process alive,
preventing Wait() to return until they all die.
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
This also fix the type used for RuncOptions.SystemCgroup, hence introducing
an API break.
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
Depends on https://github.com/containerd/go-runc/pull/24
The is currently a race with the reaper where you could miss some exit
events from processes.
The problem before and why the reaper was so complex was because
processes could fork, getting a pid, and then fail on an execve before
we would have time to register the process with the reaper. This could
cause pids to fill up in a map as a way to reduce the race.
This changes makes the reaper handle multiple subscribers so that the
caller can handle locking, for when they want to wait for a specific
pid, without affecting other callers using the reaper code.
Exit events are broadcast to multiple subscribers, in the case, the runc
commands and container pids that we get from a pid-file. Locking while
the entire container stats no longs affects runc commands where you want
to call `runc create` and wait until that has been completed.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>