The shim delete action needs bundle information to cleanup resources
created by shim. If the cleanup dead shim is called after delete bundle,
the part of resources maybe leaky.
The ttrpc client UserOnCloseWait() can make sure that resources are
cleanup before delete bundle, which synchronizes task deletion and
cleanup deadshim. It might slow down the task deletion, but it can make
sure that resources can be cleanup and avoid EBUSY umount case. For
example, the sandbox container like Kata/Firecracker might have mount
points over the rootfs. If containerd handles task deletion and cleanup
deadshim parallelly, the task deletion will meet EBUSY during umount and
fail to cleanup bundle, which makes case worse.
And also update cleanupAfterDeadshim, which makes sure that
cleanupAfterDeadshim must be called after shim disconnected. In some
case, shim fails to call runc-create for some reason, but the runc-create
already makes runc-init into ready state. If containerd doesn't call shim
deletion, the runc-init process will be leaky and hold the cgroup, which
makes pod terminating :(.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Currently the shims only support starting the logging binary process if the
io.Creator Config does not specify Terminal: true. This means that the program
using containerd will only be able to specify FIFO io when Terminal: true,
rather than allowing the shim to fork the logging binary process. Hence,
containerd consumers face an inconsistent behavior regarding logging binary
management depending on the Terminal option.
Allowing the shim to fork the logging binary process will introduce consistency
between the running container and the logging process. Otherwise, the logging
process may die if its parent process dies whereas the container will keep
running, resulting in the loss of container logs.
Signed-off-by: Akshat Kumar <kshtku@amazon.com>
Before this change, if an event fails to send on the first attempt,
subsequent attempts will fail with context.Cancelled because the the
caller of publish passes a cancellable timeout, which the publisher uses
to send the event.
The publisher returns immediately if the send fails, but adds the event
to an async queue to try again.
Meanwhile the caller will return cancelling the context.
Additionally, subsequent attempts may fail to send because the timeout
was expected to be for a single request but the queue sleeps for
`attempt*time.Second`.
In the shim service, the timeout was set to 5s, which means the send
will fail with context.DeadlineExceeded before it reaches `maxRequeue`
(which is currently 5).
This change moves the timeout to the publisher so each send attempt gets
its own timeout.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Previously shim v2 (`io.containerd.runc.{v1,v2}`) always used `/run/containerd/runc` as the runc root.
Fix#4326
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
How to test (from https://github.com/opencontainers/runc/pull/2352#issuecomment-620834524):
(host)$ sudo swapoff -a
(host)$ sudo ctr run -t --rm --memory-limit $((1024*1024*32)) docker.io/library/alpine:latest foo
(container)$ sh -c 'VAR=$(seq 1 100000000)'
An event `/tasks/oom {"container_id":"foo"}` will be displayed in `ctr events`.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Dependencies may be switching to use the new `%w` formatting
option to wrap errors; switching to use `errors.Is()` makes
sure that we are still able to unwrap the error and detect the
underlying cause.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
TestRuntimeWithEmptyMaxEnvProcs should restore the GoMaxProcs after
test so that the temporary change of GoMaxProcs will not impact other
case, like TestRuntimeWithNonEmptyMaxEnvProcs.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
For some reason, shimv2 process doesn't exist. The ttrpc doesn't detect
the connection closed by server until delete task. For this case, we
should ignore the ttrpc.ErrClosed and let task manager handle the
cleanup.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Request came from a slack message that shims do not output their versions making
it hard for users and operators to know what version of a shim they have on the
system. This adds a `-v` flag to the shims so that users can see if a shim is
in sync with containerd or what versions of shims that they are running.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
The Err() method should be called after the Scan() loop, not inside it.
Found by: git grep -A3 -F '.Scan()'
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Instead of having several dialer implementations, leave only one in
`pkg/dialer` and call it from `pkg/ttrpcutil`, `runtime/v(1|2)/shim`
which had their own
Closes#3471.
Signed-off-by: Kiril Vladimiroff <kiril@vladimiroff.org>
The background context aovids shim blocking when the ctx is cancelled
unexpectedly during shim start. But if the shim exits unexpectedly
before opening the pipe, the fd will never be closed.
`onCloseWithShimLog` makes sure that the shim log fd is closed properly
once the shim disconnects.
Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com>
There's no OOM monitoring for the v2 cgroups yet, so it seems unlikely
that there was a leak in that case.
Signed-off-by: Seth Pellegrino <spellegrino@newrelic.com>
Only start watching the cgroup for OOMs when the first process starts
instead of on every process.
Signed-off-by: Seth Pellegrino <spellegrino@newrelic.com>
If the context is cancelled during `shim.Create()`, such as the client
disconnects unexpectedly. The created shim will never be deleted.
What's more, if the context is cancelled during `openShimLog()`, the
fifo will be closed and block the shim output.
Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com>
Previously, the platform was closed as part of the Delete method when the
process was an init for a task and there were no more tasks after its deletion.
This can create problems if another task is created within the shim right after
the delete runs, which results in the platform being closed but the shim
continuing to run.
This change moves closing the platform to the Shutdown method after the shim's
context is canceled, which ensures the platform is only closed once the shim
is sure its done servicing containers.
Signed-off-by: Erik Sipsma <sipsma@amazon.com>
* only shim v2 runc v2 ("io.containerd.runc.v2") is supported
* only PID metrics is implemented. Others should be implemented in separate PRs.
* lots of code duplication in v1 metrics and v2 metrics. Dedupe should be separate PR.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Reized the I/O buffers to align with the size of the kernel buffers with fifos
and move the close aspect of the console to key off of the stdin closing.
Fixes#3738
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
- Our out of tree shim would like to publish events with ttrpc. These
functions should be exposed so our shim doesn't need to reimplement
publisher logic.
Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com>
Because of the way go handles flags, passing a flag that is not defined
will cause an error. In our case, if we kept this as a flag, then
third-party shims would break when they see this new flag. To fix this,
I moved this new configuration option to an env var. We should use env
vars from here on out to avoid breaking shim compat.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Previously the TTRPC address was generated as "<GRPC address>.ttrpc".
This change now allows explicit configuration of the TTRPC address, with
the default still being the old format if no value is specified.
As part of this change, a new configuration section is added for TTRPC
listener options.
Signed-off-by: Kevin Parsons <kevpar@microsoft.com>