If containerd crashes while creating a container the shim process stays alive and is never
cleaned up. Details are discussed in issue containerd/containerd#6860. This fixes the code
to cleanup such shim processes on containerd restart.
Signed-off-by: Amit Barve <ambarve@microsoft.com>
This commit upgrades github.com/containerd/typeurl to use typeurl.Any.
The interface hides gogo/protobuf/types.Any from containerd's Go client.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
addresses https://github.com/containerd/containerd/issues/6464
Return an error if a runtime provided is relative.
Add context to the usage for `ctr run --runtime` indicating that
absolute path to runtime binary must be provided.
Signed-off-by: Gavin Inglis <giinglis@amazon.com>
ShimV2 has shim.Delete command to cleanup task's temporary resource,
like bundle folder. Since the shim server exits and no persistent store
is for task's exit code, the result of shim.Delete is always 137 exit
code, like the task has been killed.
And the result of shim.Delete can be used as task event only when the
shim server is killed somehow after container is running. Therefore,
dockerd, which watches task exit event to update status of container,
can report correct status.
Back to the issue #6429, the container is not running because the
entrypoint is not found. Based on this design, we should not send
137 exitcode event to subscriber.
This commit is aimed to remove shim instance first and then the
`cleanupAfterDeadShim` should not send event.
Similar Issue: #4769Fix#6429
Signed-off-by: Wei Fu <fuweid89@gmail.com>
In linux 5.14 and hopefully some backports, core scheduling allows processes to
be co scheduled within the same domain on SMT enabled systems.
The containerd impl sets the core sched domain when launching a shim. This
allows a clean way for each shim(container/pod) to be in its own domain and any
additional containers, (v2 pods) be be launched with the same domain as well as
any exec'd process added to the container.
kernel docs: https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/core-scheduling.html
Signed-off-by: Michael Crosby <michael@thepasture.io>
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
The shim delete action needs bundle information to cleanup resources
created by shim. If the cleanup dead shim is called after delete bundle,
the part of resources maybe leaky.
The ttrpc client UserOnCloseWait() can make sure that resources are
cleanup before delete bundle, which synchronizes task deletion and
cleanup deadshim. It might slow down the task deletion, but it can make
sure that resources can be cleanup and avoid EBUSY umount case. For
example, the sandbox container like Kata/Firecracker might have mount
points over the rootfs. If containerd handles task deletion and cleanup
deadshim parallelly, the task deletion will meet EBUSY during umount and
fail to cleanup bundle, which makes case worse.
And also update cleanupAfterDeadshim, which makes sure that
cleanupAfterDeadshim must be called after shim disconnected. In some
case, shim fails to call runc-create for some reason, but the runc-create
already makes runc-init into ready state. If containerd doesn't call shim
deletion, the runc-init process will be leaky and hold the cgroup, which
makes pod terminating :(.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
If the context is cancelled during `shim.Create()`, such as the client
disconnects unexpectedly. The created shim will never be deleted.
What's more, if the context is cancelled during `openShimLog()`, the
fifo will be closed and block the shim output.
Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com>
Previously the TTRPC address was generated as "<GRPC address>.ttrpc".
This change now allows explicit configuration of the TTRPC address, with
the default still being the old format if no value is specified.
As part of this change, a new configuration section is added for TTRPC
listener options.
Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
When shimv2 dead, the container would be cleanup, but
the corresponding runtime task still existed in runtime
task lists, it should be deleted too.
Signed-off-by: lifupan <lifupan@gmail.com>
skip hidden directories in load task, and return soon if path not exist
in atomicDelete
carry of #3233Closes#3233
Signed-off-by: Ace-Tang <aceapril@126.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
The two new method Add/Delete can allow custom plugin to add or migrate
existing task into major Runtime plugin.
close: #2888
Signed-off-by: Wei Fu <fuweid89@gmail.com>
This makes bundle removal atomic by first renaming the bundle and
working directories to a hidden path before removing the underlying
directories.
Closes#2567Closes#2327
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Implements the Windows lcow differ/snapshotter responsible for managing
the creation and lifetime of lcow containers on Windows.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
This cleans up persistent work dirs on TaskManager boot. These dirs can
be left behind in a machine reboot. The state in /run will not exist
but the work dir in the root does, we should cleanup work dirs when
tasks are not loaded.
This also improves error handling that would prevent the task manager
from loading when a single task fails to load or cleanup.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>