Bundle directory permissions should be 0700 by default. On Linux with
user namespaces enabled, the remapped root also needs access to the
bundle directory. In this case, the bundle directory is modified to
0710 and group ownership is changed to the remapped root group.
Signed-off-by: Samuel Karp <skarp@amazon.com>
In linux platform, the shim server always listens on the socket before
the containerd task manager dial it. It is unlikely that containerd task
manager should handle reconnect because the shim can't restart. For this
case, the containerd task manager should fail fast if there is ENOENT or
ECONNREFUSED error.
And if the socket file is deleted during cleanup the exited task, it
maybe cause that containerd task manager takes long time to reload the
dead shim. For that task.v2 manager, the race case is like:
```
TaskService.Delete
TaskManager.Delete(runtime/v2/manager.go)
shim.delete(runtime/v2/shim.go)
shimv2api.Shutdown(runtime/v2/task/shim.pb.go)
<- containerd has been killed or restarted somehow
bundle.Delete
```
The shimv2api.Shutdown will cause that the shim deletes socket file
(containerd-shim-runc-v2 does). But the bundle is still there. During
reloading, the containerd will wait for the socket file appears again
in 100 seconds. It is not reasonable. The Reconnect should prevent this
case by fast fail.
Closes: #5648.
Fixes: #5597.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Since the /run directory on macOS is read-only, darwin containerd should
use a different directory. Use the pre-defined default values instead
to avoid this issue.
Fixes: bd908acab ("Use path based unix socket for shims")
Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
Go 1.15.7 contained a security fix for CVE-2021-3115, which allowed arbitrary
code to be executed at build time when using cgo on Windows. This issue also
affects Unix users who have “.” listed explicitly in their PATH and are running
“go get” outside of a module or with module mode disabled.
This issue is not limited to the go command itself, and can also affect binaries
that use `os.Command`, `os.LookPath`, etc.
From the related blogpost (ttps://blog.golang.org/path-security):
> Are your own programs affected?
>
> If you use exec.LookPath or exec.Command in your own programs, you only need to
> be concerned if you (or your users) run your program in a directory with untrusted
> contents. If so, then a subprocess could be started using an executable from dot
> instead of from a system directory. (Again, using an executable from dot happens
> always on Windows and only with uncommon PATH settings on Unix.)
>
> If you are concerned, then we’ve published the more restricted variant of os/exec
> as golang.org/x/sys/execabs. You can use it in your program by simply replacing
This patch replaces all uses of `os/exec` with `golang.org/x/sys/execabs`. While
some uses of `os/exec` should not be problematic (e.g. part of tests), it is
probably good to be consistent, in case code gets moved around.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Refactor shim v2 to load and register plugins.
Update init shim interface to not require task service implementation on
returned service, but register as plugin if it is.
Signed-off-by: Derek McGowan <derek@mcg.dev>
Remove build tags which are already implied by the name of the file.
Ensures build tags are used consistently
Signed-off-by: Derek McGowan <derek@mcg.dev>
For the abstract socket adress there's no need to chmod
the address's file, cause the file didn't exist actually.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
cgroupsv2.LoadManager() already performs VerifyGroupPath(), and returns
an error if the path is invalid, so this check is redundant.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Before this change, for several of the services that `WithServices`
handles, only the grpc client is supported.
Now, for instance, one can use an `images.Store` directly instead of
only an `imagesapi.StoreSlient`.
Some of the methods have been renamed to satisfy the difference between
using a grpc `<Foo>Client` vs the main interface.
I did not see a good candidate for TaskService so have left that mostly
unchanged.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Currently the shimv2 debug is only enabled when containerd is,
specifically, on debug mode. However, it should be enabled whenever the
CRI runtime is on debug *or any other lower* mode, as in trace mode.
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
fork/exec can fail and log a warning like this in containerd's log:
failed to clean up after shim disconnected error=": fork/exec /usr/local/bin/containerd-shim-[my-shim]: no such file or directory" id=test namespace=default
Passing the bundle path on the command line allows the shim delete
command to run successfully.
Signed-off-by: Samuel Karp <me@samuelkarp.com>
The shim.SetScore() utility was no longer used since 7dfc605fc6.
Checking for uses outside of this repository, I found only one external use of
this in gVisor; a9441aea27/pkg/shim/service.go (L262-L264)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
After #4906, containerd opens fifo in read/write mode in linux platform
The original comment doesn't correct and is removed by #5174.
```
// original comment
// When using a multi-container shim, the fifo of the 2nd to Nth
// container will not be opened when the ctx is done. This will
// cause an ErrReadClosed that can be ignored.
```
However, we should add comment for checkCopyShimLogError to mention why
we call checkCopyShimLogError. The checkCopyShimLogError, it is to prevent
the flood of expected error messages after task die and the expected
errors depend on platform.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
- process.Init#io could be nil
- Make sure CreateTaskRequest#Options is not empty before unmarshaling
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
When runC shimv2 starts, the StartShim interface will re-exec itself as
long-running process, which will read the `address` during initializing.
```happycase
Process
containerd-shim-runc-v1/v2 start containerd-shim-runc-v1/v2
initializing socket
reexec containerd-shim-runc-v1/v2
write address into file
initializing
read address
write back to containerd daemon
serving
...
remove address in Shutdown call
```
However, there is no synchronization after reexec. Then the data race is
like:
```leaking-case
Process
containerd-shim-runc-v1/v2 start containerd-shim-runc-v1/v2
initializing socket
reexec containerd-shim-runc-v1/v2
initializing
read address
write address into file
write back to containerd daemon
serving
...
fail to remove address
because of empty address
```
The `address` should be writen into file first before reexec.
And if shutdown the whole service before cleanup temporary
resource (like socket file), the Shutdown caller will receive `ttrpc: closed`
sometime, which depends on go runtime scheduler. Then it also causes leaking
socket files.
Since the shimV2-Delete binary API must be called to cleanup shim temporary
resource and shimV2-runC-v1 doesn't support grouping multi containers in one,
it is safe to remove the socket file in the binary call for shimV2-runC-v1.
But for the shimV2-runC-v2 shim, we still cleanup socket in Shutdown.
Hopefully we can find a way to cleanup socket in shimV2-Delete binary
call.
Fix: #5173
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Previously a typo was introduced that caused the wrong error to be
checked against when calling exec.LookPath. This had the effect that
containerd would never locate the shim binary if it was in the same
directory as containerd's binary, but not in PATH.
Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
The only shared variable `m.client` is thread-safe, so we can safely
parallelize the loops.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
The current code simply ignores the full binary path when starting the
shimv2 process, and instead fallbacks to a binary in the path, and this
is problematic (and confusing) for those using CRI-O, which has this
bits vendored.
The reason it's problematic with CRI-O is because the user can simply
set the full binary path and, instead of having that executed, CRI-O
will simply fail to create the container unless that binary is part of
the path, which may not be case in a few different scenarios (testing
being the most common one).
Fixes: #5006
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
bump version 1.3.2 for gogo/protobuf due to CVE-2021-3121 discovered
in gogo/protobuf version 1.3.1, CVE has been fixed in 1.3.2
Signed-off-by: Aditi Sharma <adi.sky17@gmail.com>
When the shim returns a plain error when a process does not exist,
the server is unable to recognise its GRPC status code and assumes
UnknownError. This is awkward for containerd client users as they are
unable to recognise the actual reason for the error.
When the shim returns a NotFound GRPC error, it is properly translated
by the server and clients receive a proper NotFound error instead of
Unknown
Please note that we (CF Garden) would like to have the eventual fix backported to 1.4 as well.
Co-authored-by: Danail Branekov <danailster@gmail.com>
Signed-off-by: Danail Branekov <danailster@gmail.com>
Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>
oom_score_adj must be in the range -1000 to 1000. In AdjustOOMScore if containerd's score is already at the maximum value we should set that value for the shim instead of trying to set 1001 which is invalid.
Signed-off-by: Simon Kaegi <simon_kaegi@ca.ibm.com>
The new function `WithLogURI(uri *url.URL)` replaces `WithBinaryLogURI(binary string, args map[string]string)`
so as to allow passing an existring URI object.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
If the shim has been killed and ttrpc connection has been
closed, the shimErr will not be nil. For this case, the event
subscriber, like moby/moby, might have received the exit or delete
events. Just in case, we should allow ttrpc-callback-on-close to
send the exit and delete events again. And the exit status will
depend on result of shimV2.Delete.
If not, the shim has been delivered the exit and delete events.
So we should remove the task record and prevent duplicate events from
ttrpc-callback-on-close.
Fix: #4769
Signed-off-by: Wei Fu <fuweid89@gmail.com>
This allows filesystem-based ACLs for configuring access to the socket
of a shim.
Ported from Michael Crosby's similar patch for v2 shims.
Signed-off-by: Samuel Karp <skarp@amazon.com>
This allows filesystem based ACLs for configuring access to the socket of a
shim.
Co-authored-by: Samuel Karp <skarp@amazon.com>
Signed-off-by: Samuel Karp <skarp@amazon.com>
Signed-off-by: Michael Crosby <michael@thepasture.io>
Signed-off-by: Michael Crosby <michael.crosby@apple.com>
The shim delete action needs bundle information to cleanup resources
created by shim. If the cleanup dead shim is called after delete bundle,
the part of resources maybe leaky.
The ttrpc client UserOnCloseWait() can make sure that resources are
cleanup before delete bundle, which synchronizes task deletion and
cleanup deadshim. It might slow down the task deletion, but it can make
sure that resources can be cleanup and avoid EBUSY umount case. For
example, the sandbox container like Kata/Firecracker might have mount
points over the rootfs. If containerd handles task deletion and cleanup
deadshim parallelly, the task deletion will meet EBUSY during umount and
fail to cleanup bundle, which makes case worse.
And also update cleanupAfterDeadshim, which makes sure that
cleanupAfterDeadshim must be called after shim disconnected. In some
case, shim fails to call runc-create for some reason, but the runc-create
already makes runc-init into ready state. If containerd doesn't call shim
deletion, the runc-init process will be leaky and hold the cgroup, which
makes pod terminating :(.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
This `Info` log shows up for all exec processes that use the v1 shim
with Docker because Docker deletes the process once it receives the exit
event from containerd.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Currently the shims only support starting the logging binary process if the
io.Creator Config does not specify Terminal: true. This means that the program
using containerd will only be able to specify FIFO io when Terminal: true,
rather than allowing the shim to fork the logging binary process. Hence,
containerd consumers face an inconsistent behavior regarding logging binary
management depending on the Terminal option.
Allowing the shim to fork the logging binary process will introduce consistency
between the running container and the logging process. Otherwise, the logging
process may die if its parent process dies whereas the container will keep
running, resulting in the loss of container logs.
Signed-off-by: Akshat Kumar <kshtku@amazon.com>
Before this change, if an event fails to send on the first attempt,
subsequent attempts will fail with context.Cancelled because the the
caller of publish passes a cancellable timeout, which the publisher uses
to send the event.
The publisher returns immediately if the send fails, but adds the event
to an async queue to try again.
Meanwhile the caller will return cancelling the context.
Additionally, subsequent attempts may fail to send because the timeout
was expected to be for a single request but the queue sleeps for
`attempt*time.Second`.
In the shim service, the timeout was set to 5s, which means the send
will fail with context.DeadlineExceeded before it reaches `maxRequeue`
(which is currently 5).
This change moves the timeout to the publisher so each send attempt gets
its own timeout.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Previously shim v2 (`io.containerd.runc.{v1,v2}`) always used `/run/containerd/runc` as the runc root.
Fix#4326
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Introduce LogURIGenerator helper function in cio package. It is used in
the restart options, like WithBinaryLogURI and WithFileLogURI.
And restart.LogPathLabel might be used in production and work well. In
order to reduce breaking change, the LogPathLabel is still recognized if
new LogURILabel is not set. In next release 1.5, the LogPathLabel will
be removed.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Currently, we send a single SIGKILL to the shim process
once and then we spin in a loop where we use kill(pid, 0)
to detect when the pid has disappeared completely.
Unfortunately, this has a race condition since pids can be reused causing us
to spin in an infinite loop when that happens.
This adds a timeout to this loop which logs a warning and exits the
infinite loop.
Signed-off-by: Ashray Jain <ashrayj@palantir.com>
How to test (from https://github.com/opencontainers/runc/pull/2352#issuecomment-620834524):
(host)$ sudo swapoff -a
(host)$ sudo ctr run -t --rm --memory-limit $((1024*1024*32)) docker.io/library/alpine:latest foo
(container)$ sh -c 'VAR=$(seq 1 100000000)'
An event `/tasks/oom {"container_id":"foo"}` will be displayed in `ctr events`.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Dependencies may be switching to use the new `%w` formatting
option to wrap errors; switching to use `errors.Is()` makes
sure that we are still able to unwrap the error and detect the
underlying cause.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
TestRuntimeWithEmptyMaxEnvProcs should restore the GoMaxProcs after
test so that the temporary change of GoMaxProcs will not impact other
case, like TestRuntimeWithNonEmptyMaxEnvProcs.
Signed-off-by: Wei Fu <fuweid89@gmail.com>