- Add Target to mount.Mount.
- Add UnmountMounts to unmount a list of mounts in reverse order.
- Add UnmountRecursive to unmount deepest mount first for a given target, using
moby/sys/mountinfo.
Signed-off-by: Edgar Lee <edgarhinshunlee@gmail.com>
This commit upgrades github.com/containerd/typeurl to use typeurl.Any.
The interface hides gogo/protobuf/types.Any from containerd's Go client.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
All occurrences only passed a PID, so we can use this utility to make
the code more symmetrical with their cgroups v2 counterparts.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
runc option --criu is now ignored (with a warning), and the option will be
removed entirely in a future release. Users who need a non- standard criu
binary should rely on the standard way of looking up binaries in $PATH.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This commit removes gogoproto.enumvalue_customname,
gogoproto.goproto_enum_prefix and gogoproto.enum_customname.
All of them make proto-generated Go code more idiomatic, but we already
don't use these enums in our external-surfacing types and they are anyway
not supported by Google's official toolchain (see #6564).
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
If containerd-shim-runc-v1 process dead abnormally, such as received
kill -s 9 signal, panic or other unkown reasons, the containerd-shim-runc-v1
server can not reap runc container and forward init process exit event.
This will lead the container leaked in dockerd. When shim dead, containerd
will clean dead shim, here read init process pid and forward exit event
with pid at the same time.
Related to: #6402
Signed-off-by: Jeff Zvier <zvier20@gmail.com>
Signed-off-by: Wei Fu <fuweid89@gmail.com>
In linux 5.14 and hopefully some backports, core scheduling allows processes to
be co scheduled within the same domain on SMT enabled systems.
The containerd impl sets the core sched domain when launching a shim. This
allows a clean way for each shim(container/pod) to be in its own domain and any
additional containers, (v2 pods) be be launched with the same domain as well as
any exec'd process added to the container.
kernel docs: https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/core-scheduling.html
Signed-off-by: Michael Crosby <michael@thepasture.io>
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Go 1.15.7 contained a security fix for CVE-2021-3115, which allowed arbitrary
code to be executed at build time when using cgo on Windows. This issue also
affects Unix users who have “.” listed explicitly in their PATH and are running
“go get” outside of a module or with module mode disabled.
This issue is not limited to the go command itself, and can also affect binaries
that use `os.Command`, `os.LookPath`, etc.
From the related blogpost (ttps://blog.golang.org/path-security):
> Are your own programs affected?
>
> If you use exec.LookPath or exec.Command in your own programs, you only need to
> be concerned if you (or your users) run your program in a directory with untrusted
> contents. If so, then a subprocess could be started using an executable from dot
> instead of from a system directory. (Again, using an executable from dot happens
> always on Windows and only with uncommon PATH settings on Unix.)
>
> If you are concerned, then we’ve published the more restricted variant of os/exec
> as golang.org/x/sys/execabs. You can use it in your program by simply replacing
This patch replaces all uses of `os/exec` with `golang.org/x/sys/execabs`. While
some uses of `os/exec` should not be problematic (e.g. part of tests), it is
probably good to be consistent, in case code gets moved around.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
When runC shimv2 starts, the StartShim interface will re-exec itself as
long-running process, which will read the `address` during initializing.
```happycase
Process
containerd-shim-runc-v1/v2 start containerd-shim-runc-v1/v2
initializing socket
reexec containerd-shim-runc-v1/v2
write address into file
initializing
read address
write back to containerd daemon
serving
...
remove address in Shutdown call
```
However, there is no synchronization after reexec. Then the data race is
like:
```leaking-case
Process
containerd-shim-runc-v1/v2 start containerd-shim-runc-v1/v2
initializing socket
reexec containerd-shim-runc-v1/v2
initializing
read address
write address into file
write back to containerd daemon
serving
...
fail to remove address
because of empty address
```
The `address` should be writen into file first before reexec.
And if shutdown the whole service before cleanup temporary
resource (like socket file), the Shutdown caller will receive `ttrpc: closed`
sometime, which depends on go runtime scheduler. Then it also causes leaking
socket files.
Since the shimV2-Delete binary API must be called to cleanup shim temporary
resource and shimV2-runC-v1 doesn't support grouping multi containers in one,
it is safe to remove the socket file in the binary call for shimV2-runC-v1.
But for the shimV2-runC-v2 shim, we still cleanup socket in Shutdown.
Hopefully we can find a way to cleanup socket in shimV2-Delete binary
call.
Fix: #5173
Signed-off-by: Wei Fu <fuweid89@gmail.com>
This allows filesystem based ACLs for configuring access to the socket of a
shim.
Co-authored-by: Samuel Karp <skarp@amazon.com>
Signed-off-by: Samuel Karp <skarp@amazon.com>
Signed-off-by: Michael Crosby <michael@thepasture.io>
Signed-off-by: Michael Crosby <michael.crosby@apple.com>
Previously shim v2 (`io.containerd.runc.{v1,v2}`) always used `/run/containerd/runc` as the runc root.
Fix#4326
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
How to test (from https://github.com/opencontainers/runc/pull/2352#issuecomment-620834524):
(host)$ sudo swapoff -a
(host)$ sudo ctr run -t --rm --memory-limit $((1024*1024*32)) docker.io/library/alpine:latest foo
(container)$ sh -c 'VAR=$(seq 1 100000000)'
An event `/tasks/oom {"container_id":"foo"}` will be displayed in `ctr events`.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Only start watching the cgroup for OOMs when the first process starts
instead of on every process.
Signed-off-by: Seth Pellegrino <spellegrino@newrelic.com>
* only shim v2 runc v2 ("io.containerd.runc.v2") is supported
* only PID metrics is implemented. Others should be implemented in separate PRs.
* lots of code duplication in v1 metrics and v2 metrics. Dedupe should be separate PR.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Because of the way go handles flags, passing a flag that is not defined
will cause an error. In our case, if we kept this as a flag, then
third-party shims would break when they see this new flag. To fix this,
I moved this new configuration option to an env var. We should use env
vars from here on out to avoid breaking shim compat.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Previously the TTRPC address was generated as "<GRPC address>.ttrpc".
This change now allows explicit configuration of the TTRPC address, with
the default still being the old format if no value is specified.
As part of this change, a new configuration section is added for TTRPC
listener options.
Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
This changes the shim's OOM score from a static max killable of -999 to
be +1 of the containerd daemon's score. This should allow the shim's to
be killed first in an OOM condition but leave the daemon alone for a bit
to help cleanup and manage the containers during this situation.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Shims no longer call `os.Exit` but close the context on shutdown so that
events and other resources have hit the `defer`s.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>