This allows configuring the location of the overlayfs snapshotter by
adding the following to config.toml:
```
[plugins]
[plugins.overlayfs]
root_path = "/custom_location"
```
This is useful for isolating overlayfs disk I/O from the rest of
containerd, so that containers saturating disk I/O do not slow down
containerd operations and cause timeouts.
Signed-off-by: Ashray Jain <ashrayj@palantir.com>
Currently the shims only support starting the logging binary process if the
io.Creator Config does not specify Terminal: true. This means that a program
using containerd can only specify FIFO io when Terminal: true is set,
rather than letting the shim fork the logging binary process. As a result,
containerd consumers face inconsistent behavior in logging binary
management depending on the Terminal option.
Allowing the shim to fork the logging binary process will introduce consistency
between the running container and the logging process. Otherwise, the logging
process may die if its parent process dies whereas the container will keep
running, resulting in the loss of container logs.
Signed-off-by: Akshat Kumar <kshtku@amazon.com>
Adds the io_uring-related system calls introduced in kernel 5.1 to the
seccomp whitelist. With older kernels or older versions of libseccomp,
this configuration will be omitted.
Note that io_uring will grow support for more syscalls in the future
so we should keep an eye on this.
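As a rough illustration, the allow rule could be sketched as below. The
`rule` type is a hypothetical, simplified stand-in for a seccomp profile
entry and is not the type containerd uses; the three syscall names are the
io_uring calls present in Linux 5.1.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rule is a hypothetical, simplified stand-in for one seccomp profile
// entry. It is an assumption for illustration, not containerd's type.
type rule struct {
	Names  []string `json:"names"`
	Action string   `json:"action"`
}

// ioUringRule returns an allow rule for the three io_uring syscalls
// that exist as of Linux 5.1.
func ioUringRule() rule {
	return rule{
		Names:  []string{"io_uring_enter", "io_uring_register", "io_uring_setup"},
		Action: "SCMP_ACT_ALLOW",
	}
}

func main() {
	// Print the rule as JSON so it can be compared with a profile file.
	out, err := json.MarshalIndent(ioUringRule(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```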
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
On a ppc64le host, running postgres (tried with 9.4 to 9.6) gives the following
warning when trying to flush data to disks (which happens very frequently):
WARNING: could not flush dirty data: Operation not permitted.
A quick dig in the postgres source code indicates it uses sync_file_range(2) to
flush data, which on ppc64le and arm64 is translated to sync_file_range2(2)
for alignment reasons.
The profile did not allow sync_file_range2(2), making postgres sad because
it cannot flush its buffers. arm_sync_file_range(2) is an ancient alias for
sync_file_range2(2); the syscall was renamed in Linux 2.6.22 when the same
syscall was added for PowerPC.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
From personality(2):
Have uname(2) report a 2.6.40+ version number rather than a 3.x version
number. Added as a stopgap measure to support broken applications that
could not handle the kernel version-numbering switch from 2.6.x to 3.x.
This allows both "UNAME26|PER_LINUX" and "UNAME26|PER_LINUX32".
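For reference, the two permitted personality(2) argument values can be
computed as in the sketch below. The constant values are taken from
<linux/personality.h>; the Go identifiers are illustrative names, not
identifiers from the profile code.

```go
package main

import "fmt"

// Values from <linux/personality.h>. The Go names are illustrative.
const (
	perLinux   = 0x0000    // PER_LINUX
	perLinux32 = 0x0008    // PER_LINUX32
	uname26    = 0x0020000 // UNAME26
)

// allowedUname26Personalities returns the two personality(2) arguments
// described above: UNAME26|PER_LINUX and UNAME26|PER_LINUX32.
func allowedUname26Personalities() []uint64 {
	return []uint64{uname26 | perLinux, uname26 | perLinux32}
}

func main() {
	for _, p := range allowedUname26Personalities() {
		fmt.Printf("%#x\n", p) // prints 0x20000, then 0x20008
	}
}
```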
Fixes: "setarch broken in docker packages from Debian stretch"
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add the membarrier syscall to the default seccomp profile.
It is for example used in the implementation of dlopen() in
the musl libc of Alpine images.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Enable adjtimex in the default profile without requiring the CAP_SYS_TIME
capability. The kernel itself still checks CAP_SYS_TIME and will not allow
setting the time without it.
Fixes: Getting the system time with ntptime returns an error in an unprivileged
container
To verify, inside a CentOS 7 container:
yum install -y ntp
ntptime
# ntp_gettime() returns code 0 (OK)
ntpdate -v time.nist.gov
# ntpdate[84]: Can't adjust the time of day: Operation not permitted
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This call is what is used to implement `dmesg` to get kernel messages
about the host. This can leak substantial information about the host.
It is normally available to unprivileged users on the host, unless
the sysctl `kernel.dmesg_restrict = 1` is set, but this is not set
by default on the majority of distributions. Blocking this call to restrict
leaks about the host configuration seems correct.
Relates to docker/docker#37897 "docker exposes dmesg to containers by default"
See also https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This prevents packages whose Go files are all excluded by build constraints
from being included in the package list. These packages cause the test command
to fail with "can't load package build constraints exclude all Go files".
Signed-off-by: Derek McGowan <derek@mcg.dev>
This allows development with containerd to be done for NRI without the need for
custom builds.
This is an experimental feature and is not enabled unless a user has a global
`/etc/nri/conf.json` config set up with plugins on the system. No NRI code will
be executed if this config file does not exist.
Signed-off-by: Michael Crosby <michael@thepasture.io>
When containerd is restarted, only v1 tasks are monitored again. As a
result, metrics for existing v2 tasks are missing.
Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com>
The rollback mechanism is implemented by calling deleteDevice() and
RemoveDevice(). But RemoveDevice() is internally calling
deleteDevice() as well.
Since the device has already been deleted by the first deleteDevice() call,
RemoveDevice() will always see ENODATA. This specific error must be
ignored so that the device's metadata is removed correctly.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>