Commit Graph

8706 Commits

Author SHA1 Message Date
Wei Fu
7ce23867e3
Merge pull request #4532 from cpuguy83/forward_signal_not_found
Fix some signal forwarder issues
2020-09-06 11:41:51 +08:00
Brian Goff
899b4e3cb5 Ignore SIGURG signals in signal forwarder
Starting with go1.14, the go runtime hijacks SIGURG but with no way to
not send to other signal handlers.

In practice, we get this signal frequently.
I found this while testing out go1.15 with ctr and multiple execs with
only `echo hello`. When the process exits quickly, if the previous
commit is not applied, you end up with an error message that it couldn't
forward SIGURG to the container (due to the process being gone).

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-09-04 16:19:31 -07:00
Brian Goff
6650510836 Exit signal forward if process not found
Previously the signal loop can end up racing with the process exiting.
Intead of logging and continuing the loop, exit early.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-09-04 16:17:00 -07:00
Tianon Gravi
2055e12953 Add RPi1/RPi0 workaround
On the very popular Raspberry Pi 1 and Zero devices, the CPU is actually ARMv6, but the chip happens to support the feature bit the kernel uses to differentiate v6/v7, so it gets reported as "CPU architecture: 7" and thus fails to run many of the images that get pulled.

To account for this very popular edge case, this also checks "model name" which on these chips will begin with "ARMv6-compatible" -- we could also check uname, but getCPUInfo is already handy, low overhead, and mirrors the code before this.

Signed-off-by: Tianon Gravi <admwiggin@gmail.com>
2020-09-04 14:19:37 -07:00
Derek McGowan
d4e78200d6
Merge pull request #4518 from knight42/feat/btrfs-config-root-path
feat(snapshot::btrfs): config root_path
2020-09-03 11:12:27 -07:00
Derek McGowan
445e26fff4
Merge pull request #4517 from knight42/feat/native-config-root-path
feat(snapshot::native): config root_path
2020-09-03 11:10:37 -07:00
Phil Estes
a5c6381558
Merge pull request #4523 from errordeveloper/master
Log unexpected responses
2020-09-03 11:00:55 -04:00
Ilya Dmitrichenko
2de55060ee
Log unexpected responses
This accomplishes a few long-standing TODO items, but also helps users
in showing exact registry error messages

Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
2020-09-03 14:52:11 +01:00
Phil Estes
43394312cb
Merge pull request #4525 from shishir-a412ed/seccomp
ctr: CLI Flag (seccomp-profile) for setting custom seccomp profile.
2020-09-03 09:14:30 -04:00
Jian Zeng
c50ff694f0
refactor(native): separate init from implementation
Part of #4513

Signed-off-by: Jian Zeng <anonymousknight96@gmail.com>
2020-09-03 19:58:31 +08:00
Jian Zeng
98b0b2a7c6
feat: make native root_path configurable
Part of #4514

Signed-off-by: Jian Zeng <anonymousknight96@gmail.com>
2020-09-03 19:58:05 +08:00
Jian Zeng
a52daa26ae
refactor(btrfs): separate init from implementation
Part of #4513

Signed-off-by: Jian Zeng <anonymousknight96@gmail.com>
2020-09-03 19:54:18 +08:00
Jian Zeng
4154235735
feat: make btrfs root_path configurable
Part of #4514

Signed-off-by: Jian Zeng <anonymousknight96@gmail.com>
2020-09-03 19:52:13 +08:00
Shishir Mahajan
1eae524df6 ctr: CLI Flag (seccomp-profile) for setting custom seccomp profile.
Signed-off-by: Shishir Mahajan <smahajan@roblox.com>
2020-09-02 16:13:11 -07:00
Derek McGowan
cbb4e43763
Merge pull request #4524 from crosbymichael/cri-kata
update cri to 35e623e6bf
2020-09-02 11:27:54 -07:00
Michael Crosby
a2b4745f7d update cri to 35e623e6bf
This includes changes for kata or other kvm based runtimes with selinux support.

Signed-off-by: Michael Crosby <michael@thepasture.io>
2020-09-02 09:46:35 -05:00
Michael Crosby
d2f2733e00
Merge pull request #4508 from mikebrow/readme-update-slack
add help wanted, update slack
2020-09-02 10:18:54 -04:00
Michael Crosby
dedf423b9c
Merge pull request #4519 from cpuguy83/shim_exec_p_debug
shimv1: downgrade poroccess missing log to debug
2020-09-02 10:17:21 -04:00
Derek McGowan
35e623e6bf
Merge pull request #1561 from crosbymichael/kata-se
Handle KVM based runtimes with selinux
2020-09-01 13:12:11 -07:00
Brian Goff
5f9d15eaac shimv1: downgrade poroccess missing log to debug
This `Info` log shows up for all exec processes that use the v1 shim
with Docker because Docker deletes the process once it receives the exit
event from containerd.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-09-01 10:31:41 -07:00
Mike Brown
6f4fe8245f add help wanted, update slack
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2020-08-31 12:41:28 -05:00
Phil Estes
efa0e80913
Merge pull request #4506 from dmcgowan/refactor-overlay-plugin
Separate overlay implementation from plugin
2020-08-27 08:48:58 -04:00
Derek McGowan
70ffb12c1b
Separate overlay implementation from plugin
Put the overlay plugin in a separate package to allow the overlay package to be
used without needing to import and initialize the plugin.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2020-08-26 18:50:51 -07:00
Michael Crosby
d715d00906 Handle KVM based runtimes with selinux
Signed-off-by: Michael Crosby <michael@thepasture.io>
2020-08-26 21:38:03 -04:00
Derek McGowan
1a89feb5d7
Merge pull request #4505 from ashrayjain/aj/configurable-root
Add configurable overlayfs path
2020-08-26 18:31:16 -07:00
Akshat Kumar
4cc99e57a7 Remove unnecessary logging binary helpers and add godoc
Signed-off-by: Akshat Kumar <kshtku@amazon.com>
2020-08-26 09:15:02 -07:00
Ashray Jain
5ed177a2da Add configurable overlayfs path
This allows configuring the location of the overlayfs snapshotter by
adding the following in config.toml
```
[plugins]
  [plugins.overlayfs]
    root_path = "/custom_location"
```

This is useful to isolate disk i/o for overlayfs from the rest of
containerd and prevent containers saturating disk i/o from negatively
affecting containerd operations and cause timeouts.

Signed-off-by: Ashray Jain <ashrayj@palantir.com>
2020-08-26 16:08:10 +01:00
Akshat Kumar
7a9fbec5fb Add logging binary support when terminal is true
Currently the shims only support starting the logging binary process if the
io.Creator Config does not specify Terminal: true. This means that the program
using containerd will only be able to specify FIFO io when Terminal: true,
rather than allowing the shim to fork the logging binary process. Hence,
containerd consumers face an inconsistent behavior regarding logging binary
management depending on the Terminal option.

Allowing the shim to fork the logging binary process will introduce consistency
between the running container and the logging process. Otherwise, the logging
process may die if its parent process dies whereas the container will keep
running, resulting in the loss of container logs.

Signed-off-by: Akshat Kumar <kshtku@amazon.com>
2020-08-25 17:28:29 -07:00
Maksym Pavlenko
27402021ac
Merge pull request #4501 from crosbymichael/runtimeroot
Add --runtime-root to ctr
2020-08-25 13:46:36 -07:00
Derek McGowan
a7b2304f69
Merge pull request #4445 from tonistiigi/auth-refactor
docker: split private token helper functions to reusable pkg
2020-08-25 12:25:23 -07:00
Michael Crosby
bacf07f4a5
Merge pull request #4308 from aojea/bumpcni
bump cni dependencies
2020-08-25 11:54:53 -04:00
Michael Crosby
f9d231f660
Merge pull request #4493 from thaJeztah/seccomp_uring
seccomp: allow io-uring related system calls
2020-08-25 11:39:45 -04:00
Michael Crosby
7e84abe99c
Merge pull request #4468 from prashantbhutani90/master
Report correct stats for windows containers
2020-08-25 11:37:28 -04:00
Michael Crosby
396b863138
Merge pull request #4491 from thaJeztah/seccomp_syslog
seccomp: move the syslog syscall to be gated by CAP_SYS_ADMIN or CAP_SYSLOG
2020-08-25 11:35:28 -04:00
Michael Crosby
40ce36fd27
Merge pull request #4492 from thaJeztah/seccomp_updates
seccomp: sync some changes with Docker/Moby's profile
2020-08-25 11:34:34 -04:00
Phil Estes
0586589652
Merge pull request #4486 from darfux/monitor_v2_tasks_as_well
tasks: Monitor v2 tasks in initFunc as well
2020-08-25 10:19:25 -04:00
Phil Estes
8fe6cf567d
Merge pull request #4497 from dmcgowan/update-cri-nri
Update CRI
2020-08-24 19:23:35 -04:00
Derek McGowan
ac95f27b83
Update CRI
Add CRI with NRI support

Signed-off-by: Derek McGowan <derek@mcg.dev>
2020-08-24 14:26:08 -07:00
Derek McGowan
56a89cda34
Merge pull request #1552 from crosbymichael/nri
Add experimental NRI injection points
2020-08-24 13:58:11 -07:00
Mike Brown
d09e26b0a0
Merge pull request #1556 from aojea/cni80
bump cni dependencies
2020-08-24 13:12:24 -05:00
Akihiro Suda
5c73fe06a8
Merge pull request #4472 from fuweid/ignore-error
runtime: ignore ErrNotExist when remove rootfs
2020-08-24 20:08:52 +09:00
Sebastiaan van Stijn
325bac7c71
seccomp: allow io-uring related system calls
Adds the io-uring related system call introduced in kernel 5.1 to the
seccomp whitelist. With older kernels or older versions of libseccomp,
this configure will be omitted.

Note that io_uring will grow support for more syscalls in the future
so we should keep an eye on this.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-08-24 12:59:53 +02:00
Sebastiaan van Stijn
0a5ee7e6f3
seccomp: allow clock_settime when CAP_SYS_TIME is added
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-08-24 12:43:21 +02:00
Sebastiaan van Stijn
5cdb6e81d2
seccomp: allow quotactl with CAP_SYS_ADMIN
This allows the quotactl syscall in the default seccomp profile, gated by
CAP_SYS_ADMIN.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-08-24 12:40:43 +02:00
Sebastiaan van Stijn
5862285fac
seccomp: allow sync_file_range2 on supported architectures.
On a ppc64le host, running postgres (tried with 9.4 to 9.6) gives the following
warning when trying to flush data to disks (which happens very frequently):

     WARNING: could not flush dirty data: Operation not permitted.

A quick dig in postgres source code indicate it uses sync_file_range(2) to
flush data; which on ppe64le and arm64 is translated to sync_file_range2(2)
for alignements reasons.

The profile did not allow sync_file_range2(2), making postgres sad because
it can not flush its buffers. arm_sync_file_range(2) is an ancient alias to
sync_file_range2(2), the syscall was renamed in Linux 2.6.22 when the same
syscall was added for PowerPC.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-08-24 12:36:53 +02:00
Sebastiaan van Stijn
117d678749
seccomp: allow personality with UNAME26 bit set
From personality(2):

    Have uname(2) report a 2.6.40+ version number rather than a 3.x version
    number.  Added as a stopgap measure to support broken applications that
    could not handle the  kernel  version-numbering  switch  from 2.6.x to 3.x.

This allows both "UNAME26|PER_LINUX" and "UNAME26|PER_LINUX32".

Fixes: "setarch broken in docker packages from Debian stretch"

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-08-24 12:27:14 +02:00
Sebastiaan van Stijn
fc9e5d161a
seccomp: allow syscall membarrier
Add the membarrier syscall to the default seccomp profile.
It is for example used in the implementation of dlopen() in
the musl libc of Alpine images.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-08-24 12:16:26 +02:00
Sebastiaan van Stijn
1746a195e9
seccomp: allow adjtimex get time operation
Enabled adjtimex in the default profile without requiring CAP_SYS_TIME privilege.
The kernel will check CAP_SYS_TIME and won't allow setting the time.

Fixes: Getting the system time with ntptime returns an error in an unprivileged
container

To verify, inside a CentOS 7 container:

    yum install -y ntp
    ntptime
    # ntp_gettime() returns code 0 (OK)

    ntpdate -v time.nist.gov
    # ntpdate[84]: Can't adjust the time of day: Operation not permitted

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-08-24 12:16:23 +02:00
Sebastiaan van Stijn
7e7545e556
seccomp: allow add preadv2 and pwritev2 syscalls
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-08-24 12:16:21 +02:00
Sebastiaan van Stijn
267a0cf68e
seccomp: move the syslog syscall to be gated by CAP_SYS_ADMIN or CAP_SYSLOG
This call is what is used to implement `dmesg` to get kernel messages
about the host. This can leak substantial information about the host.
It is normally available to unprivileged users on the host, unless
the sysctl `kernel.dmesg_restrict = 1` is set, but this is not set
by standard on the majority of distributions. Blocking this to restrict
leaks about the configuration seems correct.

Relates to docker/docker#37897 "docker exposes dmesg to containers by default"

See also https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-08-24 11:57:48 +02:00