This function will be used by nerdctl for printing the default AppArmor
profile: `nerdctl system inspect apparmor-profile`
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
On newer kernels and systems, AppArmor will block sending signals in
many scenarios by default resulting in strange behaviours (container
programs cannot signal each other, or host processes like containerd
cannot signal containers).
The reason this happens only on some distributions (and is not a kernel
regression) is that the kernel doesn't enforce signal mediation unless
the profile contains signal rules. However because our profies #include
the distribution-managed <abstractions/base>, some distributions added
signal rules -- which results in AppArmor enforcing signal mediation and
thus a regression. On these systems, containers cannot send and receive
signals at all -- meaning they cannot signal each other and the
container runtime cannot kill them either.
This issue was fixed in Docker in 2018[1] but this code was copied
before then and thus the patches weren't carried. It also contains a new
fix for a more esoteric case[2]. Ideally this code should live in a
project like "containerd/apparmor" so that Docker, libpod, and
containerd can share it, but that's probably something to do separately.
In addition, the copyright header is updated to reference that the code
is copied from Docker (and thus was not written entirely by the
containerd authors).
[1]: https://github.com/docker/docker/pull/37831
[2]: https://github.com/docker/docker/pull/41337
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
These syscalls (some of which have been in Linux for a while but were
missing from the profile) fall into a few buckets:
* close_range(2), epoll_wait2(2) are just extensions of existing "safe
for everyone" syscalls.
* The mountv2 API syscalls (fs*(2), move_mount(2), open_tree(2)) are
all equivalent to aspects of mount(2) and thus go into the
CAP_SYS_ADMIN category.
* process_madvise(2) is similar to the other process_*(2) syscalls and
thus goes in the CAP_SYS_PTRACE category.
Co-authored-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Adds the io-uring related system call introduced in kernel 5.1 to the
seccomp whitelist. With older kernels or older versions of libseccomp,
this configure will be omitted.
Note that io_uring will grow support for more syscalls in the future
so we should keep an eye on this.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
On a ppc64le host, running postgres (tried with 9.4 to 9.6) gives the following
warning when trying to flush data to disks (which happens very frequently):
WARNING: could not flush dirty data: Operation not permitted.
A quick dig in postgres source code indicate it uses sync_file_range(2) to
flush data; which on ppe64le and arm64 is translated to sync_file_range2(2)
for alignements reasons.
The profile did not allow sync_file_range2(2), making postgres sad because
it can not flush its buffers. arm_sync_file_range(2) is an ancient alias to
sync_file_range2(2), the syscall was renamed in Linux 2.6.22 when the same
syscall was added for PowerPC.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
From personality(2):
Have uname(2) report a 2.6.40+ version number rather than a 3.x version
number. Added as a stopgap measure to support broken applications that
could not handle the kernel version-numbering switch from 2.6.x to 3.x.
This allows both "UNAME26|PER_LINUX" and "UNAME26|PER_LINUX32".
Fixes: "setarch broken in docker packages from Debian stretch"
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add the membarrier syscall to the default seccomp profile.
It is for example used in the implementation of dlopen() in
the musl libc of Alpine images.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Enabled adjtimex in the default profile without requiring CAP_SYS_TIME privilege.
The kernel will check CAP_SYS_TIME and won't allow setting the time.
Fixes: Getting the system time with ntptime returns an error in an unprivileged
container
To verify, inside a CentOS 7 container:
yum install -y ntp
ntptime
# ntp_gettime() returns code 0 (OK)
ntpdate -v time.nist.gov
# ntpdate[84]: Can't adjust the time of day: Operation not permitted
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This call is what is used to implement `dmesg` to get kernel messages
about the host. This can leak substantial information about the host.
It is normally available to unprivileged users on the host, unless
the sysctl `kernel.dmesg_restrict = 1` is set, but this is not set
by standard on the majority of distributions. Blocking this to restrict
leaks about the configuration seems correct.
Relates to docker/docker#37897 "docker exposes dmesg to containers by default"
See also https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Expose environment variables in the GCE containerd configuration
script for configuring an additional runtime handler. This unblocks
E2E testing of custom runtime handlers.
Signed-off-by: Tim Allclair <tallclair@google.com>
Format the cni.template, use `space` instead of some `tab`. Avoid indent issue in text editor.
Signed-off-by: bingshen.wbs <bingshen.wbs@alibaba-inc.com>
Restartable Sequences (rseq) are a kernel-based mechanism for fast
update operations on per-core data in user-space. Some libraries, like
the newest version of Google's TCMalloc, depend on it [1].
This also makes dockers default seccomp profile on par with systemd's,
which enabled 'rseq' in early 2019 [2].
1: https://google.github.io/tcmalloc/design.html
2: systemd/systemd@6fee3be
Signed-off-by: Florian Schmaus <flo@geekplace.eu>
use goroutines to copy the data from the stream to the TCP
connection, and viceversa, removing the socat dependency.
Quoting Lantao Liu, the logic is as follow:
When one side (either pod side or user side) of portforward
is closed, we should stop port forwarding.
When one side is closed, the io.Copy use that side as source will close,
but the io.Copy use that side as dest won't.
Signed-off-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
Relates to https://patchwork.kernel.org/patch/10756415/
Added to whitelist:
- `clock_getres_time64` (equivalent of `clock_getres`, which was whitelisted)
- `clock_gettime64` (equivalent of `clock_gettime`, which was whitelisted)
- `clock_nanosleep_time64` (equivalent of `clock_nanosleep`, which was whitelisted)
- `futex_time64` (equivalent of `futex`, which was whitelisted)
- `io_pgetevents_time64` (equivalent of `io_pgetevents`, which was whitelisted)
- `mq_timedreceive_time64` (equivalent of `mq_timedreceive`, which was whitelisted)
- `mq_timedsend_time64 ` (equivalent of `mq_timedsend`, which was whitelisted)
- `ppoll_time64` (equivalent of `ppoll`, which was whitelisted)
- `pselect6_time64` (equivalent of `pselect6`, which was whitelisted)
- `recvmmsg_time64` (equivalent of `recvmmsg`, which was whitelisted)
- `rt_sigtimedwait_time64` (equivalent of `rt_sigtimedwait`, which was whitelisted)
- `sched_rr_get_interval_time64` (equivalent of `sched_rr_get_interval`, which was whitelisted)
- `semtimedop_time64` (equivalent of `semtimedop`, which was whitelisted)
- `timer_gettime64` (equivalent of `timer_gettime`, which was whitelisted)
- `timer_settime64` (equivalent of `timer_settime`, which was whitelisted)
- `timerfd_gettime64` (equivalent of `timerfd_gettime`, which was whitelisted)
- `timerfd_settime64` (equivalent of `timerfd_settime`, which was whitelisted)
- `utimensat_time64` (equivalent of `utimensat`, which was whitelisted)
Not added to whitelist:
- `clock_adjtime64` (equivalent of `clock_adjtime`, which was not whitelisted)
- `clock_settime64` (equivalent of `clock_settime`, which was not whitelisted)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
full diff: https://github.com/golang/go/compare/go1.13.6...go1.13.7
go1.13.7 (released 2020/01/28) includes two security fixes. One mitigates
the CVE-2020-0601 certificate verification bypass on Windows. The other affects
only 32-bit architectures.
https://github.com/golang/go/issues?q=milestone%3AGo1.13.7+label%3ACherryPickApproved
- X.509 certificate validation bypass on Windows 10
A Windows vulnerability allows attackers to spoof valid certificate chains when
the system root store is in use. These releases include a mitigation for Go
applications, but it’s strongly recommended that affected users install the
Windows security update to protect their system.
This issue is CVE-2020-0601 and Go issue golang.org/issue/36834.
- Panic in crypto/x509 certificate parsing and golang.org/x/crypto/cryptobyte
On 32-bit architectures, a malformed input to crypto/x509 or the ASN.1 parsing
functions of golang.org/x/crypto/cryptobyte can lead to a panic.
The malformed certificate can be delivered via a crypto/tls connection to a
client, or to a server that accepts client certificates. net/http clients can
be made to crash by an HTTPS server, while net/http servers that accept client
certificates will recover the panic and are unaffected.
Thanks to Project Wycheproof for providing the test cases that led to the
discovery of this issue. The issue is CVE-2020-7919 and Go issue golang.org/issue/36837.
This is also fixed in version v0.0.0-20200124225646-8b5121be2f68 of golang.org/x/crypto/cryptobyte.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Cleanup is an optional method a snapshotter may implement.
Cleanup can be used to cleanup resources after a snapshot
has been removed. This function allows a snapshotter to defer
longer resource cleanup until after snapshot removals are
completed. Adding this to the API allows proxy snapshotters
to leverage this enhancement.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
go1.12.13 (released 2019/10/31) fixes an issue on macOS 10.15 Catalina
where the non-notarized installer and binaries were being rejected by
Gatekeeper. Only macOS users who hit this issue need to update.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Since recent versions of `vndr` are going to remove the autocomplete
scripts from the urfave vendored content, we will just move them into
`contrib/` and reference them in the documentation from that location.
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
Golang 1.12.12
-------------------------------
go1.12.12 (released 2019/10/17) includes fixes to the go command, runtime,
syscall and net packages. See the Go 1.12.12 milestone on our issue tracker for
details.
https://github.com/golang/go/issues?q=milestone%3AGo1.12.12
Golang 1.12.11 (CVE-2019-17596)
-------------------------------
go1.12.11 (released 2019/10/17) includes security fixes to the crypto/dsa
package. See the Go 1.12.11 milestone on our issue tracker for details.
https://github.com/golang/go/issues?q=milestone%3AGo1.12.11
[security] Go 1.13.2 and Go 1.12.11 are released
Hi gophers,
We have just released Go 1.13.2 and Go 1.12.11 to address a recently reported
security issue. We recommend that all affected users update to one of these
releases (if you're not sure which, choose Go 1.13.2).
Invalid DSA public keys can cause a panic in dsa.Verify. In particular, using
crypto/x509.Verify on a crafted X.509 certificate chain can lead to a panic,
even if the certificates don't chain to a trusted root. The chain can be
delivered via a crypto/tls connection to a client, or to a server that accepts
and verifies client certificates. net/http clients can be made to crash by an
HTTPS server, while net/http servers that accept client certificates will
recover the panic and are unaffected.
Moreover, an application might crash invoking
crypto/x509.(*CertificateRequest).CheckSignature on an X.509 certificate
request, parsing a golang.org/x/crypto/openpgp Entity, or during a
golang.org/x/crypto/otr conversation. Finally, a golang.org/x/crypto/ssh client
can panic due to a malformed host key, while a server could panic if either
PublicKeyCallback accepts a malformed public key, or if IsUserAuthority accepts
a certificate with a malformed public key.
The issue is CVE-2019-17596 and Go issue golang.org/issue/34960.
Thanks to Daniel Mandragona for discovering and reporting this issue. We'd also
like to thank regilero for a previous disclosure of CVE-2019-16276.
The Go 1.13.2 release also includes a fix to the compiler that prevents improper
access to negative slice indexes in rare cases. Affected code, in which the
compiler can prove that the index is zero or negative, would have resulted in a
panic in Go 1.12, but could have led to arbitrary memory read and writes in Go
1.13 and Go 1.13.1. This is Go issue golang.org/issue/34802.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This whitelists the statx syscall; libseccomp-2.3.3 or up
is needed for this, older seccomp versions will ignore this.
Equivalent of https://github.com/moby/moby/pull/36417
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
io_pgetevents() is a new Linux system call, similar to the already-whitelisted
io_getevents(). It has no security implications. Whitelist it so applications can
use the new system call.
Fixes#3105.
Signed-off-by: Avi Kivity <avi@scylladb.com>
This improves nvidia support for multiple uuids per container and fixes
the API to add individual capabilities.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This allows non-privileged users to use containerd. This is part of a
larger track of work integrating containerd into Cloudfoundry's garden
with support for rootless.
[#156343575]
Signed-off-by: Claudia Beresford <cberesford@pivotal.io>
As of opencontainers/runc@db093f6 runc no longer depends on libapparmor
thus libapparmor-dev no longer needs to be installed to build it or
anythind that depends on it (like containerd or cri-containerd). Adjust
the documentation accordingly.
containerd/containerd#2238 did the same for containerd.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
As of opencontainers/runc@db093f621f runc
no longer depends on libapparmor thus libapparmor-dev no longer needs to
be installed to build it. Adjust the documentation accordingly.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
No capabilities can be granted outside the bounding set, so there
is no point looking at any other set for the largest scope.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Use golang:1.9, which should get the latest 1.9.x version,
instead of using a specific tag.
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>