/usr/sbin/runc is confined with the "runc" profile[1] introduced in AppArmor
v4.0.0. That confinement breaks stopping of containers, because the profile
assigned to containers doesn't accept signals from the "runc" peer.
AppArmor >= v4.0.0 is currently part of Ubuntu Mantic (23.10) and later.
The issue is reproducible both with nerdctl and ctr clients. In the case
of ctr, the --apparmor-default-profile flag has to be specified,
otherwise the container processes would inherit the runc profile, which
behaves as unconfined, and so the subsequent runc process invoked to
stop it would be able to signal it.
Test commands:
root@cloudimg:~# nerdctl run -d --name foo nginx:latest
3d1e74bfe6e7b2912d9223050ae8a81a8f4b73de0846e6d9c956c1e411cdd95a
root@cloudimg:~# nerdctl stop foo
FATA[0000] 1 errors:
unknown error after kill: runc did not terminate successfully: exit status 1: unable to signal init: permission denied
: unknown
or
root@cloudimg:~# ctr pull docker.io/library/nginx:latest
...
root@cloudimg:~# ctr run -d --apparmor-default-profile ctr-default docker.io/library/nginx:latest foo
root@cloudimg:~# ctr task kill foo
ctr: unknown error after kill: runc did not terminate successfully: exit status 1: unable to signal init: permission denied
: unknown
Relevant syslog messages (with long lines wrapped):
Apr 23 22:03:12 cloudimg kernel: audit:
type=1400 audit(1713909792.064:262): apparmor="DENIED"
operation="signal" class="signal" profile="nerdctl-default"
pid=13483 comm="runc" requested_mask="receive"
denied_mask="receive" signal=quit peer="runc"
or
Apr 23 22:05:32 cloudimg kernel: audit:
type=1400 audit(1713909932.106:263): apparmor="DENIED"
operation="signal" class="signal" profile="ctr-default"
pid=13574 comm="runc" requested_mask="receive"
denied_mask="receive" signal=quit peer="runc"
This change extends the default profile with rules that allow receiving
signals from processes that run confined with either the runc or the crun
profile (crun[2] is an alternative OCI runtime that's also confined in
AppArmor >= v4.0.0, see [1]). It is backward compatible because the peer
value is a regular expression (AARE) so the referenced profile doesn't
have to exist for this profile to successfully compile and load.
[1] https://gitlab.com/apparmor/apparmor/-/commit/2594d936
[2] https://github.com/containers/crun
Signed-off-by: Tomáš Virtus <nechtom@gmail.com>
Fix containerd/nerdctl issue 2730
> [Rootless] `nerdctl rm` fails when AppArmor is loaded:
> `error="unknown error after kill: runc did not terminate successfully: exit status 1:
> unable to signal init: permission denied\n: unknown"`
Caused by:
> kernel: audit: type=1400 audit(1713840662.766:122): apparmor="DENIED" operation="signal" class="signal"
> profile="nerdctl-default" pid=366783 comm="runc" requested_mask="receive" denied_mask="receive" signal=kill
> peer="/usr/local/bin/rootlesskit"
The issue is known to happen on Ubuntu 23.10 and 24.04 LTS.
Doesn't seem to happen on Ubuntu 22.04 LTS.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
This is effectively a revert of 2ac9968401, which
switched from os/exec to the golang.org/x/sys/execabs package to mitigate
security issues (mainly on Windows) with lookups resolving to binaries in the
current directory. As of Go 1.19, os/exec itself refuses such lookups, so the
execabs package is no longer needed.
From the Go 1.19 release notes (https://go.dev/doc/go1.19#os-exec-path):
> ## PATH lookups
>
> Command and LookPath no longer allow results from a PATH search to be found
> relative to the current directory. This removes a common source of security
> problems but may also break existing programs that depend on using, say,
> exec.Command("prog") to run a binary named prog (or, on Windows, prog.exe) in
> the current directory. See the os/exec package documentation for information
> about how best to update such programs.
>
> On Windows, Command and LookPath now respect the NoDefaultCurrentDirectoryInExePath
> environment variable, making it possible to disable the default implicit search
> of “.” in PATH lookups on Windows systems.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
While this is not strictly necessary as the default OCI config masks this
path, it is possible that the user disabled path masking, passed their
own list, or is using a forked (or future) daemon version that has a
modified default config/allows changing the default config.
Add some defense-in-depth by also masking out this problematic hardware
device with the AppArmor LSM.
Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com>
This code was no longer used now that the version-dependent rules were
removed from the template in 30c893ec5cba64de1bca0a2a9d3f92423f3ec0d7.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These conditions were added in docker in 8cf89245f5
to account for old versions of Debian/Ubuntu (apparmor_parser < 2.8.95)
that lacked some options:
> This allows us to use the apparmor profile we have in contrib/apparmor/
> and solves the problems where certain functions are not apparent on older
> versions of apparmor_parser on debian/ubuntu.
Those patches were from 2015/2016, and all currently supported distro
versions should now ship newer versions than that. Looking at the
oldest supported versions:
Ubuntu 18.04 "Bionic":
apparmor_parser --version
AppArmor parser version 2.12
Copyright (C) 1999-2008 Novell Inc.
Copyright 2009-2012 Canonical Ltd.
Debian 10 "Buster"
apparmor_parser --version
AppArmor parser version 2.13.2
Copyright (C) 1999-2008 Novell Inc.
Copyright 2009-2018 Canonical Ltd.
This patch removes the version-dependent rules.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Fixes https://github.com/containerd/containerd/issues/7695.
The default profile allows processes within the container to trace others,
but blocks reads/traces. This means that diagnostic facilities in processes
can't easily collect crash/hang dumps. A usual workflow used by solutions
like crashpad and similar projects is that the process that's unresponsive
will spawn a process to collect diagnostic data using ptrace. seccomp-bpf,
yama ptrace settings, and CAP_SYS_PTRACE already provide security mechanisms
to reduce the scopes in which the API can be used. This enables reading from
/proc/* files provided the tracer process passes all other checks.
Signed-off-by: Juan Hoyos <juan.s.hoyos@outlook.com>
Go 1.18 and up now provides a strings.Cut() which is better suited for
splitting key/value pairs (and similar constructs), and performs better:
```go
func BenchmarkSplit(b *testing.B) {
b.ReportAllocs()
data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
for i := 0; i < b.N; i++ {
for _, s := range data {
_ = strings.SplitN(s, "=", 2)[0]
}
}
}
func BenchmarkCut(b *testing.B) {
b.ReportAllocs()
data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
for i := 0; i < b.N; i++ {
for _, s := range data {
_, _, _ = strings.Cut(s, "=")
}
}
}
```
BenchmarkSplit
BenchmarkSplit-10 8244206 128.0 ns/op 128 B/op 4 allocs/op
BenchmarkCut
BenchmarkCut-10 54411998 21.80 ns/op 0 B/op 0 allocs/op
While looking at occurrences of `strings.Split()`, I also replaced some with alternatives
or added constraints: where a specific number of items is expected, I used `strings.SplitN()`
with a suitable limit. This prevents (theoretically) unbounded splits.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in io and os packages.
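A minimal sketch of the most common replacements follows; the roundTrip helper and the file name are illustrative only.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// roundTrip writes content to a temporary file and reads it back using the
// io and os functions that replace their io/ioutil counterparts.
func roundTrip(content string) (string, error) {
	dir, err := os.MkdirTemp("", "example-") // was ioutil.TempDir
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	name := filepath.Join(dir, "data.txt")
	if err := os.WriteFile(name, []byte(content), 0o600); err != nil { // was ioutil.WriteFile
		return "", err
	}
	data, err := os.ReadFile(name) // was ioutil.ReadFile
	if err != nil {
		return "", err
	}

	all, err := io.ReadAll(strings.NewReader(string(data))) // was ioutil.ReadAll
	if err != nil {
		return "", err
	}
	// io.Discard replaces ioutil.Discard.
	if _, err := io.Copy(io.Discard, strings.NewReader(content)); err != nil {
		return "", err
	}
	return string(all), nil
}

func main() {
	got, err := roundTrip("hello")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(got)
}
```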
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Go 1.15.7 contained a security fix for CVE-2021-3115, which allowed arbitrary
code to be executed at build time when using cgo on Windows. This issue also
affects Unix users who have “.” listed explicitly in their PATH and are running
“go get” outside of a module or with module mode disabled.
This issue is not limited to the go command itself, and can also affect binaries
that use `exec.Command`, `exec.LookPath`, etc.
From the related blogpost (https://blog.golang.org/path-security):
> Are your own programs affected?
>
> If you use exec.LookPath or exec.Command in your own programs, you only need to
> be concerned if you (or your users) run your program in a directory with untrusted
> contents. If so, then a subprocess could be started using an executable from dot
> instead of from a system directory. (Again, using an executable from dot happens
> always on Windows and only with uncommon PATH settings on Unix.)
>
> If you are concerned, then we’ve published the more restricted variant of os/exec
> as golang.org/x/sys/execabs. You can use it in your program by simply replacing
This patch replaces all uses of `os/exec` with `golang.org/x/sys/execabs`. While
some uses of `os/exec` should not be problematic (e.g. part of tests), it is
probably good to be consistent, in case code gets moved around.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
On newer kernels and systems, AppArmor will block sending signals in
many scenarios by default, resulting in strange behaviours (container
programs cannot signal each other, or host processes like containerd
cannot signal containers).
The reason this happens only on some distributions (and is not a kernel
regression) is that the kernel doesn't enforce signal mediation unless
the profile contains signal rules. However, because our profiles #include
the distribution-managed <abstractions/base>, some distributions added
signal rules -- which results in AppArmor enforcing signal mediation and
thus a regression. On these systems, containers cannot send and receive
signals at all -- meaning they cannot signal each other and the
container runtime cannot kill them either.
This issue was fixed in Docker in 2018[1] but this code was copied
before then and thus the patches weren't carried. It also contains a new
fix for a more esoteric case[2]. Ideally this code should live in a
project like "containerd/apparmor" so that Docker, libpod, and
containerd can share it, but that's probably something to do separately.
In addition, the copyright header is updated to reference that the code
is copied from Docker (and thus was not written entirely by the
containerd authors).
[1]: https://github.com/docker/docker/pull/37831
[2]: https://github.com/docker/docker/pull/41337
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
This adds default AppArmor profile generation to the containerd client
so that profiles can be generated with a SpecOpt.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>