This change is needed for running the latest containerd inside Docker
that is not aware of the recently added caps (BPF, PERFMON, CHECKPOINT_RESTORE).
Without this change, containerd inside Docker fails to run containers with
"apply caps: operation not permitted" error.
See kubernetes-sigs/kind 2058
NOTE: The caller process of this function is now assumed to be as
privileged as possible.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Newer golangci-lint needs explicit `//` separator. Otherwise it treats
the entire line (`staticcheck deprecated ... yet`) as a name.
https://golangci-lint.run/usage/false-positives/#nolint
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
Looks like this import was not needed for the test; simplified the test
by just using the device-path (a counter would work, but for debugging,
having the list of paths can be useful).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
For some tools having the actual image name in the annotations is helpful for
debugging and auditing the workload.
Signed-off-by: Michael Crosby <michael@thepasture.io>
This was dumping untrusted output to the debug logs from user containers.
We should not dump this type of information to reduce log sizes and any
information leaks from user containers.
Signed-off-by: Michael Crosby <michael@thepasture.io>
The event monitor handles exit events one by one. If there is something
wrong about deleting task, it will slow down the terminating Pods. In
order to reduce the impact, the exit event watcher should handle exit
event separately. If it failed, the watcher should put it into backoff
queue and retry it.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Use a PrefixFilter() to get only the mounts we're interested in,
which removes the need to manually filter mounts from the mountinfo
results.
Additional optimizations can be made, as:
> ... there's a little known fact that `umount(MNT_DETACH)` is actually
> recursive in Linux, IOW this function can be replaced with
> `unix.Umount(target, unix.MNT_DETACH)` (or `mount.UnmountAll(target, unix.MNT_DETACH)`
> (provided that target itself is a mount point).
e8fb2c392f (r535450446)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
containerd is responsible for creating the log but there is no code to ensure
that the log dir exists. While kubelet should have created this there can be
times where this is not the case and this can cause stuck tasks.
Signed-off-by: Michael Crosby <michael@thepasture.io>
The build tag was removed in go-selinux v1.8.0: opencontainers/selinux#132
Related: remove "apparmor" build tag: 0a9147f3aa
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
When a base runtime spec is being used, admins can configure defaults for the
spec so that default ulimits or other security related settings get applied for
all containers launched.
Signed-off-by: Michael Crosby <michael@thepasture.io>
recent versions of libcontainer/apparmor simplified the AppArmor
check to only check if the host supports AppArmor, but no longer
checks if apparmor_parser is installed, or if we're running
docker-in-docker;
bfb4ea1b1b
> The `apparmor_parser` binary is not really required for a system to run
> AppArmor from a runc perspective. How to apply the profile is more in
> the responsibility of higher level runtimes like Podman and Docker,
> which may do the binary check on their own.
This patch copies the logic from libcontainer/apparmor, and
restores the additional checks.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This is a followup to #4699 that addresses an oversight that could cause
the CRI to relabel the host /dev/shm, which should be a no-op in most
cases. Additionally, fixes unit tests to make correct assertions for
/dev/shm relabeling.
Discovered while applying the changes for #4699 to containerd/cri 1.4:
https://github.com/containerd/cri/pull/1605
Signed-off-by: Jacob Blain Christen <jacob@rancher.com>
Address an issue originally seen in the k3s 1.3 and 1.4 forks of containerd/cri, https://github.com/rancher/k3s/issues/2240
Even with updated container-selinux policy, container-local /dev/shm
will get mounted with container_runtime_tmpfs_t because it is a tmpfs
created by the runtime and not the container (thus, container_runtime_t
transition rules apply). The relabel mitigates such, allowing envoy
proxy to work correctly (and other programs that wish to write to their
/dev/shm) under selinux.
Tested locally with:
- SELINUX=Enforcing vagrant up --provision-with=shell,selinux,test-integration
- SELINUX=Enforcing CRITEST_ARGS=--ginkgo.skip='HostIpc is true' vagrant up --provision-with=shell,selinux,test-cri
- SELINUX=Permissive CRITEST_ARGS=--ginkgo.focus='HostIpc is true' vagrant up --provision-with=shell,selinux,test-cri
Signed-off-by: Jacob Blain Christen <jacob@rancher.com>