Extract the names of requested CDI devices and update the OCI
Spec according to the corresponding CDI device specifications.
CDI devices are requested using container annotations in the
cdi.k8s.io namespace. Once CRI gains dedicated fields for CDI
injection the snippet for extracting CDI names will need an
update.
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
The Linux kernel never sets the Inheritable capability flag to
anything other than empty. Non-empty values are always exclusively
set by userspace code.
[The kernel stopped defaulting this set of capability values to the
full set in 2000 after a privilege escalation with Capabilities
affecting Sendmail and others.]
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
CRI container runtimes mount devices (set via kubernetes device plugins)
to containers by taking the host user/group IDs (uid/gid) to the
corresponding container device.
This triggers a problem when trying to run those containers with
non-zero (root uid/gid = 0) uid/gid set via runAsUser/runAsGroup:
the container process has no permission to use the device even when
its gid is permissive to non-root users because the container user
does not belong to that group.
It is possible to workaround the problem by manually adding the device
gid(s) to supplementalGroups. However, this is also problematic because
the device gid(s) may have different values depending on the workers'
distro/version in the cluster.
This patch suggests to take RunAsUser/RunAsGroup set via SecurityContext
as the device UID/GID, respectively. The feature must be enabled by
setting device_ownership_from_security_context runtime config value to
true (valid on Linux only).
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Looks like we had our own copy of the "getDevices" code already, so use
that code (which also matches the code that's used to _generate_ the spec,
so a better match).
Moving the code to a separate file, I also noticed that the _unix and _linux
code was _exactly_ the same (baring some `//nolint:` comments), so also
removing the duplicated code.
With this patch applied, we removed the dependency on the libcontainer/devices
package (leaving only libcontainer/user).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This change is needed for running the latest containerd inside Docker
that is not aware of the recently added caps (BPF, PERFMON, CHECKPOINT_RESTORE).
Without this change, containerd inside Docker fails to run containers with
"apply caps: operation not permitted" error.
See kubernetes-sigs/kind 2058
NOTE: The caller process of this function is now assumed to be as
privileged as possible.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Looks like this import was not needed for the test; simplified the test
by just using the device-path (a counter would work, but for debugging,
having the list of paths can be useful).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
For some tools having the actual image name in the annotations is helpful for
debugging and auditing the workload.
Signed-off-by: Michael Crosby <michael@thepasture.io>
This is a followup to #4699 that addresses an oversight that could cause
the CRI to relabel the host /dev/shm, which should be a no-op in most
cases. Additionally, fixes unit tests to make correct assertions for
/dev/shm relabeling.
Discovered while applying the changes for #4699 to containerd/cri 1.4:
https://github.com/containerd/cri/pull/1605
Signed-off-by: Jacob Blain Christen <jacob@rancher.com>