Because RunPodSandbox and CreateContainer will access metadata
without check, pod or container config file without metadata will
crash containerd.
This patch add checks to handle the issue.
Fixes: #1009
Signed-off-by: Hui Zhu <teawater@hyper.sh>
VM isolated runtimes can support privileged workloads. In this
scenario, access to the guest VM is provided instead of the host.
Based on this, allow untrusted runtimes to run privileged workloads.
If the workload is specifically asking for node PID/IPC/network, etc.,
then continue to require the trusted runtime.
This commit repurposes the hostPrivilegedSandbox utility function to
only check for node namespace checking.
Fixes: #855
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
Also add new dependencies on github.com/xeipuuv/gojson* (brought up by
new runtime-tools) and adapt the containerd/cri code to replace the APIs
that were removed by runtime-tools.
In particular, add new helpers to handle the capabilities, since
runtime-tools now split them into separate sets of functions for each
capability set.
Replace g.Spec() with g.Config since g.Spec() has been deprecated in the
runtime-tools API.
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
1. Currently, Unmount() call takes a burden to parse the whole nine yards
of /proc/self/mountinfo to figure out whether the given mount point is
mounted or not (and returns an error in case parsing fails somehow).
Instead, let's just call umount() and ignore EINVAL, which results
in the same behavior, but much better performance.
This also introduces a slight change: in case target does not exist,
the appropriate error (ENOENT) is returned -- document that.
2. As Unmount() is always used with MNT_DETACH flag, let's drop the
flags argument. This way, the only reason of EINVAL returned from
umount(2) can only be "target is not mounted".
3. While at it, remove the 'containerdmount' alias from the package.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Some CRI compatible runtimes may not support provileged operations.
Specifically hypervisor based runtimes (like kata-containers, cc-runtime
and runv) do not support privileged operations like:
- Provide access to the host namespaces
- Create fully privileged containers with access to host devices
Hypervisor based runtimes create container workloads within virtual machines.
When a running host privileged containers using them,
they wont provide support to requested the privileged opertations.
This commits add the new options to define two runtimes:
Trusted runtime : Used when a privileged container is requested.
Default runtime : for non-privileged workloads.
A container that belongs to a privileged pod will inherent this property
an will be created with the trusted runtime.
- Add options to define trusted runtime
- Add logic to decide if a sanbox is trusted
- Export annotation containers below to a trusted sandbox
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
For hypervisor-based container runtimes (like Kata Containers, Clear Containers
or runv) a pod will be created in a VM and then create containers within the VM.
When a runtime is requested for container commands like create and start, both
the instal "pause" container and next containers need to be added to the pod
namespace (same VM).
A runtime does not know if it needs to create/start a VM or if it needs to add a
container to an already running VM pod.
This patch adds a way to provide this information through container annotations.
When starting a container or a sandbox, 2 annotations are added:
- type (Container or Sandbox)
- sandbox name
This allow to a VM based runtime to decide if they need to create a pod VM or
container within the VM pod.
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>