runc, as mandated by the runtime-spec, ignores unknown fields in the
config.json. This is unfortunate for cases where we _must_ enable that
feature or fail.
For example, if we want to start a container with user namespaces and
volumes, using the uidMappings/gidMappings field is needed so the
UID/GIDs in the volume don't end up with garbage. However, if we don't
fail when runc will ignore these fields (because they are unknown to
runc), we will just start a container without using the mappings and the
UID/GIDs the container will persist to volumes the hostUID/GID, that can
change if the container is re-scheduled by Kubernetes.
This will end up in volumes having "garbage" and unmapped UIDs that the
container can no longer change. So, let's avoid this entirely by just
checking that runc supports idmap mounts if the container we are about
to create needs them.
Please note that the "runc features" subcommand is only run when we are
using idmap mounts. If idmap mounts are not used, the subcommand is not
run and therefore this should not affect containers that don't use idmap
mounts in any way.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
These utilities resulted in the platforms package to have the containerd
API as dependency. As this package is used in many parts of the code, as
well as external consumers, we should try to keep it light on dependencies,
with the potential to make it a standalone module.
These utilities were added in f3b7436b61,
which has not yet been included in a release, so skipping deprecation
and aliases for these.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
When the kubelet sends the uid/gid mappings for a mount, just pass them
down to the OCI runtime.
OCI runtimes support this since runc 1.2 and crun 1.8.1.
And whenever we add mounts (container mounts or image spec volumes) and
userns are requested by the kubelet, we use those mappings in the mounts
so the mounts are idmapped correctly. If no userns is used, we don't
send any mappings which just keeps the current behavior.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
This brings over the enhancement from a506630e57.
We don't expect the systemd state to change while containerd is running,
so we can use a `sync.Once` for this, to prevent stat'ing each time.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
runc considers libcontainer to be "unstable" (not for external use),
so we try not to use it. Commit ed47d6ba76
brought back the dependency on other parts of libcontainer, but looks to
be only depending on a single utility, which in itself was borrowed from
github.com/coreos/go-systemd to not introduce CGO code in the same package.
This patch copies the version from github.com/coreos/go-systemd (adding
proper attribution, although the function is pretty trivial).
runc is in process of moving the libcontainer/user package to an external
module, which means we can remove the dependency on libcontainer entirely
in the near future. There is one more use of `libcontainer` in our vendor
tree; it looks like CDI is depending on one utility (devices.DeviceFromPath);
a943033a8b/vendor/github.com/container-orchestrated-devices/container-device-interface/pkg/cdi/container-edits_unix.go (L38)
We should remove the dependency on that utility, and add a CI check to
prevent bringing it back.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The failed to recover state message didn't include the ID making this
not as useful as it could be..
This additionally moves some of the other logs to include the id for
the sandbox/container as a field instead of part of a format string.
Signed-off-by: Danny Canter <danny@dcantah.dev>
The reference/docker package was a fork of github.com/distribution/distribution,
which could not easily be used as a direct dependency, as it brought many other
dependencies with it.
The "reference' package has now moved to a separate repository, which means
we can replace the local fork, and use the upstream implementation again.
The new module was extracted from the distribution repository at commit:
b9b19409cf
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- For remote snapshotters, the unpack phase serves as an important step for
preparing the remote snapshot. With the missing unpacker.Wait, the
snapshotter `Prepare` context is always canceled.
- This patch allows remote snapshotter based archives to be imported via
the transfer service or `ctr image import`
Signed-off-by: Edgar Lee <edgarhinshunlee@gmail.com>
This is a partial revert of "cri/sbserver: Use platform instead of GOOS
for userns detection".
While what that commit did is 100% the right thing to do, when the
sandbox_mode is "shim" all controller.XXX() calls are RPCs and the
controller.Create() call initializes the controller. Therefore, things
like "getSandboxController()" don't work in the case of "shim"
sandbox_mode until after the controller.Create().
Due to this asymmetry and the lack of tests for shim mode, we didn't
catch it before.
This patch just reverts that commit so that the Create() and
getSandboxController() calls remain where they were, and just relies on
the config Linux section as a hack to detect if the pod sandbox will use
user namespaces or not.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
cgroupv1HasHugetlb() and cgroupv2HasHugetlb() may return errors, but nobody
(there's just one call site anyways) ever cares. So drop the unnecessary code.
Signed-off-by: Enrico Weigelt, metux IT consult <info@metux.net>
- Fill OSVersion field of ocispec.Platform for windows OS in
transfer service plugin init()
- Do not return error from transfer service ReceiveStream if
stream.Recv() returned context.Canceled error
Signed-off-by: Kirtana Ashok <kiashok@microsoft.com>
In the sbserver we should not use the GOOS, as windows hosts can run
linux containers. On the sbserver we should use the platform param.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Runc 1.1 throws a warning when using rel destination paths, and runc 1.2
is planning to thow an error (i.e. won't start the container).
Let's just make this an abs path in the only place it might not be: the
mounts created due to `VOLUME` directives in the Dockerfile.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
The rpc only reports one field, i.e. the cgroup driver, to kubelet.
Containerd determines the effective cgroup driver by looking at all
runtime handlers, starting from the default runtime handler (the rest in
alphabetical order), and returning the cgroup driver setting of the
first runtime handler that supports one. If no runtime handler supports
cgroup driver (i.e. has a config option for it) containerd falls back to
auto-detection, returning systemd if systemd is running and cgroupfs
otherwise.
This patch implements the CRI server side of Kubernetes KEP-4033:
https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4033-group-driver-detection-over-cri
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>