userns.RunningInUserNS() checks whether the code calling that function
is running inside a user namespace. But we also need to check whether
the container we are about to create will use a user namespace. In that
case we need to disable the sysctl too (or we would need to take the
userns mapping into account to set the IDs).
This was added in PR:
https://github.com/containerd/containerd/pull/6170/
And the param documentation says it is not enabled when user namespaces
are in use:
https://github.com/containerd/containerd/pull/6170/files#diff-91d0a4c61f6d3523b5a19717d1b40b5fffd7e392d8fe22aed7c905fe195b8902R118
I'm not sure if the intention was to disable this if containerd is
running inside a userns (rootless, if that is even supported) or just
when the pod has user namespaces.
Out of an abundance of caution, I'm keeping the userns.RunningInUserNS()
so it is still not used if containerd runs inside a user namespace.
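
The resulting decision can be sketched roughly as follows. This is an
illustration only: shouldSetPingGroupRange and its parameters are
hypothetical names, not the identifiers used in the patch.

```go
package main

import "fmt"

// shouldSetPingGroupRange is a hypothetical helper. It reports whether the
// net.ipv4.ping_group_range sysctl should be added to the container spec.
//
//   - enableUnprivilegedICMP mirrors the "enable_unprivileged_icmp" config bool.
//   - daemonInUserNS stands in for the result of userns.RunningInUserNS().
//   - podUsesUserNS stands in for "the pod being created uses a user namespace".
func shouldSetPingGroupRange(enableUnprivilegedICMP, daemonInUserNS, podUsesUserNS bool) bool {
	if !enableUnprivilegedICMP {
		return false
	}
	// Skip the sysctl if either containerd itself or the new pod is in a
	// user namespace, since the IDs would need to be mapped first.
	return !daemonInUserNS && !podUsesUserNS
}

func main() {
	fmt.Println(shouldSetPingGroupRange(true, false, false)) // true
	fmt.Println(shouldSetPingGroupRange(true, false, true))  // false
	fmt.Println(shouldSetPingGroupRange(true, true, false))  // false
}
```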
With this patch and "enable_unprivileged_icmp = true" in the config,
running containerd as root on the host, pods with user namespaces start
just fine. Without this patch they fail with:
... failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: w
/proc/sys/net/ipv4/ping_group_range: invalid argument: unknown
Thanks a lot to Andy on the k8s slack for reporting the issue. He also
mentions he hits this with k3s on a default installation (the param
is off by default on containerd, but it seems k3s turns it on by
default). He also debugged which part of the stack was setting that
sysctl, found the PR that added this code in containerd and a workaround
(to turn the bool off).
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Helpers to convert from a slice of platforms to our protobuf representation
and vice-versa appear a couple of times. It seems sane to just expose this
facility in the platforms pkg.
Signed-off-by: Danny Canter <danny@dcantah.dev>
This introduces a ParseSourceDateEpoch function, which can be used
to parse "SOURCE_DATE_EPOCH" values for situations where those
values are not passed through an env-var (or the env-var has been
read through other means).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These tests were failing on my macOS; could be the precision issue (like on
Windows), or just because they're "too fast".
=== RUN TestSourceDateEpoch/WithoutSourceDateEpoch
epoch_test.go:51:
Error Trace: /Users/thajeztah/go/src/github.com/containerd/containerd/pkg/epoch/epoch_test.go:51
Error: Should be true
Test: TestSourceDateEpoch/WithoutSourceDateEpoch
Messages: now: 2023-06-23 11:47:09.93118 +0000 UTC, v: 2023-06-23 11:47:09.93118 +0000 UTC
This patch:
- updates the rightAfter utility to allow the timestamps to be "equal"
- updates the asserts to provide some details about the timestamps
- uses UTC for the value we're comparing to, to match the timestamps
that are generated.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
As a follow up change to adding a SandboxMetrics rpc to the core
sandbox service, the controller needed a corresponding rpc for CRI
and others to eventually implement.
This leaves the CRI (non-shim mode) controller unimplemented just to
have a change with the API addition to start.
Signed-off-by: Danny Canter <danny@dcantah.dev>
When a container is in the created or exited state, it will not have stats. A common case for this in k8s is the init containers for a pod. They will be present in the listed containers but will not have a running task and therefore no stats.
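
One plausible way to handle this, shown only as a sketch: treat a container without a running task as having no stats rather than as an error. The `containerEntry` and `taskStats` types and the `collectStats` helper are hypothetical and are not the types used by the CRI plugin.

```go
package main

import "fmt"

// taskStats and containerEntry are hypothetical stand-ins for this sketch.
type taskStats struct {
	CPUUsageNanos uint64
}

type containerEntry struct {
	ID    string
	Stats *taskStats // nil when the container has no running task
}

// collectStats returns stats only for containers that have a running task.
// Created or exited containers (for example finished init containers) are
// skipped instead of causing the whole listing to fail.
func collectStats(containers []containerEntry) map[string]taskStats {
	out := make(map[string]taskStats)
	for _, c := range containers {
		if c.Stats == nil {
			continue
		}
		out[c.ID] = *c.Stats
	}
	return out
}

func main() {
	stats := collectStats([]containerEntry{
		{ID: "init", Stats: nil},
		{ID: "app", Stats: &taskStats{CPUUsageNanos: 42}},
	})
	fmt.Println(len(stats)) // 1
}
```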
Signed-off-by: James Sturtevant <jstur@microsoft.com>
The 10-containerd-net.conflist file generated from the conf_template
should be written atomically so that partial writes are not visible to
CNI plugins. Use the new consistentfile package to ensure this on
Unix-like platforms such as Linux, FreeBSD, and Darwin.
Fixes https://github.com/containerd/containerd/issues/8607
Signed-off-by: Samuel Karp <samuelkarp@google.com>
Certain files may need to be written atomically so that partial writes
are not visible to other processes. On Unix-like platforms such as
Linux, FreeBSD, and Darwin, this is accomplished by writing a temporary
file, syncing, and renaming over the destination file name. On Windows,
the same operations are performed, but Windows does not guarantee that a
rename operation is atomic.
Partial/inconsistent reads can occur due to:
1. A process attempting to read the file while containerd is writing it
(both in the case of a new file with a short/incomplete write or in
the case of an existing, updated file where new bytes may be written
at the beginning but old bytes may still be present after).
2. Concurrent goroutines in containerd leading to multiple active
writers of the same file.
The above mechanism explicitly protects against (1) as all writes are to
a file with a temporary name.
There is no explicit protection against multiple, concurrent goroutines
attempting to write the same file. However, atomically writing the file
should mean only one writer will "win" and a consistent file will be
visible.
Signed-off-by: Samuel Karp <samuelkarp@google.com>
The initial PR had a check for nil metrics, but after some refactoring in the PR, the test case that was supposed to cover HPC was missing a scenario where the metric was not nil but didn't contain any metrics. This fixes that case and adds a test case to cover it.
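
The distinction between the two cases can be sketched as below. The `metricSketch` type and `hasUsableMetrics` helper are hypothetical and only illustrate checking for both a nil metric and a non-nil metric with no content.

```go
package main

import "fmt"

// metricSketch is a hypothetical stand-in for the metric type.
type metricSketch struct {
	Values []uint64
}

// hasUsableMetrics reports whether m can be converted into stats. A nil
// check alone is not enough: a non-nil metric may still be empty.
func hasUsableMetrics(m *metricSketch) bool {
	return m != nil && len(m.Values) > 0
}

func main() {
	fmt.Println(hasUsableMetrics(nil))                                // false
	fmt.Println(hasUsableMetrics(&metricSketch{}))                    // false
	fmt.Println(hasUsableMetrics(&metricSketch{Values: []uint64{1}})) // true
}
```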
Signed-off-by: James Sturtevant <jstur@microsoft.com>
This change adds support for CDI devices to the ctr --device flag.
If a fully-qualified CDI device name is specified, this is injected
into the OCI specification before creating the container.
Note that the CDI specifications and the devices that they represent
are local and mirror the behaviour of Linux devices in the ctr command.
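
As a rough illustration of how --device values might be split, the
sketch below uses a simplified check for the "vendor/class=name" shape
of a fully-qualified CDI device name. looksLikeCDIDevice is a
hypothetical helper and is much less strict than the validation in the
CDI library.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeCDIDevice is a simplified, hypothetical check for the
// "vendor/class=name" shape. It does not validate the character sets that
// the CDI specification requires for each part.
func looksLikeCDIDevice(s string) bool {
	kind, name, ok := strings.Cut(s, "=")
	if !ok || name == "" {
		return false
	}
	vendor, class, ok := strings.Cut(kind, "/")
	return ok && vendor != "" && class != ""
}

// splitDevices separates CDI device names from plain device paths.
func splitDevices(devices []string) (cdi, paths []string) {
	for _, d := range devices {
		if looksLikeCDIDevice(d) {
			cdi = append(cdi, d)
		} else {
			paths = append(paths, d)
		}
	}
	return cdi, paths
}

func main() {
	// "vendor.com/class=device0" is a made-up device name for this example.
	cdi, paths := splitDevices([]string{"/dev/fuse", "vendor.com/class=device0"})
	fmt.Println(cdi)   // [vendor.com/class=device0]
	fmt.Println(paths) // [/dev/fuse]
}
```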
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Several bits of code unmarshal image config JSON into an `ocispec.Image`, and then immediately create an `ocispec.Platform` out of it, but then discard the original image *and* miss several potential platform fields (most notably, `variant`).
Because `ocispec.Platform` is a strict subset of `ocispec.Image`, most of these can be updated to simply unmarshal the image config directly to `ocispec.Platform` instead, which allows these additional fields to be picked up appropriately.
We can use `tianon/raspbian` as a concrete reproducer to demonstrate.
Before:
```console
$ ctr content fetch docker.io/tianon/raspbian:bullseye-slim
...
$ ctr image ls
REF TYPE DIGEST SIZE PLATFORMS LABELS
docker.io/tianon/raspbian:bullseye-slim application/vnd.docker.distribution.manifest.v2+json sha256:66e96f8af40691b335acc54e5f69711584ef7f926597b339e7d12ab90cc394ce 28.6 MiB linux/arm/v7 -
```
(Note that the `PLATFORMS` column lists `linux/arm/v7` -- the image itself is actually `linux/arm/v6`, but one of these bits of code leads to only `linux/arm` being extracted from the image config, which `platforms.Normalize` then updates to an explicit `v7`.)
After:
```console
$ ctr image ls
REF TYPE DIGEST SIZE PLATFORMS LABELS
docker.io/tianon/raspbian:bullseye-slim application/vnd.docker.distribution.manifest.v2+json sha256:66e96f8af40691b335acc54e5f69711584ef7f926597b339e7d12ab90cc394ce 28.6 MiB linux/arm/v6 -
```
Signed-off-by: Tianon Gravi <admwiggin@gmail.com>
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Windows systems are capable of running both Windows Containers and Linux
containers. For Windows containers we need to sanitize the volume path
and skip non-C volumes from the copy existing contents code path. Linux
containers running on Windows and Linux must not have the path sanitized
in any way.
Supplying the targetOS of the container allows us to properly decide
when to activate that code path.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Images may be created with a VOLUME stanza pointed to drive letters that
are not C:. Currently, an image that has such VOLUMEs defined will
cause containerd to error out when starting a container.
This change skips copying existing contents to volumes that are not C:,
as an image can only hold files that are destined for the C: drive of a
container.
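
A rough sketch of the resulting decision, assuming the target OS of the
container is available as a string. shouldCopyExistingContents is a
hypothetical helper; the drive-letter check is simplified and does not
reproduce the path sanitization done in containerd.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldCopyExistingContents is a hypothetical helper. For Windows
// containers it skips volumes on drives other than C:, since the image can
// only hold files for C:. For any other target OS the path is left alone.
func shouldCopyExistingContents(targetOS, volumePath string) bool {
	if targetOS != "windows" {
		return true
	}
	// A path such as `D:\data` starts with a drive letter and a colon.
	if len(volumePath) >= 2 && volumePath[1] == ':' {
		return strings.EqualFold(volumePath[:1], "c")
	}
	// Assumption: paths without a drive letter refer to the C: drive.
	return true
}

func main() {
	fmt.Println(shouldCopyExistingContents("windows", `C:\data`)) // true
	fmt.Println(shouldCopyExistingContents("windows", `D:\data`)) // false
	fmt.Println(shouldCopyExistingContents("linux", "/data"))     // true
}
```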
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>