This patch introduces idmapped mounts support for
container rootfs.
The idmapped mounts support was merged in Linux kernel 5.12
torvalds/linux@7d6beb7.
This functionality allows to address chown overhead for containers that
use user namespace.
The changes are based on experimental patchset published by
Mauricio Vásquez #4734.
Current version reiplements support of idmapped mounts using Golang.
Performance measurement results:
Image idmapped mount recursive chown
BusyBox 00.135 04.964
Ubuntu 00.171 15.713
Fedora 00.143 38.799
Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
Signed-off-by: Artem Kuzin <artem.kuzin@huawei.com>
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
Signed-off-by: Ilya Hanov <ilya.hanov@huawei-partners.com>
The metadata is small and useful for viewing all platforms
for an image and enabling push back to the same registry.
Signed-off-by: Derek McGowan <derek@mcg.dev>
This change adds support for CDI devices to the ctr --device flag.
If a fully-qualified CDI device name is specified, this is injected
into the OCI specification before creating the container.
Note that the CDI specifications and the devices that they represent
are local and mirror the behaviour of linux devices in the ctr command.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This flag allows cpuset.mems to be specified when running a container. If
provided, the container will use only the defined memory nodes.
Signed-off-by: Peteris Rudzusiks <rye@stripe.com>
This flag allows cpuset.cpus to be specified when starting a container. If
provided, the container will use only the defined CPU cores.
Signed-off-by: Peteris Rudzusiks <rye@stripe.com>
If a mount destination is specified both in the default spec and in a
--mount option, remove the default mount before adding new mounts. This
allows overriding the default sysfs mount, for example.
Signed-off-by: Samuel Karp <samuelkarp@google.com>
The `ctr image usage` can display the usage of snapshots with a given
image ref. It's easy for user to get chain snapshot IDs and unpack
usage. And according to the [discuss][1], this subcommand can be used to
ensure if snapshot's pagecache has been discarded in a unexpected
reboot.
How to use it:
```bash
$ bin/ctr image usage --snapshotter native docker.io/library/golang:1.19.3
ctr: image docker.io/library/golang:1.19.3 isn't unpacked in snapshotter native
$ bin/ctr image usage --snapshotter overlayfs docker.io/library/golang:1.19.3
KEY SIZE INODES
sha256:28114d8403bac6352c3e09cb23e37208138a0cd9d309edf3df38e57be8075a1d 16.0 KiB 4
sha256:f162c02ce6b9b594757cd76eda1c1dd119b88e69e882cb645bf7ad528b54f0d2 476.2 MiB 13660
sha256:a5b9faceaa495819b9ba7011b7276c4ffaffe6c7b9de0889e11abc1113f7b5ca 225.5 MiB 3683
sha256:412b2615d27d6b0090558d25b201b60a7dff2a40892a7e7ca868b80bf5e5de41 159.8 MiB 6196
sha256:dbce1593502d39c344ce089f98187999f294de5182a7106dcb6c9d04ce0c7265 19.4 MiB 502
sha256:8953bf5d24149e9b2236abc76bd0aa14b73828f1b63e816cb4b457249f6125bc 12.2 MiB 958
sha256:ccba29d6937047c719a6c048a7038d3907590fbb8556418d119469b2ad4f95bc 134.7 MiB 7245
$ bin/ctr image usage --snapshotter overlayfs docker.io/library/golang:1.19
ctr: failed to ensure if image docker.io/library/golang:1.19 exists: image "docker.io/library/golang:1.19": not found
```
[1]: <https://github.com/containerd/containerd/issues/5854#issuecomment-1415915765>
Signed-off-by: Wei Fu <fuweid89@gmail.com>
- Add Target to mount.Mount.
- Add UnmountMounts to unmount a list of mounts in reverse order.
- Add UnmountRecursive to unmount deepest mount first for a given target, using
moby/sys/mountinfo.
Signed-off-by: Edgar Lee <edgarhinshunlee@gmail.com>
This adds in a simple flag to control what platform the spec it generates
is for. Useful to easily get a glance at whats the default across platforms.
Signed-off-by: Danny Canter <danny@dcantah.dev>
Go 1.18 and up now provides a strings.Cut() which is better suited for
splitting key/value pairs (and similar constructs), and performs better:
```go
func BenchmarkSplit(b *testing.B) {
b.ReportAllocs()
data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
for i := 0; i < b.N; i++ {
for _, s := range data {
_ = strings.SplitN(s, "=", 2)[0]
}
}
}
func BenchmarkCut(b *testing.B) {
b.ReportAllocs()
data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
for i := 0; i < b.N; i++ {
for _, s := range data {
_, _, _ = strings.Cut(s, "=")
}
}
}
```
BenchmarkSplit
BenchmarkSplit-10 8244206 128.0 ns/op 128 B/op 4 allocs/op
BenchmarkCut
BenchmarkCut-10 54411998 21.80 ns/op 0 B/op 0 allocs/op
While looking at occurrences of `strings.Split()`, I also updated some for alternatives,
or added some constraints; for cases where an specific number of items is expected, I used `strings.SplitN()`
with a suitable limit. This prevents (theoretical) unlimited splits.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This makes diff archives to be reproducible.
The value is expected to be passed from CLI applications via the $SOUCE_DATE_EPOCH env var.
See https://reproducible-builds.org/docs/source-date-epoch/
for the $SOURCE_DATE_EPOCH specification.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Add a new ctr cli option, allowing the garbage collector to discard any
unpacked layers after importing an image. This new option is
incompatible with the no-unpack ctr import option.
Signed-off-by: James Jenkins <James.Jenkins@ibm.com>
For Kata Containers, starting a privileged container will fail
if passing all host devices to container due to the permission
issue, like the `privileged_without_host_devices` for CRI service,
add a `privileged-without-host-devices` to `ctr run` command will
disable passing all host devices to containers.
Signed-off-by: bin liu <liubin0329@gmail.com>
Added new runc shim binary in integration testing.
The shim is named by io.containerd.runc-fp.v1, which allows us to use
additional OCI annotation `io.containerd.runtime.v2.shim.failpoint.*` to
setup shim task API's failpoint. Since the shim can be shared with
multiple container, like what kubernetes pod does, the failpoint will be
initialized during setup the shim server. So, the following the
container's OCI failpoint's annotation will not work.
This commit also updates the ctr tool that we can use `--annotation` to
specify annotations when run container. For example:
```bash
➜ ctr run -d --runtime runc-fp.v1 \
--annotation "io.containerd.runtime.v2.shim.failpoint.Kill=1*error(sorry)" \
docker.io/library/alpine:latest testing sleep 1d
➜ ctr t ls
TASK PID STATUS
testing 147304 RUNNING
➜ ctr t kill -s SIGKILL testing
ctr: sorry: unknown
➜ ctr t kill -s SIGKILL testing
➜ sudo ctr t ls
TASK PID STATUS
testing 147304 STOPPED
```
The runc-fp.v1 shim is based on core runc.v2. We can use it to inject
failpoint during testing complicated or big transcation API, like
kubernetes PodRunPodsandbox.
Signed-off-by: Wei Fu <fuweid89@gmail.com>