Go 1.18 and up now provides a strings.Cut() which is better suited for
splitting key/value pairs (and similar constructs), and performs better:
```go
func BenchmarkSplit(b *testing.B) {
b.ReportAllocs()
data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
for i := 0; i < b.N; i++ {
for _, s := range data {
_ = strings.SplitN(s, "=", 2)[0]
}
}
}
func BenchmarkCut(b *testing.B) {
b.ReportAllocs()
data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
for i := 0; i < b.N; i++ {
for _, s := range data {
_, _, _ = strings.Cut(s, "=")
}
}
}
```
BenchmarkSplit
BenchmarkSplit-10 8244206 128.0 ns/op 128 B/op 4 allocs/op
BenchmarkCut
BenchmarkCut-10 54411998 21.80 ns/op 0 B/op 0 allocs/op
While looking at occurrences of `strings.Split()`, I also updated some for alternatives,
or added some constraints; for cases where an specific number of items is expected, I used `strings.SplitN()`
with a suitable limit. This prevents (theoretical) unlimited splits.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This change modifies WithLinuxDevice to take an option `followSymlink`
and be unexported as `withLinuxDevice`. An option
`WithLinuxDeviceFollowSymlinks` will call this unexported option to
follow a symlink, which will resolve a symlink before calling
`DeviceFromPath`. `WithLinuxDevice` has been changed to call
`withLinuxDevice` without following symlinks.
Signed-off-by: Gavin Inglis <giinglis@amazon.com>
Remove nolint-comments that weren't hit by linters, and remove the "structcheck"
and "varcheck" linters, as they have been deprecated:
WARN [runner] The linter 'structcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [runner] The linter 'varcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [linters context] structcheck is disabled because of generics. You can track the evolution of the generics support by following the https://github.com/golangci/golangci-lint/issues/2649.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- fix "nolint" comments to be in the correct format (`//nolint:<linters>[,<linter>`
no leading space, required colon (`:`) and linters.
- remove "nolint" comments for errcheck, which is disabled in our config.
- remove "nolint" comments that were no longer needed (nolintlint).
- where known, add a comment describing why a "nolint" was applied.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
As WithCDI is CRI-only API it makes sense to move it
out of oci module.
This move can also fix possible issues with this API when
CRI plugin is disabled.
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
ArgsEscaped has now been merged into upstream OCI image spec.
This change removes the workaround we were doing in containerd
to deserialize the extra json outside of the spec and instead
just uses the formal spec types.
Signed-off-by: Justin Terry <jlterry@amazon.com>
Mount options are marked `json:omitempty`. An empty slice in the default
object caused TestWithSpecFromFile to fail.
Signed-off-by: Samuel Karp <me@samuelkarp.com>
This allows running Linux containers on FreeBSD and modifies the
mounts so that they represent the linux emulated filesystems, as per:
https://wiki.freebsd.org/LinuxJails
Co-authored-by: Gijs Peskens <gijs@peskens.net>, Samuel Karp <samuelkarp@users.noreply.github.com>
Signed-off-by: Artem Khramov <akhramov@pm.me>
A container should not have access to tun/tap device, unless it is explicitly
specified in configuration.
This device was already removed from docker's default, and runc's default;
- 2ce40b6ad7
- 9c4570a958
Per the commit message in runc, this should also fix these messages;
> Apr 26 03:46:56 foo.bar systemd[1]: Couldn't stat device /dev/char/10:200: No such file or directory
coming from systemd on every container start, when the systemd cgroup driver
is used, and the system runs an old (< v240) version of systemd
(the message was presumably eliminated by [1]).
[1]: d5aecba6e0
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This patch adds support for a container annotation and two separate
pod annotations for controlling the blockio class of containers.
The container annotation can be used by a CRI client:
"io.kubernetes.cri.blockio-class"
Pod annotations specify the blockio class in the K8s pod spec level:
"blockio.resources.beta.kubernetes.io/pod"
(pod-wide default for all containers within)
"blockio.resources.beta.kubernetes.io/container.<container_name>"
(container-specific overrides)
Correspondingly, this patch adds support for --blockio-class and
--blockio-config-file to ctr, too.
This implementation follows the resource class annotation pattern
introduced in RDT and merged in commit 893701220.
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
`gosec` linter is able to identify issues described in #6584
e.g.
$ git revert 54e95e6b88
[gosec dfc8ca1ec] Revert "fix Implicit memory aliasing in for loop"
2 files changed, 2 deletions(-)
$ make check
+ proto-fmt
+ check
GOGC=75 golangci-lint run
containerstore.go:192:54: G601: Implicit memory aliasing in for loop. (gosec)
containers = append(containers, containerFromProto(&container))
^
image_store.go:132:42: G601: Implicit memory aliasing in for loop. (gosec)
images = append(images, imageFromProto(&image))
^
make: *** [check] Error 1
I also disabled following two settings which prevent the linter to show a complete list of issues.
* max-issues-per-linter (default 50)
* max-same-issues (default 3)
Furthermore enabling gosec revealed many other issues. For now I blacklisted the ones except G601.
Will create separate tasks to address them one by one moving next.
Signed-off-by: Henry Wang <henwang@amazon.com>
There's two mappings of hostpath to IDType and ID in the wild:
- dockershim and dockerd-cri (implicitly via docker) use class/ID
-- The only supported IDType in Docker is 'class'.
-- https://github.com/aarnaud/k8s-directx-device-plugin generates this form
- https://github.com/jterry75/cri (windows_port branch) uses IDType://ID
-- hcsshim's CRI test suite generates this form
`://` is much more easily distinguishable, so I've gone with that one as
the generic separator, with `class/` as a special-case.
Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
Adds support for Windows container images built by Docker
that contain the ArgsEscaped boolean in the ImageConfig. This
is a non-OCI entry that tells the runtime that the Entrypoint
and/or Cmd are a single element array with the args pre-escaped
into a single CommandLine that should be passed directly to
Windows rather than passed as an args array which will be
additionally escaped.
Signed-off-by: Justin Terry <jlterry@amazon.com>
The Linux kernel never sets the Inheritable capability flag to
anything other than empty. Non-empty values are always exclusively
set by userspace code.
[The kernel stopped defaulting this set of capability values to the
full set in 2000 after a privilege escalation with Capabilities
affecting Sendmail and others.]
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
In linux kernel, the umount writable-mountpoint will try to do sync-fs
to make sure that the dirty pages to the underlying filesystems. The many
number of umount actions in the same time maybe introduce performance
issue in IOPS limited disk.
When CRI-plugin creates container, it will temp-mount rootfs to read
that UID/GID info for entrypoint. Basically, the rootfs is writable
snapshotter and then after read, umount will invoke sync-fs action.
For example, using overlayfs on ext4 and use bcc-tools to monitor
ext4_sync_fs call.
```
// uname -a
Linux chaofan 5.13.0-27-generic #29~20.04.1-Ubuntu SMP Fri Jan 14 00:32:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
// open terminal 1
kubectl run --image=nginx --image-pull-policy=IfNotPresent nginx-pod
// open terminal 2
/usr/share/bcc/tools/stackcount ext4_sync_fs -i 1 -v -P
ext4_sync_fs
sync_filesystem
ovl_sync_fs
__sync_filesystem
sync_filesystem
generic_shutdown_super
kill_anon_super
deactivate_locked_super
deactivate_super
cleanup_mnt
__cleanup_mnt
task_work_run
exit_to_user_mode_prepare
syscall_exit_to_user_mode
do_syscall_64
entry_SYSCALL_64_after_hwframe
syscall.Syscall.abi0
github.com/containerd/containerd/mount.unmount
github.com/containerd/containerd/mount.UnmountAll
github.com/containerd/containerd/mount.WithTempMount.func2
github.com/containerd/containerd/mount.WithTempMount
github.com/containerd/containerd/oci.WithUserID.func1
github.com/containerd/containerd/oci.WithUser.func1
github.com/containerd/containerd/oci.ApplyOpts
github.com/containerd/containerd.WithSpec.func1
github.com/containerd/containerd.(*Client).NewContainer
github.com/containerd/containerd/pkg/cri/server.(*criService).CreateContainer
github.com/containerd/containerd/pkg/cri/server.(*instrumentedService).CreateContainer
k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_CreateContainer_Handler.func1
github.com/containerd/containerd/services/server.unaryNamespaceInterceptor
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).UnaryServerInterceptor.func1
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1
k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_CreateContainer_Handler
google.golang.org/grpc.(*Server).processUnaryRPC
google.golang.org/grpc.(*Server).handleStream
google.golang.org/grpc.(*Server).serveStreams.func1.2
runtime.goexit.abi0
containerd [34771]
1
```
If there are comming several create requestes, umount actions might
bring high IO pressure on the /var/lib/containerd's underlying disk.
After checkout the kernel code[1], the kernel will not call
__sync_filesystem if the mount is readonly. Based on this, containerd
should use readonly mount to get UID/GID information.
Reference:
* [1] https://elixir.bootlin.com/linux/v5.13/source/fs/sync.c#L61Closes: #4604
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Allow rootless containers with privileged to mount devices that are accessible
(ignore permission errors in rootless mode).
This patch updates oci.getDevices() to ignore access denied errors on sub-
directories and files within the given path if the container is running with
userns enabled.
Note that these errors are _only_ ignored on paths _under_ the specified path,
and not the path itself, so if `HostDevices()` is used, and `/dev` itself is
not accessible, or `WithDevices()` is used to specify a device that is not
accessible, an error is still produced.
Tests were added, which includes a temporary workaround for compatibility
with Go 1.16 (we could decide to skip these tests on Go 1.16 instead).
To verify the patch in a container:
docker run --rm -v $(pwd):/go/src/github.com/containerd/containerd -w /go/src/github.com/containerd/containerd golang:1.17 sh -c 'go test -v -run TestHostDevices ./oci'
=== RUN TestHostDevicesOSReadDirFailure
--- PASS: TestHostDevicesOSReadDirFailure (0.00s)
=== RUN TestHostDevicesOSReadDirFailureInUserNS
--- PASS: TestHostDevicesOSReadDirFailureInUserNS (0.00s)
=== RUN TestHostDevicesDeviceFromPathFailure
--- PASS: TestHostDevicesDeviceFromPathFailure (0.00s)
=== RUN TestHostDevicesDeviceFromPathFailureInUserNS
--- PASS: TestHostDevicesDeviceFromPathFailureInUserNS (0.00s)
=== RUN TestHostDevicesAllValid
--- PASS: TestHostDevicesAllValid (0.00s)
PASS
ok github.com/containerd/containerd/oci 0.006s
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This will help to reduce the amount of runc/libcontainer code that's used in
Moby / Docker Engine (in favor of using the containerd implementation).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This ports the changes of 95a59bf206
to this repository.
From that PR:
(mode&S_IFCHR == S_IFCHR) is the wrong way of checking the type of an
inode because the S_IF* bits are actually not a bitmask and instead must
be checked using S_IF*. This bug was neatly hidden behind a (major == 0)
sanity-check but that was removed by [1].
In addition, add a test that makes sure that HostDevices() doesn't give
rubbish results -- because we broke this and fixed this before[2].
[1]: 24388be71e ("configs: use different types for .Devices and .Resources.Devices")
[2]: 3ed492ad33 ("Handle non-devices correctly in DeviceFromPath")
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
The `oci.WithUser` function relies on checking a path on the hosts disk to
grab/validate the uid:gid pair for the user string provided. For LCOW it's a
bit harder to confirm that the user actually exists on the host as a rootfs isn't
mounted on the host and shared into the guest, but rather the rootfs is constructed
entirely in the guest itself. To accomodate this, a spot to place the user string
provided by a client as-is is needed.
The `Username` field on the runtime spec is marked by Platform as only for Windows,
and in this case it *is* being set on a Windows host at least, but will be used as a
temporary holding spot until the guest can use the string to perform these same
operations to grab the uid:gid inside.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>