Commit Graph

137 Commits

Author SHA1 Message Date
Derek McGowan
551516a18d
Merge pull request from GHSA-c9cp-9c75-9v8c
Fix the Inheritable capability defaults.
2022-03-23 10:50:56 -07:00
Henry Wang
b8bf504e94 Enable gosec linter for golangci-lint
`gosec` linter is able to identify issues described in #6584

e.g.

$ git revert 54e95e6b88
[gosec dfc8ca1ec] Revert "fix Implicit memory aliasing in for loop"
 2 files changed, 2 deletions(-)

$ make check
+ proto-fmt
+ check
GOGC=75 golangci-lint run
containerstore.go:192:54: G601: Implicit memory aliasing in for loop. (gosec)
		containers = append(containers, containerFromProto(&container))
		                                                   ^
image_store.go:132:42: G601: Implicit memory aliasing in for loop. (gosec)
		images = append(images, imageFromProto(&image))
		                                       ^
make: *** [check] Error 1

I also disabled following two settings which prevent the linter to show a complete list of issues.

* max-issues-per-linter (default 50)
* max-same-issues (default 3)

Furthermore enabling gosec revealed many other issues. For now I blacklisted the ones except G601.

Will create separate tasks to address them one by one moving next.

Signed-off-by: Henry Wang <henwang@amazon.com>
2022-03-14 22:50:54 +00:00
Paul "TBBle" Hampson
39d52118f5 Plumb CRI Devices through to OCI WindowsDevices
There's two mappings of hostpath to IDType and ID in the wild:
- dockershim and dockerd-cri (implicitly via docker) use class/ID
-- The only supported IDType in Docker is 'class'.
-- https://github.com/aarnaud/k8s-directx-device-plugin generates this form
- https://github.com/jterry75/cri (windows_port branch) uses IDType://ID
-- hcsshim's CRI test suite generates this form

`://` is much more easily distinguishable, so I've gone with that one as
the generic separator, with `class/` as a special-case.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
2022-03-12 08:16:43 +11:00
Justin Terry
de3d9993f5 Adds support for Windows ArgsEscaped images
Adds support for Windows container images built by Docker
that contain the ArgsEscaped boolean in the ImageConfig. This
is a non-OCI entry that tells the runtime that the Entrypoint
and/or Cmd are a single element array with the args pre-escaped
into a single CommandLine that should be passed directly to
Windows rather than passed as an args array which will be
additionally escaped.

Signed-off-by: Justin Terry <jlterry@amazon.com>
2022-03-01 13:40:44 -08:00
Andrew G. Morgan
6906b57c72
Fix the Inheritable capability defaults.
The Linux kernel never sets the Inheritable capability flag to
anything other than empty. Non-empty values are always exclusively
set by userspace code.

[The kernel stopped defaulting this set of capability values to the
 full set in 2000 after a privilege escalation with Capabilities
 affecting Sendmail and others.]

Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
2022-02-01 13:55:46 -08:00
Wei Fu
813a061fe1 oci: use readonly mount to read user/group info
In linux kernel, the umount writable-mountpoint will try to do sync-fs
to make sure that the dirty pages to the underlying filesystems. The many
number of umount actions in the same time maybe introduce performance
issue in IOPS limited disk.

When CRI-plugin creates container, it will temp-mount rootfs to read
that UID/GID info for entrypoint. Basically, the rootfs is writable
snapshotter and then after read, umount will invoke sync-fs action.

For example, using overlayfs on ext4 and use bcc-tools to monitor
ext4_sync_fs call.

```
// uname -a
Linux chaofan 5.13.0-27-generic #29~20.04.1-Ubuntu SMP Fri Jan 14 00:32:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

// open terminal 1
kubectl run --image=nginx --image-pull-policy=IfNotPresent nginx-pod

// open terminal 2
/usr/share/bcc/tools/stackcount ext4_sync_fs -i 1 -v -P

  ext4_sync_fs
  sync_filesystem
  ovl_sync_fs
  __sync_filesystem
  sync_filesystem
  generic_shutdown_super
  kill_anon_super
  deactivate_locked_super
  deactivate_super
  cleanup_mnt
  __cleanup_mnt
  task_work_run
  exit_to_user_mode_prepare
  syscall_exit_to_user_mode
  do_syscall_64
  entry_SYSCALL_64_after_hwframe
  syscall.Syscall.abi0
  github.com/containerd/containerd/mount.unmount
  github.com/containerd/containerd/mount.UnmountAll
  github.com/containerd/containerd/mount.WithTempMount.func2
  github.com/containerd/containerd/mount.WithTempMount
  github.com/containerd/containerd/oci.WithUserID.func1
  github.com/containerd/containerd/oci.WithUser.func1
  github.com/containerd/containerd/oci.ApplyOpts
  github.com/containerd/containerd.WithSpec.func1
  github.com/containerd/containerd.(*Client).NewContainer
  github.com/containerd/containerd/pkg/cri/server.(*criService).CreateContainer
  github.com/containerd/containerd/pkg/cri/server.(*instrumentedService).CreateContainer
  k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_CreateContainer_Handler.func1
  github.com/containerd/containerd/services/server.unaryNamespaceInterceptor
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
  github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).UnaryServerInterceptor.func1
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
  go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1
  k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_CreateContainer_Handler
  google.golang.org/grpc.(*Server).processUnaryRPC
  google.golang.org/grpc.(*Server).handleStream
  google.golang.org/grpc.(*Server).serveStreams.func1.2
  runtime.goexit.abi0
    containerd [34771]
    1
```

If there are comming several create requestes, umount actions might
bring high IO pressure on the /var/lib/containerd's underlying disk.

After checkout the kernel code[1], the kernel will not call
__sync_filesystem if the mount is readonly. Based on this, containerd
should use readonly mount to get UID/GID information.

Reference:

* [1] https://elixir.bootlin.com/linux/v5.13/source/fs/sync.c#L61

Closes: #4604

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2022-01-28 23:36:04 +08:00
haoyun
bbe46b8c43 feat: replace github.com/pkg/errors to errors
Signed-off-by: haoyun <yun.hao@daocloud.io>
Co-authored-by: zounengren <zouyee1989@gmail.com>
2022-01-07 10:27:03 +08:00
Markus Lehtonen
2946db8903 oci: implement WithRdt
Helper for setting the (Linux) container's RDT parameters.

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2022-01-04 09:27:54 +02:00
Derek McGowan
bf60e2d174
Merge pull request #6304 from lippertmarkus/fix-ctr-cni-windows
fix(ctr): enable networking for Windows containers
2022-01-03 10:47:44 -08:00
Markus Lippert
f39b3ac7ea fix(ctr): enable networking for Windows containers
Signed-off-by: Markus Lippert <lippertmarkus@gmx.de>
2021-12-21 07:13:39 +01:00
Michael Crosby
89437597ac
Merge pull request #6308 from thaJeztah/rootless_devices
OCI: Mount (accessible) host devices in privileged rootless containers
2021-12-14 15:21:31 -05:00
haoyun
c0d07094be feat: Errorf usage
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-12-13 14:31:53 +08:00
Sebastiaan van Stijn
b7f673790f
OCI: Mount (accessible) host devices in privileged rootless containers
Allow rootless containers with privileged to mount devices that are accessible
(ignore permission errors in rootless mode).

This patch updates oci.getDevices() to ignore access denied errors on sub-
directories and files within the given path if the container is running with
userns enabled.

Note that these errors are _only_ ignored on paths _under_ the specified path,
and not the path itself, so if `HostDevices()` is used, and `/dev` itself is
not accessible, or `WithDevices()` is used to specify a device that is not
accessible, an error is still produced.

Tests were added, which includes a temporary workaround for compatibility
with Go 1.16 (we could decide to skip these tests on Go 1.16 instead).

To verify the patch in a container:

    docker run --rm -v $(pwd):/go/src/github.com/containerd/containerd -w /go/src/github.com/containerd/containerd golang:1.17 sh -c 'go test -v -run TestHostDevices ./oci'
    === RUN   TestHostDevicesOSReadDirFailure
    --- PASS: TestHostDevicesOSReadDirFailure (0.00s)
    === RUN   TestHostDevicesOSReadDirFailureInUserNS
    --- PASS: TestHostDevicesOSReadDirFailureInUserNS (0.00s)
    === RUN   TestHostDevicesDeviceFromPathFailure
    --- PASS: TestHostDevicesDeviceFromPathFailure (0.00s)
    === RUN   TestHostDevicesDeviceFromPathFailureInUserNS
    --- PASS: TestHostDevicesDeviceFromPathFailureInUserNS (0.00s)
    === RUN   TestHostDevicesAllValid
    --- PASS: TestHostDevicesAllValid (0.00s)
    PASS
    ok  	github.com/containerd/containerd/oci	0.006s

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-12-10 12:16:59 +01:00
Sebastiaan van Stijn
7f70ff9672
oci.getDevices(): move "non-dir, non '/dev/console'" case into switch
This makes it slightly clearer that these are all part of the same logic.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-12-10 12:16:57 +01:00
Maksym Pavlenko
441bcb56d7 Skip WithAdditionalGIDs on Darwin
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2021-12-06 14:26:43 -08:00
Sebastiaan van Stijn
f3195b3b51
export oci.DeviceFromPath()
This will help to reduce the amount of runc/libcontainer code that's used in
Moby / Docker Engine (in favor of using the containerd implementation).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-12-02 17:17:12 +01:00
Sebastiaan van Stijn
94462d8f5d
oci/deviceFromPath(): correctly check device types
This ports the changes of 95a59bf206
to this repository.

From that PR:

    (mode&S_IFCHR == S_IFCHR) is the wrong way of checking the type of an
    inode because the S_IF* bits are actually not a bitmask and instead must
    be checked using S_IF*. This bug was neatly hidden behind a (major == 0)
    sanity-check but that was removed by [1].

    In addition, add a test that makes sure that HostDevices() doesn't give
    rubbish results -- because we broke this and fixed this before[2].

    [1]: 24388be71e ("configs: use different types for .Devices and .Resources.Devices")
    [2]: 3ed492ad33 ("Handle non-devices correctly in DeviceFromPath")

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-12-01 13:21:34 +01:00
Derek McGowan
63b7e5771e
Merge pull request #5973 from Juneezee/deprecate-ioutil
refactor: move from io/ioutil to io and os package
2021-10-01 10:52:06 -07:00
Eng Zer Jun
50da673592
refactor: move from io/ioutil to io and os package
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in io and os packages.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-09-21 09:50:38 +08:00
Daniel Canter
50a5685951 Change oci.WithUser behavior for LCOW
The `oci.WithUser` function relies on checking a path on the hosts disk to
grab/validate the uid:gid pair for the user string provided. For LCOW it's a
bit harder to confirm that the user actually exists on the host as a rootfs isn't
mounted on the host and shared into the guest, but rather the rootfs is constructed
entirely in the guest itself. To accomodate this, a spot to place the user string
provided by a client as-is is needed.

The `Username` field on the runtime spec is marked by Platform as only for Windows,
and in this case it *is* being set on a Windows host at least, but will be used as a
temporary holding spot until the guest can use the string to perform these same
operations to grab the uid:gid inside.

Signed-off-by: Daniel Canter <dcanter@microsoft.com>
2021-09-17 09:36:24 -07:00
Mike Brown
4bc5ca76e7
Merge pull request #5122 from mythi/non-root-containers-and-devices-v2
cri: Devices ownership from SecurityContext
2021-08-31 16:11:18 -05:00
Phil Estes
0d5223d700
Merge pull request #5845 from alexandref75/fix-dir-device-support
Fix dir support for devices
2021-08-30 13:59:13 -04:00
Alexandre Peixoto Ferreira
838afd2116 Adding testing of two devices
Signed-off-by: Alexandre Peixoto Ferreira <alexandref75@gmail.com>
2021-08-30 11:28:43 -05:00
Mikko Ylinen
e0f8c04dad cri: Devices ownership from SecurityContext
CRI container runtimes mount devices (set via kubernetes device plugins)
to containers by taking the host user/group IDs (uid/gid) to the
corresponding container device.

This triggers a problem when trying to run those containers with
non-zero (root uid/gid = 0) uid/gid set via runAsUser/runAsGroup:
the container process has no permission to use the device even when
its gid is permissive to non-root users because the container user
does not belong to that group.

It is possible to workaround the problem by manually adding the device
gid(s) to supplementalGroups. However, this is also problematic because
the device gid(s) may have different values depending on the workers'
distro/version in the cluster.

This patch suggests to take RunAsUser/RunAsGroup set via SecurityContext
as the device UID/GID, respectively. The feature must be enabled by
setting device_ownership_from_security_context runtime config value to
true (valid on Linux only).

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-08-30 09:30:00 +03:00
Akihiro Suda
d3aa7ee9f0
Run go fmt with Go 1.17
The new `go fmt` adds `//go:build` lines (https://golang.org/doc/go1.17#tools).

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-08-22 09:31:50 +09:00
Alexandre Peixoto Ferreira
7fba862648 Fix dir support for devices V3 (#4847)
Signed-off-by: Alexandre Peixoto Ferreira <alexandref75@gmail.com>
2021-08-19 07:18:31 -05:00
Fu Wei
1c4e9d0487
Merge pull request #5846 from crosbymichael/ctr-cpu.shares 2021-08-11 09:24:00 +08:00
Michael Crosby
d2f3b71468 add cpu-shares to ctr
This allows the cpu shares to be modified via ctr.

Signed-off-by: Michael Crosby <michael@thepasture.io>
2021-08-10 04:12:15 -04:00
Derek McGowan
6f027e38a8
Remove redundant build tags
Remove build tags which are already implied by the name of the file.
Ensures build tags are used consistently

Signed-off-by: Derek McGowan <derek@mcg.dev>
2021-08-05 22:27:46 -07:00
Gijs Peskens
a4f97d45da Add ruleset=4 option
Signed-off-by: Gijs Peskens <gijs@peskens.net>
2021-05-25 09:17:16 +02:00
Gijs Peskens
1442fee229 Remove mountpoints not commonly mounted on FreeBSD
Signed-off-by: Gijs Peskens <gijs@peskens.net>
2021-05-13 21:36:53 +02:00
Gijs Peskens
de04b32430 Add copyright header & make sure compilation succeeds on all platforms
Signed-off-by: Gijs Peskens <gijs@peskens.net>
2021-05-10 21:49:46 +02:00
Gijs Peskens
e1fd6be7e8 Fix mounts for FreeBSD
Signed-off-by: Gijs Peskens <gijs@peskens.net>
2021-05-10 21:49:46 +02:00
Fu Wei
ab963e1cc1
Merge pull request #5063 from Iceber/fix-with-dev-shm-size
oci: fix WithDevShmSize
2021-05-06 23:48:04 +08:00
Sebastiaan van Stijn
9bc8d63c9f
cri/server: use containerd/oci instead of libcontainer/devices
Looks like we had our own copy of the "getDevices" code already, so use
that code (which also matches the code that's used to _generate_ the spec,
so a better match).

Moving the code to a separate file, I also noticed that the _unix and _linux
code was _exactly_ the same (baring some `//nolint:` comments), so also
removing the duplicated code.

With this patch applied, we removed the dependency on the libcontainer/devices
package (leaving only libcontainer/user).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-04-08 23:25:21 +02:00
Akihiro Suda
8ba8533bde
pkg/cri/opts.WithoutRunMount -> oci.WithoutRunMount
Move `pkg/cri/opts.WithoutRunMount` function to `oci.WithoutRunMount`
so that it can be used without dependency on CRI.

Also add `oci.WithoutMounts(dests ...string)` for generality.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-04-07 21:25:36 +09:00
Iceber Gu
b592a4c1ec
oci: fix WithDevShmSize
Signed-off-by: Iceber Gu <wei.cai-nat@daocloud.io>
2021-03-17 13:43:58 +08:00
Brian Goff
7776e5ef2a Support adding devices by dir
This enables cases where devices exist in a subdirectory of /dev,
particularly where those device names are not portable across machines,
which makes it problematic to specify from a runtime such as cri.

Added this to `ctr` as well so I could test that the code at least
works.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-03-15 16:42:23 +00:00
Derek McGowan
35eeb24a17
Fix exported comments enforcer in CI
Add comments where missing and fix incorrect comments

Signed-off-by: Derek McGowan <derek@mcg.dev>
2021-03-12 08:47:05 -08:00
Iceber Gu
f7f6aabfff oci: fix superfluous slice operations
Signed-off-by: Iceber Gu <wei.cai-nat@daocloud.io>
2021-02-26 02:35:25 +08:00
Phil Estes
757be0a090
Merge pull request #5017 from AkihiroSuda/parse-cap
oci.WithPrivileged: set the current caps, not the known caps
2021-02-23 09:10:57 -05:00
Akihiro Suda
51f985cb1f
oci: move cap UT to _linux_test.go
No substantial code change

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-02-10 17:42:18 +09:00
Iceber Gu
d08aa4b681
oci: fix the file mode of the device
Signed-off-by: Iceber Gu <wei.cai-nat@daocloud.io>
2021-02-10 16:37:12 +08:00
Akihiro Suda
a2d1a8a865
oci.WithPrivileged: set the current caps, not the known caps
This change is needed for running the latest containerd inside Docker
that is not aware of the recently added caps (BPF, PERFMON, CHECKPOINT_RESTORE).

Without this change, containerd inside Docker fails to run containers with
"apply caps: operation not permitted" error.

See kubernetes-sigs/kind 2058

NOTE: The caller process of this function is now assumed to be as
privileged as possible.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-02-10 17:14:17 +09:00
Akihiro Suda
8a57d70a50
oci: expose getUserFromPath and getGIDFromPath
These functions are planned to be used for implementing `nerdctl exec --user`.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-12-14 21:53:17 +09:00
Michael Crosby
02afa94256 Add --cpus flag to ctr
Signed-off-by: Michael Crosby <michael@thepasture.io>
2020-07-28 23:06:07 -04:00
Paul "TBBle" Hampson
ca15cb0d81 Fix incorrect (cut-and-paste) method comment
Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
2020-07-21 19:43:08 +10:00
Kir Kolyshkin
6e638ad27a Nit: fix use of bufio.Scanner.Err
The Err() method should be called after the Scan() loop, not inside it.

Found by: git grep -A3 -F '.Scan()'

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-11 19:36:21 -07:00
Michael Crosby
a3ca8a0dfc Add linux resource oci.SpecOpts
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2020-03-05 13:02:10 -05:00
Jie Hao Liao
51a6813c06 Split uid and gid user ns remapping in oci
Signed-off-by: Jie Hao Liao <liaojh1998@gmail.com>
2019-12-10 03:07:50 -06:00