Commit Graph

57 Commits

Author SHA1 Message Date
Samuel Karp
52afa34f52
cri: update WithoutDefaultSecuritySettings comment
This pointer to an issue never got updated after the CRI plugin was
absorbed into the main containerd repo as an in-tree plugin.

Signed-off-by: Samuel Karp <samuelkarp@google.com>
2023-05-07 15:22:35 -07:00
Maksym Pavlenko
6f34da5f80 Cleanup logrus imports
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-05-05 11:54:14 -07:00
Rodrigo Campos
7e6ab84884 cri: Throw an error if idmap mounts is requested
We need support in containerd and the OCI runtime to use idmap mounts.
Let's just throw an error for now if the kubelet requests some mounts
with mappings.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-04-11 21:31:12 +02:00
Paul "TBBle" Hampson
474a257b16 Implement Windows mounting for bind and windows-layer mounts
Using symlinks for bind mounts means we are not protecting an RO-mounted
layer against modification. Windows doesn't currently appear to offer a
better approach though, as we cannot create arbitrary empty WCOW scratch
layers at this time.

For windows-layer mounts, Unmount does not have access to the mounts
used to create it. So we store the relevant data in an Alternate Data
Stream on the mountpoint in order to be able to Unmount later.

Based on approach in https://github.com/containerd/containerd/pull/2366,
with sign-offs recorded as 'Based-on-work-by' trailers below.

This also partially-reverts some changes made in #6034 as they are not
needed with this mounting implmentation, which no longer needs to be
handled specially by the caller compared to non-Windows mounts.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
Based-on-work-by: Michael Crosby <crosbymichael@gmail.com>
Based-on-work-by: Darren Stahl <darst@microsoft.com>
2023-03-31 06:15:17 -07:00
June Rhodes
f48ae22273
fix: Update error message format based on feedback
Signed-off-by: June Rhodes <504826+hach-que@users.noreply.github.com>
2023-03-17 06:49:12 +11:00
June Rhodes
3193650f13
fix: 'failed to resolve symlink' error messaging
This error message currently does not provide useful information, because the `src` value that is interleaved will have been overridden by the call to `osi.ResolveSymbolicLink`. This stores the original `src` before the `osi.ResolveSymbolicLink` call so the error message can be useful.

Signed-off-by: June Rhodes <504826+hach-que@users.noreply.github.com>
2023-03-17 05:12:43 +11:00
Maksym Pavlenko
07c2ae12e1 Remove v1 runctypes
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-03-15 09:18:16 -07:00
Akihiro Suda
6d95132313
go.mod: github.com/containerd/cgroups/v3 v3.0.1
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-03-07 22:06:38 +09:00
Kirtana Ashok
8137e41c48 Add ArgsEscaped support for CRI
This commit adds supports for the ArgsEscaped
value for the image got from the dockerfile.
It is used to evaluate and process the image
entrypoint/cmd and container entrypoint/cmd
options got from the podspec.

Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com>
2023-03-03 13:38:06 -08:00
Maksym Pavlenko
1ade777c24 Add basic spec and mounts for Darwin
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-01-12 17:00:40 -08:00
Maksym Pavlenko
40be96efa9 Have separate spec builder for each platform
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-01-11 13:12:25 -08:00
Maksym Pavlenko
dd22a3a806 Move WithMounts to specs
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-01-11 13:03:59 -08:00
Maksym Pavlenko
0ae0399b16 Make OCI spec opts available on all platforms
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-01-11 13:03:58 -08:00
Qasim Sarfraz
9c8c4508ec cri: Fix TestUpdateOCILinuxResource for host w/o swap controller
Tested on Ubuntu 20.04 w/o swap controller:
```
$ stat -fc %T /sys/fs/cgroup/
tmpfs
$ la -la /sys/fs/cgroup/memory/memory.memsw.limit_in_bytes
ls: cannot access '/sys/fs/cgroup/memory/memory.memsw.limit_in_bytes': No such file or directory
$  go test -v ./pkg/cri/sbserver/ -run TestUpdateOCILinuxResource
=== RUN   TestUpdateOCILinuxResource
=== RUN   TestUpdateOCILinuxResource/should_be_able_to_patch_the_unified_map
=== RUN   TestUpdateOCILinuxResource/should_be_able_to_update_each_resource
=== RUN   TestUpdateOCILinuxResource/should_skip_empty_fields
=== RUN   TestUpdateOCILinuxResource/should_be_able_to_fill_empty_fields
--- PASS: TestUpdateOCILinuxResource (0.00s)
    --- PASS: TestUpdateOCILinuxResource/should_be_able_to_patch_the_unified_map (0.00s)
    --- PASS: TestUpdateOCILinuxResource/should_be_able_to_update_each_resource (0.00s)
    --- PASS: TestUpdateOCILinuxResource/should_skip_empty_fields (0.00s)
    --- PASS: TestUpdateOCILinuxResource/should_be_able_to_fill_empty_fields (0.00s)
PASS
ok      github.com/containerd/containerd/pkg/cri/sbserver       (cached)
$ go test -v ./pkg/cri/server/ -run TestUpdateOCILinuxResource
=== RUN   TestUpdateOCILinuxResource
=== RUN   TestUpdateOCILinuxResource/should_be_able_to_update_each_resource
=== RUN   TestUpdateOCILinuxResource/should_skip_empty_fields
=== RUN   TestUpdateOCILinuxResource/should_be_able_to_fill_empty_fields
=== RUN   TestUpdateOCILinuxResource/should_be_able_to_patch_the_unified_map
--- PASS: TestUpdateOCILinuxResource (0.00s)
    --- PASS: TestUpdateOCILinuxResource/should_be_able_to_update_each_resource (0.00s)
    --- PASS: TestUpdateOCILinuxResource/should_skip_empty_fields (0.00s)
    --- PASS: TestUpdateOCILinuxResource/should_be_able_to_fill_empty_fields (0.00s)
    --- PASS: TestUpdateOCILinuxResource/should_be_able_to_patch_the_unified_map (0.00s)
PASS
ok      github.com/containerd/containerd/pkg/cri/server (cached)
```

Signed-off-by: Qasim Sarfraz <qasimsarfraz@microsoft.com>
2023-01-10 15:41:04 +01:00
Rodrigo Campos
a7adeb6976 cri: Support pods with user namespaces
This patch requests the OCI runtime to create a userns when the CRI
message includes such request.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2022-12-21 17:56:56 -03:00
Akihiro Suda
4157503881
cri: fix memory.memsw.limit_in_bytes: no such file or directory
Skip automatic `if swapLimit == 0 { s.Linux.Resources.Memory.Swap = &limit }` when the swap controller is missing.
(default on Ubuntu 20.04)

Fix issue 7828 (regression in PR 7783 "cri: make swapping disabled with memory limit")

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2022-12-19 11:28:07 +09:00
Qasim Sarfraz
69975b92bb cri: make swapping disabled with memory limit
OCI runtime spec defines memory.swap as 'limit of memory+Swap usage'
so setting them to equal should disable the swap. Also, this change
should make containerd behaviour same as other runtimes e.g
'cri-dockerd/dockershim' and won't be impacted when user turn on
'NodeSwap' (https://github.com/kubernetes/enhancements/issues/2400) feature.

Signed-off-by: Qasim Sarfraz <qasimsarfraz@microsoft.com>
2022-12-08 13:54:55 +01:00
Sebastiaan van Stijn
eaedadbed0
replace strings.Split(N) for strings.Cut() or alternatives
Go 1.18 and up now provides a strings.Cut() which is better suited for
splitting key/value pairs (and similar constructs), and performs better:

```go
func BenchmarkSplit(b *testing.B) {
        b.ReportAllocs()
        data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
        for i := 0; i < b.N; i++ {
                for _, s := range data {
                        _ = strings.SplitN(s, "=", 2)[0]
                }
        }
}

func BenchmarkCut(b *testing.B) {
        b.ReportAllocs()
        data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
        for i := 0; i < b.N; i++ {
                for _, s := range data {
                        _, _, _ = strings.Cut(s, "=")
                }
        }
}
```

    BenchmarkSplit
    BenchmarkSplit-10            8244206               128.0 ns/op           128 B/op          4 allocs/op
    BenchmarkCut
    BenchmarkCut-10             54411998                21.80 ns/op            0 B/op          0 allocs/op

While looking at occurrences of `strings.Split()`, I also updated some for alternatives,
or added some constraints; for cases where an specific number of items is expected, I used `strings.SplitN()`
with a suitable limit. This prevents (theoretical) unlimited splits.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-11-07 10:02:25 +01:00
Maksym Pavlenko
908be16858
Merge pull request #7577 from dcantah/maintenance-cri-winns 2022-10-23 14:32:02 -07:00
Danny Canter
9a0331c477 maintenance: Remove WithWindowsNetworkNamespace from pkg/cri
Old TODO stating that pkg/cri/opts's `WithWindowsNetworkNamespace`
should be moved to the main containerd pkg was out of date as thats
already been done (well, to the /oci package). This just removes it
and swaps all uses of `WithWindowsNetworkNamespace` to the oci
packages impl.

Signed-off-by: Danny Canter <danny@dcantah.dev>
2022-10-23 06:45:32 -07:00
Fu Wei
9b54eee718
Merge pull request #7419 from bart0sh/PR005-configure-CDI-registry-on-start 2022-10-22 08:17:33 +08:00
Sebastiaan van Stijn
29c7fc9520
clean-up "nolint" comments, remove unused ones
- fix "nolint" comments to be in the correct format (`//nolint:<linters>[,<linter>`
  no leading space, required colon (`:`) and linters.
- remove "nolint" comments for errcheck, which is disabled in our config.
- remove "nolint" comments that were no longer needed (nolintlint).
- where known, add a comment describing why a "nolint" was applied.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-12 14:40:59 +02:00
Ed Bartosh
643dc16565 improve CDI logging
Added logging of found CDI devices.
Fixed test failures caused by the change.

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-10-12 13:45:20 +03:00
Ed Bartosh
8ed910c46a CDI: configure registry on start
Currently CDI registry is reconfigured on every
WithCDI call, which is a relatively heavy operation.

This happens because cdi.GetRegistry(cdi.WithSpecDirs(cdiSpecDirs...))
unconditionally reconfigures the registry (clears fs notify watch,
sets up new watch, rescans directories).

Moving configuration to the criService.initPlatform should result
in performing registry configuration only once on the service start.

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-10-12 13:45:20 +03:00
Ed Bartosh
eec7a76ecd move WithCDI to pkg/cri/opts
As WithCDI is CRI-only API it makes sense to move it
out of oci module.

This move can also fix possible issues with this API when
CRI plugin is disabled.

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-10-12 13:45:20 +03:00
Maksym Pavlenko
ca3b9b50fe Run gofmt 1.19
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-08-04 18:18:33 -07:00
Kang.Zhang
fceab7f4c4 remove duplicate
Signed-off-by: Kang.Zhang <Kang.zhang@intel.com>
2022-04-26 10:44:45 +08:00
Andrey Klimentyev
5f3ce9512b Do not append []string{""} to command to preserve Docker compatibility
Signed-off-by: Andrey Klimentyev <andrey.klimentyev@flant.com>
2022-04-13 13:29:49 +03:00
Amit Barve
bfde58e3cd Bug fix for mount path handling
Currently when handling 'container_path' elements in container mounts we simply call
filepath.Clean on those paths. However, filepath.Clean adds an extra '.' if the path is a
simple drive letter ('E:' or 'Z:' etc.). These type of paths cause failures (with incorrect
parameter error) when creating containers via hcsshim. This commit checks for such paths
and doesn't call filepath.Clean on them.
It also adds a new check to error out if the destination path is a C drive and moves the
dst path checks out of the named pipe condition.

Signed-off-by: Amit Barve <ambarve@microsoft.com>
2022-03-21 09:40:19 -07:00
Fu Wei
d9797673b0
Merge pull request #6593 from qiutongs/improve-container-mount
Make the temp mount as ready only in container WithVolumes
2022-03-18 00:03:28 +08:00
Paul "TBBle" Hampson
39d52118f5 Plumb CRI Devices through to OCI WindowsDevices
There's two mappings of hostpath to IDType and ID in the wild:
- dockershim and dockerd-cri (implicitly via docker) use class/ID
-- The only supported IDType in Docker is 'class'.
-- https://github.com/aarnaud/k8s-directx-device-plugin generates this form
- https://github.com/jterry75/cri (windows_port branch) uses IDType://ID
-- hcsshim's CRI test suite generates this form

`://` is much more easily distinguishable, so I've gone with that one as
the generic separator, with `class/` as a special-case.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
2022-03-12 08:16:43 +11:00
Qiutong Song
ec90efbe99 Make the temp mount as ready only in container WithVolumes
Signed-off-by: Qiutong Song <songqt01@gmail.com>
2022-02-25 17:53:30 -08:00
ruiwen-zhao
fb0b8d6177 Use fs.RootPath when mounting volumes
Signed-off-by: Ruiwen Zhao <ruiwen@google.com>
2022-02-17 19:20:00 +00:00
haoyun
bbe46b8c43 feat: replace github.com/pkg/errors to errors
Signed-off-by: haoyun <yun.hao@daocloud.io>
Co-authored-by: zounengren <zouyee1989@gmail.com>
2022-01-07 10:27:03 +08:00
Derek McGowan
644a01e13b
Merge pull request from GHSA-mvff-h3cj-wj9c
only relabel cri managed host mounts
2022-01-05 09:30:58 -08:00
Phil Estes
949db57213
Merge pull request #6320 from endocrimes/dani/cri-swap
cri: add support for configuring swap
2021-12-14 15:02:28 -05:00
haoyun
c0d07094be feat: Errorf usage
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-12-13 14:31:53 +08:00
Michael Crosby
9b0303913f
only relabel cri managed host mounts
Co-authored-by: Samuel Karp <skarp@amazon.com>
Signed-off-by: Michael Crosby <michael@thepasture.io>
Signed-off-by: Samuel Karp <skarp@amazon.com>
2021-12-09 09:53:47 -08:00
Danielle Lancashire
2fa4e9c0e2 cri: add support for configuring swap
Signed-off-by: Danielle Lancashire <dani@builds.terrible.systems>
2021-12-02 21:25:33 +01:00
Claudiu Belu
2bc77b8a28 Adds Windows resource limits support
This will allow running Windows Containers to have their resource
limits updated through containerd. The CPU resource limits support
has been added for Windows Server 20H2 and newer, on older versions
hcsshim will raise an Unimplemented error.

Signed-off-by: Claudiu Belu <cbelu@cloudbasesolutions.com>
2021-09-25 13:20:55 -07:00
Michael Crosby
7b8a697f28
Merge pull request #6034 from claudiubelu/windows/fixes-image-volume
Fixes Windows containers with image volumes
2021-10-07 11:50:01 -04:00
Claudiu Belu
791e175c79 Windows: Fixes Windows containers with image volumes
Currently, there are few issues that preventing containers
with image volumes to properly start on Windows.

- Unlike the Linux implementation, the Container volume mount paths
  were not created if they didn't exist. Those paths are now created.

- while copying the image volume contents to the container volume,
  the layers were not properly deactivated, which means that the
  container can't start since those layers are still open. The layers
  are now properly deactivated, allowing the container to start.

- even if the above issue didn't exist, the Windows implementation of
  mount/Mount.Mount deactivates the layers, which wouldn't allow us
  to copy files from them. The layers are now deactivated after we've
  copied the necessary files from them.

- the target argument of the Windows implementation of mount/Mount.Mount
  was unused, which means that folder was always empty. We're now
  symlinking the Layer Mount Path into the target folder.

- hcsshim needs its Container Mount Paths to be properly formated, to be
  prefixed by C:. This was an issue for Volumes defined with Linux-like
  paths (e.g.: /test_dir). filepath.Abs solves this issue.

Signed-off-by: Claudiu Belu <cbelu@cloudbasesolutions.com>
2021-10-01 09:02:18 +00:00
Eng Zer Jun
50da673592
refactor: move from io/ioutil to io and os package
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in io and os packages.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-09-21 09:50:38 +08:00
Fu Wei
d58542a9d1
Merge pull request #5627 from payall4u/payall4u/cri-support-cgroup-v2 2021-09-09 23:10:33 +08:00
Mikko Ylinen
e0f8c04dad cri: Devices ownership from SecurityContext
CRI container runtimes mount devices (set via kubernetes device plugins)
to containers by taking the host user/group IDs (uid/gid) to the
corresponding container device.

This triggers a problem when trying to run those containers with
non-zero (root uid/gid = 0) uid/gid set via runAsUser/runAsGroup:
the container process has no permission to use the device even when
its gid is permissive to non-root users because the container user
does not belong to that group.

It is possible to workaround the problem by manually adding the device
gid(s) to supplementalGroups. However, this is also problematic because
the device gid(s) may have different values depending on the workers'
distro/version in the cluster.

This patch suggests to take RunAsUser/RunAsGroup set via SecurityContext
as the device UID/GID, respectively. The feature must be enabled by
setting device_ownership_from_security_context runtime config value to
true (valid on Linux only).

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-08-30 09:30:00 +03:00
payall4u
9a8bf13158 feature: add field LinuxContainerResources.Unified on cri
Signed-off-by: Zhiyu Li <payall4u@qq.com>
2021-08-23 10:49:31 +08:00
Jacob Blain Christen
c3609ff4ca cri: filter selinux xattr for image volumes
Exclude the `security.selinux` xattr when copying content from layer
storage for image volumes. This allows for the already correct label
at the target location to be applied to the copied content, thus
enabling containers to write to volumes that they implicitly expect to be
able to write to.

- Fixes containerd/containerd#5090
- See rancher/rke2#690

Signed-off-by: Jacob Blain Christen <jacob@rancher.com>
2021-08-20 23:47:24 -07:00
Derek McGowan
6f027e38a8
Remove redundant build tags
Remove build tags which are already implied by the name of the file.
Ensures build tags are used consistently

Signed-off-by: Derek McGowan <derek@mcg.dev>
2021-08-05 22:27:46 -07:00
Mike Brown
a5c417ac06 move up to CRI v1 and support v1alpha in parallel
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2021-06-28 09:34:12 -05:00
Thomas Hartland
b48f27df6b Support PID NamespaceMode_TARGET
This commit adds support for the PID namespace mode TARGET
when generating a container spec.

The container that is created will be sharing its PID namespace
with the target container that was specified by ID in the namespace
options.

Signed-off-by: Thomas Hartland <thomas.george.hartland@cern.ch>
2021-04-21 17:54:17 +02:00