Commit Graph

47 Commits

Author SHA1 Message Date
Shingo Omura
50740a1a0c
use strings.Cut instead of strings.Split for parsing imageConfig.User
Signed-off-by: Shingo Omura <everpeace@gmail.com>
2023-03-14 13:52:03 +09:00
Shingo Omura
727b254039
fix userstr for dditionalGids on Linux
It should fallback to imageConfig.User when no securityContext.RunAsUser/RunAsUsername

Signed-off-by: Shingo Omura <everpeace@gmail.com>
2023-02-19 22:09:00 +09:00
Derek McGowan
aa6418fadd
Merge pull request from GHSA-hmfx-3pcx-653p
oci: fix additional GIDs
2023-02-15 13:45:14 -08:00
Danny Canter
646bc3a94e CRI: Create DefaultCRIAnnotations helper
All of the CRI sandbox and container specs all get assigned
almost the exact same default annotations (sandboxID, name, metadata,
container type etc.) so lets make a helper to return the right set for
a sandbox or regular workload container.

Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-02-13 13:05:01 -08:00
Akihiro Suda
3eda46af12
oci: fix additional GIDs
Test suite:
```yaml

---
apiVersion: v1
kind: Pod
metadata:
  name: test-no-option
  annotations:
    description: "Equivalent of `docker run` (no option)"
spec:
  restartPolicy: Never
  containers:
    - name: main
      image: ghcr.io/containerd/busybox:1.28
      args: ['sh', '-euxc',
             '[ "$(id)" = "uid=0(root) gid=0(root) groups=0(root),10(wheel)" ]']
---
apiVersion: v1
kind: Pod
metadata:
  name: test-group-add-1-group-add-1234
  annotations:
    description: "Equivalent of `docker run --group-add 1 --group-add 1234`"
spec:
  restartPolicy: Never
  containers:
    - name: main
      image: ghcr.io/containerd/busybox:1.28
      args: ['sh', '-euxc',
             '[ "$(id)" = "uid=0(root) gid=0(root) groups=0(root),1(daemon),10(wheel),1234" ]']
  securityContext:
    supplementalGroups: [1, 1234]
---
apiVersion: v1
kind: Pod
metadata:
  name: test-user-1234
  annotations:
    description: "Equivalent of `docker run --user 1234`"
spec:
  restartPolicy: Never
  containers:
    - name: main
      image: ghcr.io/containerd/busybox:1.28
      args: ['sh', '-euxc',
             '[ "$(id)" = "uid=1234 gid=0(root) groups=0(root)" ]']
  securityContext:
    runAsUser: 1234
---
apiVersion: v1
kind: Pod
metadata:
  name: test-user-1234-1234
  annotations:
    description: "Equivalent of `docker run --user 1234:1234`"
spec:
  restartPolicy: Never
  containers:
    - name: main
      image: ghcr.io/containerd/busybox:1.28
      args: ['sh', '-euxc',
             '[ "$(id)" = "uid=1234 gid=1234 groups=1234" ]']
  securityContext:
    runAsUser: 1234
    runAsGroup: 1234
---
apiVersion: v1
kind: Pod
metadata:
  name: test-user-1234-group-add-1234
  annotations:
    description: "Equivalent of `docker run --user 1234 --group-add 1234`"
spec:
  restartPolicy: Never
  containers:
    - name: main
      image: ghcr.io/containerd/busybox:1.28
      args: ['sh', '-euxc',
             '[ "$(id)" = "uid=1234 gid=0(root) groups=0(root),1234" ]']
  securityContext:
    runAsUser: 1234
    supplementalGroups: [1234]
```

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-02-10 15:53:00 +09:00
Wei Fu
62df35df66 *: introduce wrapper pkgs for blockio and rdt
Before this patch, both the RdtEnabled and BlockIOEnabled are provided
by services/tasks pkg. Since the services/tasks can be pkg plugin which
can be initialized multiple times or concurrently. It will fire data-race
issue as there is no mutex to protect `enable`.

This patch is aimed to provide wrapper pkgs to use intel/{blockio,rdt}
safely.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-02-10 08:21:34 +08:00
Rodrigo Campos
4eed20fc31 cri: Verify userns container config is consisten with sandbox
The sandbox and container both have the userns config. Lets make sure
they are the same, therefore consistent.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2022-12-30 15:07:54 -03:00
Rodrigo Campos
a7adeb6976 cri: Support pods with user namespaces
This patch requests the OCI runtime to create a userns when the CRI
message includes such request.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2022-12-21 17:56:56 -03:00
David Leadbeater
31a6449734 Add capability for snapshotters to declare support for UID remapping
This allows user namespace support to progress, either by allowing
snapshotters to deal with ownership, or falling back to containerd doing
a recursive chown.

In the future, when snapshotters implement idmap mounts, they should
report the "remap-ids" capability.

Co-authored-by: Rodrigo Campos <rodrigoca@microsoft.com>
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Signed-off-by: David Leadbeater <dgl@dgl.cx>
2022-12-21 15:08:28 -03:00
Qasim Sarfraz
0c4d32c131 cri: add pod uid annotation
Signed-off-by: Qasim Sarfraz <qasimsarfraz@microsoft.com>
2022-11-19 01:12:02 +01:00
Kazuyoshi Kato
6596a70861 Use github.com/containerd/cgroups/v3 to remove gogo
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-11-14 21:07:48 +00:00
Ed Bartosh
8ed910c46a CDI: configure registry on start
Currently CDI registry is reconfigured on every
WithCDI call, which is a relatively heavy operation.

This happens because cdi.GetRegistry(cdi.WithSpecDirs(cdiSpecDirs...))
unconditionally reconfigures the registry (clears fs notify watch,
sets up new watch, rescans directories).

Moving configuration to the criService.initPlatform should result
in performing registry configuration only once on the service start.

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-10-12 13:45:20 +03:00
Ed Bartosh
eec7a76ecd move WithCDI to pkg/cri/opts
As WithCDI is CRI-only API it makes sense to move it
out of oci module.

This move can also fix possible issues with this API when
CRI plugin is disabled.

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-10-12 13:45:20 +03:00
zounengren
d121efc6d8 replace with selinux label
Signed-off-by: zounengren <zouyee1989@gmail.com>
2022-07-24 20:11:16 +08:00
Daniel Canter
44e12dc5d8 Windows snapshotter touch ups and new functionality
This change does a couple things to remove some cruft/unused functionality
in the Windows snapshotter, as well as add a way to specify the rootfs
size in bytes for a Windows container via a new field added in the CRI api in
k8s 1.24. Setting the rootfs/scratch volume size was assumed to be working
prior to this but turns out not to be the case.

Previously I'd added a change to pass any annotations in the containerd
snapshot form (containerd.io/snapshot/*) as labels for the containers
rootfs snapshot. This was added as a means for a client to be able to provide
containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb as an
annotation and have that be translated to a label and ultimately set the
size for the scratch volume in Windows. However, this actually only worked if
interfacing with the CRI api directly (crictl) as Kubernetes itself will
fail to validate annotations that if split by "/" end up with > 2 parts,
which the snapshot labels will (containerd.io / snapshot / foobarbaz).

With this in mind, passing the annotations and filtering to
containerd.io/snapshot/* is moot, so I've removed this code in favor of
a new `snapshotterOpts()` function that will return platform specific
snapshotter options if ones exist. Now on Windows we can just check if
RootfsSizeInBytes is set on the WindowsContainerResources struct and
then return a snapshotter option that sets the right label.

So all in all this change:
- Gets rid of code to pass CRI annotations as labels down to snapshotters.

- Gets rid of the functionality to create a 1GB sized scratch disk if
the client provided a size < 20GB. This code is not used currently and
has a few logical shortcomings as it won't be able to create the disk
if a container is already running and using the same base layer. WCIFS
(driver that handles the unioning of windows container layers together)
holds open handles to some files that we need to delete to create the
1GB scratch disk is the underlying problem.

- Deprecates the containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb
label in favor of a new containerd.io/snapshot/windows/rootfs.sizebytes label.
The previous label/annotation wasn't being used by us, and from a cursory
github search wasn't being used by anyone else either. Now that there is a CRI
field to specify the size, this should just be a field that users can set
on their pod specs and don't need to concern themselves with what it eventually
gets translated to, but non-CRI clients can still use the new label/deprecated
label as usual.

- Add test to cri integration suite to validate expanding the rootfs size.

Signed-off-by: Daniel Canter <dcanter@microsoft.com>
2022-06-06 14:57:07 -07:00
Mike Brown
6b35307594
Merge pull request #5490 from askervin/5Bu_blockio
Support for cgroups blockio
2022-04-29 10:07:56 -05:00
Antti Kervinen
10576c298e cri: support blockio class in pod and container annotations
This patch adds support for a container annotation and two separate
pod annotations for controlling the blockio class of containers.

The container annotation can be used by a CRI client:
  "io.kubernetes.cri.blockio-class"

Pod annotations specify the blockio class in the K8s pod spec level:
  "blockio.resources.beta.kubernetes.io/pod"
  (pod-wide default for all containers within)

  "blockio.resources.beta.kubernetes.io/container.<container_name>"
  (container-specific overrides)

Correspondingly, this patch adds support for --blockio-class and
--blockio-config-file to ctr, too.

This implementation follows the resource class annotation pattern
introduced in RDT and merged in commit 893701220.

Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
2022-04-29 11:44:09 +03:00
Kazuyoshi Kato
f140400c0e
Merge pull request #5686 from dtnyn/issue-5679
Add flag to allow oci.WithAllDevicesAllowed on PrivilegedWithoutHostDevices
2022-04-25 11:44:01 -07:00
Ed Bartosh
ff5c55847a move CDI calls to the linux-only code
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-04-06 13:10:59 +03:00
haoyun
bbe46b8c43 feat: replace github.com/pkg/errors to errors
Signed-off-by: haoyun <yun.hao@daocloud.io>
Co-authored-by: zounengren <zouyee1989@gmail.com>
2022-01-07 10:27:03 +08:00
Derek McGowan
644a01e13b
Merge pull request from GHSA-mvff-h3cj-wj9c
only relabel cri managed host mounts
2022-01-05 09:30:58 -08:00
Markus Lehtonen
9c2e3835fa cri: add ignore_rdt_not_enabled_errors config option
Enabling this option effectively causes RDT class of a container to be a
soft requirement. If RDT support has not been enabled the RDT class
setting will not have any effect.

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2022-01-04 09:27:54 +02:00
Markus Lehtonen
f4a191917b cri: annotations for controlling RDT class
Use goresctrl for parsing container and pod annotations related to RDT.

In practice, from the users' point of view, this patchs adds support for
a container annotation and two separate pod annotations for controlling
the RDT class of containers.

Container annotation can be used by a CRI client:
  "io.kubernetes.cri.rdt-class"

Pod annotations for specifying the RDT class in the K8s pod spec level:
  "rdt.resources.beta.kubernetes.io/pod"
  (pod-wide default for all containers within)

  "rdt.resources.beta.kubernetes.io/container.<container_name>"
  (container-specific overrides)

Annotations are intended as an intermediate step before the CRI API
supports RDT.

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2022-01-04 09:27:54 +02:00
Michael Crosby
9b0303913f
only relabel cri managed host mounts
Co-authored-by: Samuel Karp <skarp@amazon.com>
Signed-off-by: Michael Crosby <michael@thepasture.io>
Signed-off-by: Samuel Karp <skarp@amazon.com>
2021-12-09 09:53:47 -08:00
Dat Nguyen
afe39bebfe add oci.WithAllDevicesAllowed flag for privileged_without_host_devices
This commit adds a flag that enable all devices whitelisting when
privileged_without_host_devices is already enabled.

Fixes #5679

Signed-off-by: Dat Nguyen <dnguyen7@atlassian.com>
2021-11-04 10:24:19 +11:00
scuzhanglei
756f4a3147 cri: add devices for privileged container
Signed-off-by: scuzhanglei <greatzhanglei@gmail.com>
2021-09-10 10:16:26 +08:00
Mikko Ylinen
e0f8c04dad cri: Devices ownership from SecurityContext
CRI container runtimes mount devices (set via kubernetes device plugins)
to containers by taking the host user/group IDs (uid/gid) to the
corresponding container device.

This triggers a problem when trying to run those containers with
non-zero (root uid/gid = 0) uid/gid set via runAsUser/runAsGroup:
the container process has no permission to use the device even when
its gid is permissive to non-root users because the container user
does not belong to that group.

It is possible to workaround the problem by manually adding the device
gid(s) to supplementalGroups. However, this is also problematic because
the device gid(s) may have different values depending on the workers'
distro/version in the cluster.

This patch suggests to take RunAsUser/RunAsGroup set via SecurityContext
as the device UID/GID, respectively. The feature must be enabled by
setting device_ownership_from_security_context runtime config value to
true (valid on Linux only).

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-08-30 09:30:00 +03:00
Mike Brown
a5c417ac06 move up to CRI v1 and support v1alpha in parallel
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2021-06-28 09:34:12 -05:00
Thomas Hartland
b48f27df6b Support PID NamespaceMode_TARGET
This commit adds support for the PID namespace mode TARGET
when generating a container spec.

The container that is created will be sharing its PID namespace
with the target container that was specified by ID in the namespace
options.

Signed-off-by: Thomas Hartland <thomas.george.hartland@cern.ch>
2021-04-21 17:54:17 +02:00
Akihiro Suda
8ba8533bde
pkg/cri/opts.WithoutRunMount -> oci.WithoutRunMount
Move `pkg/cri/opts.WithoutRunMount` function to `oci.WithoutRunMount`
so that it can be used without dependency on CRI.

Also add `oci.WithoutMounts(dests ...string)` for generality.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-04-07 21:25:36 +09:00
Yohei Ueda
07f1df4541
cri: set default masked/readonly paths to empty paths
Fixes #5029.

Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
2021-02-24 23:50:40 +09:00
Phil Estes
757be0a090
Merge pull request #5017 from AkihiroSuda/parse-cap
oci.WithPrivileged: set the current caps, not the known caps
2021-02-23 09:10:57 -05:00
zhangyadong.0808
08318b1ab9 cri: append envs from image config to empty slice to avoid env lost
Signed-off-by: Yadong Zhang <yadzhang@gmail.com>
2021-02-18 11:37:41 +08:00
Akihiro Suda
a2d1a8a865
oci.WithPrivileged: set the current caps, not the known caps
This change is needed for running the latest containerd inside Docker
that is not aware of the recently added caps (BPF, PERFMON, CHECKPOINT_RESTORE).

Without this change, containerd inside Docker fails to run containers with
"apply caps: operation not permitted" error.

See kubernetes-sigs/kind 2058

NOTE: The caller process of this function is now assumed to be as
privileged as possible.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-02-10 17:14:17 +09:00
Derek McGowan
b3f2402062
Merge pull request #5002 from crosbymichael/anno-image-name
[cri] add image-name annotation
2021-02-05 08:27:41 -08:00
Kazuyoshi Kato
07db46ee23 lint: update nolint syntax for golangci-lint
Newer golangci-lint needs explicit `//` separator. Otherwise it treats
the entire line (`staticcheck deprecated ... yet`) as a name.

https://golangci-lint.run/usage/false-positives/#nolint

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2021-02-04 11:59:55 -08:00
Michael Crosby
99cb62f233 [cri] add image-name annotation
For some tools having the actual image name in the annotations is helpful for
debugging and auditing the workload.

Signed-off-by: Michael Crosby <michael@thepasture.io>
2021-02-04 07:05:11 -05:00
Alban Crequy
28e4fb25f4 cri: add annotations for pod name and namespace
cri-o has annotations for pod name, namespace and container name:
https://github.com/containers/podman/blob/master/pkg/annotations/annotations.go

But so far containerd had only the container name.

This patch will be useful for seccomp agents to have a different
behaviour depending on the pod (see runtime-spec PR 1074 and runc PR
2682). This should simplify the code in:
b2d423695d/pkg/kuberesolver/kuberesolver.go (L16-L27)

Signed-off-by: Alban Crequy <alban@kinvolk.io>
2021-01-26 12:10:39 +01:00
Michael Crosby
a731039238 [cri] label etc files for selinux containers
Signed-off-by: Michael Crosby <michael@thepasture.io>
2021-01-19 13:42:09 -05:00
Mike Brown
6467c3374d refactor based on comments
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2020-12-07 21:39:31 -06:00
Mike Brown
b4727eafbe adding code to support seccomp apparmor securityprofile
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2020-12-04 15:15:32 -06:00
Michael Crosby
3d358c9df3 [cri] don't clear base security settings
When a base runtime spec is being used, admins can configure defaults for the
spec so that default ulimits or other security related settings get applied for
all containers launched.

Signed-off-by: Michael Crosby <michael@thepasture.io>
2020-12-02 06:51:37 -05:00
Jacob Blain Christen
a1e7dd939d cri: selinuxrelabel=false for /dev/shm w/ host ipc
This is a followup to #4699 that addresses an oversight that could cause
the CRI to relabel the host /dev/shm, which should be a no-op in most
cases. Additionally, fixes unit tests to make correct assertions for
/dev/shm relabeling.

Discovered while applying the changes for #4699 to containerd/cri 1.4:
https://github.com/containerd/cri/pull/1605

Signed-off-by: Jacob Blain Christen <jacob@rancher.com>
2020-11-11 15:22:17 -07:00
Jacob Blain Christen
e8d8ae3b97 cri: selinux relabel /dev/shm
Address an issue originally seen in the k3s 1.3 and 1.4 forks of containerd/cri, https://github.com/rancher/k3s/issues/2240

Even with updated container-selinux policy, container-local /dev/shm
will get mounted with container_runtime_tmpfs_t because it is a tmpfs
created by the runtime and not the container (thus, container_runtime_t
transition rules apply). The relabel mitigates such, allowing envoy
proxy to work correctly (and other programs that wish to write to their
/dev/shm) under selinux.

Tested locally with:
- SELINUX=Enforcing vagrant up --provision-with=shell,selinux,test-integration
- SELINUX=Enforcing CRITEST_ARGS=--ginkgo.skip='HostIpc is true' vagrant up --provision-with=shell,selinux,test-cri
- SELINUX=Permissive CRITEST_ARGS=--ginkgo.focus='HostIpc is true' vagrant up --provision-with=shell,selinux,test-cri

Signed-off-by: Jacob Blain Christen <jacob@rancher.com>
2020-11-06 12:05:17 -07:00
Maksym Pavlenko
3d02441a79 Refactor pkg packages
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2020-10-08 17:30:17 -07:00
Maksym Pavlenko
3508ddd3dd Refactor CRI packages
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2020-10-07 14:45:57 -07:00
Derek McGowan
b22b627300
Move cri server packages under pkg/cri
Organizes the cri related server packages under pkg/cri

Signed-off-by: Derek McGowan <derek@mcg.dev>
2020-10-07 13:09:37 -07:00