Commit Graph

13124 Commits

Author SHA1 Message Date
Derek McGowan
e0e6f870b7
Merge pull request #9086 from dmcgowan/move-to-log-repo
Use github.com/containerd/log
2023-09-22 09:25:29 -07:00
Derek McGowan
8b413daff0
Remove log package except for exported const used by hcsshim
Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-09-22 07:53:23 -07:00
Derek McGowan
2f1b92710a
Update zfs library to use new log repository
Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-09-22 07:53:23 -07:00
Derek McGowan
508aa3a1ef
Move to use github.com/containerd/log
Add github.com/containerd/log to go.mod

Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-09-22 07:53:23 -07:00
Rodrigo Campos
f1070c4e18 docs/userns: Clarify requirements for k8s 1.25/1.26
crun 1.4.3 and runc 1.1 both support opening bind-mounts before
dropping privileges, as they are inaccessible after switching to the
user namespace. So those are the minimum versions to use with containerd
1.7.

Also, since containerd 2.0 we use idmap mounts for files mounted in the
container created by containerd (like etc/hostname, etc/hosts, etc.), so
in that case we require newer OCI runtimes too. However, as the kubelet
doesn't request idmap mounts for kube volumes, we can lower the kernel
version.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-09-22 15:52:38 +02:00
Fu Wei
7a0e6b7e77
Merge pull request #9112 from adityaramani/handle-shim-kill
Sandbox: Handle unexpected shim kill events
2023-09-22 13:31:11 +08:00
Akihiro Suda
3ebe5d1c56
Merge pull request #9124 from dmcgowan/cri-image-store-no-client
Update CRI image store to not use containerd client
2023-09-21 19:17:21 +09:00
Davanum Srinivas
b101cad15c
Merge pull request #9126 from bryantbiggs/fix/add-containerd-namespace
fix: Add `containerd` to the message type reference
2023-09-20 22:51:43 -04:00
Samuel Karp
87671c2dee
Merge pull request #9122 from henry118/netns-doc
2023-09-20 16:25:15 -07:00
Bryant Biggs
42eee8bf05 fix: Add containerd to the message type reference
Signed-off-by: Bryant Biggs <bryantbiggs@gmail.com>
2023-09-20 16:32:05 -04:00
Derek McGowan
c3694aaf87
Merge pull request #9093 from thaJeztah/swap_log_pkg_alias
alias log package to github.com/containerd/log v0.1.0, and (soft)deprecate
2023-09-20 11:45:59 -07:00
Derek McGowan
9e819fb4a8
Update CRI image store to not use containerd client
Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-09-20 10:11:51 -07:00
Henry Wang
dcb2e7447b Improve doc of func NewNetNS
Signed-off-by: Henry Wang <henwang@amazon.com>
2023-09-20 17:00:33 +00:00
Fu Wei
782ad19f6c
Merge pull request #8356 from dmcgowan/drop-inheritable-capabilities
Support for dropping inheritable capabilities
2023-09-20 09:40:45 +08:00
Derek McGowan
2ce971d890
Add delete target to image remove
Adds atomicity to image delete when deleting from a list.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-09-19 17:23:33 -07:00
Derek McGowan
f8fb2dad39
api: update image service to support target in delete request
Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-09-19 17:17:16 -07:00
Rodrigo Campos
8e3722c7d1 CI: Set slow_chown for overlayfs snapshotter
Userns requires either idmap mounts or opting in to a slow and expensive
chown. As idmap mounts support for overlayfs was merged in kernel 5.19, let's
add the slow_chown config for our CI.

The config is harmless to keep on newer kernels: if idmap mounts are
supported, they are simply used. Once all our CI runs with kernels
>= 5.19, we can remove this setting.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-09-19 17:55:47 +02:00
Rodrigo Campos
46d3094aa3 docs/userns: Fix small typo
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-09-19 16:37:40 +02:00
Rodrigo Campos
d008d64a8f docs/userns: Clarify containerd 1.7 limitations
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-09-19 16:37:40 +02:00
Rodrigo Campos
e379082000 docs/userns: Document the need to opt-in for a slow chown
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-09-19 16:37:40 +02:00
Rodrigo Campos
8bf8e2b975 snapshotter: Use capa prefix consistently for capabilities
The overlay snapshotter is using capa, not capab; let's use that in all
the places.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-09-19 14:42:51 +02:00
Rodrigo Campos
ec9e0dca91 overlay: Require opt-in if idmap mounts are not supported.
If we don't use idmap mounts, doing a chown per pod is very expensive:
it implies duplicating the container storage for the image for every pod
and the latency to start a new pod is affected too.

Let's make sure users are aware of this by having them opt in, for
snapshotters where we have a better solution (like overlayfs, which has
support for idmap mounts).

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-09-19 14:42:51 +02:00
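
The opt-in rule described in the commit above can be sketched as a small decision function. This is a minimal illustration, not the snapshotter's actual code: the names `chooseRemap`, `remapStrategy`, and `errNeedOptIn` are hypothetical, and the two booleans stand in for "the kernel supports idmap mounts for this snapshotter" and "the user set `slow_chown`".

```go
package main

import (
	"errors"
	"fmt"
)

// remapStrategy names how ownership is remapped for a user-namespaced pod.
// All identifiers in this file are hypothetical and exist only for the sketch.
type remapStrategy int

const (
	remapNone remapStrategy = iota
	useIDMapMounts
	useSlowChown
)

var errNeedOptIn = errors.New("idmap mounts are not supported; set slow_chown to opt in to a per-pod chown")

// chooseRemap prefers idmap mounts whenever they are supported, falls back to
// a chown only if the user opted in, and otherwise refuses to continue.
func chooseRemap(idmapSupported, slowChown bool) (remapStrategy, error) {
	switch {
	case idmapSupported:
		// The opt-in flag is harmless here: idmap mounts are simply used.
		return useIDMapMounts, nil
	case slowChown:
		return useSlowChown, nil
	default:
		return remapNone, errNeedOptIn
	}
}

func main() {
	for _, c := range []struct{ idmap, slow bool }{
		{true, false}, {true, true}, {false, true}, {false, false},
	} {
		s, err := chooseRemap(c.idmap, c.slow)
		fmt.Printf("idmap=%v slow_chown=%v -> strategy=%d err=%v\n", c.idmap, c.slow, s, err)
	}
}
```

Returning an error in the last case, rather than silently falling back to the chown, is what makes the expensive path an explicit choice.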
Sebastiaan van Stijn
03b9ce56b5
deprecate logs package, but disable linter (for transitioning)
Deprecate the package, but suppress linting errors for now. This is to allow
backporting these changes to release branches, which may still need to transition.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-19 08:44:35 +02:00
Sebastiaan van Stijn
d69ae811d6
alias log package to github.com/containerd/log v0.1.0
This "soft" deprecates the package, but keeps the local uses of the package,
which can make backporting this to release-branches easier (we can
still move all uses in those branches as well though).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-19 08:44:10 +02:00
Akihiro Suda
00666764b8
Merge pull request #9102 from dmcgowan/add-usage-package
Add usage package
2023-09-19 11:24:26 +09:00
Bjorn Neergaard
6c6dfcbce2
contrib/apparmor: deny /sys/devices/virtual/powercap
While this is not strictly necessary as the default OCI config masks this
path, it is possible that the user disabled path masking, passed their
own list, or is using a forked (or future) daemon version that has a
modified default config/allows changing the default config.

Add some defense-in-depth by also masking out this problematic hardware
device with the AppArmor LSM.

Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com>
2023-09-18 16:57:09 -06:00
Bjorn Neergaard
106a9b7767
oci/spec: deny /sys/devices/virtual/powercap
The ability to read these files may offer a power-based sidechannel
attack against any workloads running on the same kernel.

This was originally [CVE-2020-8694][1], which was fixed in
[949dd0104c496fa7c14991a23c03c62e44637e71][2] by restricting read access
to root. However, since many containers run as root, this is not
sufficient for our use case.

While untrusted code should ideally never be run, we can add some
defense in depth here by masking out the device class by default.

[Other mechanisms][3] to access this hardware exist, but they should not
be accessible to a container due to other safeguards in the
kernel/container stack (e.g. capabilities, perf paranoia).

[1]: https://nvd.nist.gov/vuln/detail/CVE-2020-8694
[2]: 949dd0104c
[3]: https://web.eece.maine.edu/~vweaver/projects/rapl/

Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com>
2023-09-18 16:56:11 -06:00
Aditya Ramani
729c97cf39 Handle unexpected shim kill events
When a shim process is unexpectedly killed in a way that was not initiated through containerd - containerd reports the pod as not ready but the containers as running. This results in kubelet repeatedly sending container kill requests that fail since containerd cannot connect to the shim.

Changes:

- In the container exit handler, treat `err: Unavailable` as if the container has already exited out
- When attempting to get a connection to the shim, if the controller isn't available assume that the shim has been killed (needs to be done since we have a separate exit handler that cleans up the reference to the shim controller - before kubelet has the chance to call StopPodSandbox)

Signed-off-by: Aditya Ramani <a_ramani@apple.com>
2023-09-18 12:15:55 -07:00
Derek McGowan
ed5f7e7c8c
Update image in client to use new usage package
Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-09-18 11:20:07 -07:00
Derek McGowan
96a23ccc1d
Create new usage package
Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-09-18 11:19:22 -07:00
Phil Estes
82df7d5208
Merge pull request #9091 from thaJeztah/update_nri
vendor: github.com/containerd/nri v0.5.0
2023-09-18 10:17:06 -04:00
Akihiro Suda
a8d078cc9b
Merge pull request #9108 from BinSquare/remove-SourceDateEpochOrNow
Refactor: Removing inherently flaky and unused SourceDateEpochOrNow function.
2023-09-18 17:58:40 +09:00
BinBin He
79f781d009 Refactor: Removing inherently flaky and unused SourceDateEpochOrNow function.
Signed-off-by: BinBin He <BinSquare@users.noreply.github.com>
2023-09-17 08:34:26 -07:00
Sebastiaan van Stijn
8cbb4ea5d3
vendor: github.com/containerd/nri v0.5.0
This version no longer has a dependency on containerd, cutting
down the number of circular dependencies.

full diff: https://github.com/containerd/nri/compare/v0.4.0...v0.5.0

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-16 10:40:21 +02:00
Brennan Kinney
3ca39ef016 fix: Remove LimitNOFILE from containerd.service
Remove `LimitNOFILE` from `containerd.service` to rely on the systemd v240 implicit default of `1024:524288`. On supported platforms with systemd prior to v240, packagers will patch the service with an explicit `LimitNOFILE=1024:524288`.

- `1024` soft limit is an implicit default, avoiding unexpected breakage. Software that needs a higher limit should request to raise the soft limit for its process.
- `524288` hard limit is an implicit default since systemd v240 and is adequate for most processes (_half of the historical limit from `fs.nr_open` of `1048576`_), while 4096 is the implicit default from the kernel (often too low).
- The hard limit may not exceed `fs.nr_open` (_which a value of `infinity` will resolve to_). On most systems with systemd v240 or newer, this will resolve to an excessive size of 2^30 (over 1 billion).
- When set to `infinity` (usually as the soft limit) software may experience significantly increased resource usage, resulting in a performance regression or runtime failures that are difficult to troubleshoot.

Signed-off-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
2023-09-15 09:04:53 +12:00
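
The commit above notes that software needing more file descriptors should raise its own soft limit rather than rely on a high service-wide default. A minimal sketch of that, assuming Linux (where the fields of `syscall.Rlimit` are unsigned), is shown here; `raiseSoftNOFILE` is a hypothetical helper name. Recent Go releases already raise this soft limit at start-up for programs that import `os`, so the call may be a no-op there.

```go
package main

import (
	"fmt"
	"syscall"
)

// raiseSoftNOFILE raises the soft RLIMIT_NOFILE of the current process to
// want, capped at the hard limit. It never lowers a soft limit that is
// already higher. (Hypothetical helper; assumes Linux.)
func raiseSoftNOFILE(want uint64) (syscall.Rlimit, error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return lim, err
	}
	if want > lim.Max {
		want = lim.Max // an unprivileged process cannot exceed the hard limit
	}
	if want > lim.Cur {
		lim.Cur = want
		if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
			return lim, err
		}
	}
	return lim, nil
}

func main() {
	lim, err := raiseSoftNOFILE(65536)
	if err != nil {
		fmt.Println("could not adjust RLIMIT_NOFILE:", err)
		return
	}
	fmt.Printf("soft=%d hard=%d\n", lim.Cur, lim.Max)
}
```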
Derek McGowan
31b6cdfd10
Merge pull request #8493 from DataDog/image-verifier-bindir-plugin
Add image verifier transfer service plugin system based on a binary directory
2023-09-14 06:37:17 -07:00
Phil Estes
3f315fcabf
Merge pull request #9095 from thaJeztah/isolate_platform
2023-09-14 08:31:50 -04:00
Fu Wei
fe17f65159
Merge pull request #8287 from kinvolk/rata/userns-stateless-idmap
Add support for userns in stateless and stateful pods with idmap mounts (KEP-127, k8s >= 1.27)
2023-09-14 18:14:02 +08:00
Rodrigo Campos
83240a4f77 Bump crun to 1.9
crun 1.9 was just released with fixes and exposes idmap mounts support
via the "features" sub-command.

We use that feature to throw a clear error to users (if they request
idmap mounts and the OCI runtime doesn't support it), but also to skip
tests on CI when the OCI runtime doesn't support it.

Let's bump it so the CI runs the tests with crun.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-09-14 11:03:15 +02:00
Rodrigo Campos
967313049f doc: Add documentation about CRI user namespaces
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-09-13 23:37:47 +02:00
Phil Estes
a831db9d35
Merge pull request #9096 from AkihiroSuda/release-remove-extra-binaries
release: remove `cri-containerd-*.tar.gz` release bundles
2023-09-13 14:23:34 -04:00
Phil Estes
25198a0557
Merge pull request #8330 from vvoland/remotes-docker-status-exists-mounted
remotes/docker: Add Mounted and Exists push status
2023-09-13 13:14:39 -04:00
Rodrigo Campos
2e13d39546 pkg/process: Only use idmap mounts if runc supports it
runc, as mandated by the runtime-spec, ignores unknown fields in the
config.json. This is unfortunate for cases where we _must_ enable that
feature or fail.

For example, if we want to start a container with user namespaces and
volumes, using the uidMappings/gidMappings field is needed so the
UID/GIDs in the volume don't end up with garbage. However, if we don't
fail when runc ignores these fields (because they are unknown to
runc), we will just start a container without using the mappings, and the
UID/GIDs the container persists to volumes will be the host UID/GIDs, which
can change if the container is re-scheduled by Kubernetes.

This will end up in volumes having "garbage" and unmapped UIDs that the
container can no longer change. So, let's avoid this entirely by just
checking that runc supports idmap mounts if the container we are about
to create needs them.

Please note that the "runc features" subcommand is only run when we are
using idmap mounts. If idmap mounts are not used, the subcommand is not
run and therefore this should not affect containers that don't use idmap
mounts in any way.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-09-13 16:44:54 +02:00
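
The check described in the commit above can be illustrated with a short sketch that inspects the JSON printed by the runtime's "features" sub-command. The JSON shape used here (`mountExtensions.idmap.enabled`) is an assumption based on the runtime-spec features document, and `checkIDMapSupport`, `features`, and `errNoIDMap` are hypothetical names, not containerd's implementation. The sketch takes the JSON as input instead of executing runc.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// features mirrors only the fields this sketch needs from the runtime's
// "features" output (field layout is an assumption, see the text above).
type features struct {
	MountExtensions *struct {
		IDMap *struct {
			Enabled *bool `json:"enabled"`
		} `json:"idmap"`
	} `json:"mountExtensions"`
}

var errNoIDMap = errors.New("OCI runtime does not report idmap mount support")

// checkIDMapSupport fails early when idmap mounts are needed but the runtime
// does not advertise them, because the runtime would otherwise silently
// ignore the unknown mapping fields.
func checkIDMapSupport(featuresJSON []byte, needIDMap bool) error {
	if !needIDMap {
		return nil // containers without idmap mounts are not affected
	}
	var f features
	if err := json.Unmarshal(featuresJSON, &f); err != nil {
		return fmt.Errorf("parsing runtime features: %w", err)
	}
	if f.MountExtensions == nil || f.MountExtensions.IDMap == nil ||
		f.MountExtensions.IDMap.Enabled == nil || !*f.MountExtensions.IDMap.Enabled {
		return errNoIDMap
	}
	return nil
}

func main() {
	supported := []byte(`{"mountExtensions":{"idmap":{"enabled":true}}}`)
	legacy := []byte(`{"ociVersionMin":"1.0.0"}`)

	fmt.Println(checkIDMapSupport(supported, true))
	fmt.Println(checkIDMapSupport(legacy, true))
	fmt.Println(checkIDMapSupport(legacy, false))
}
```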
Rodrigo Campos
fce1b95076 go.mod: Update runtime spec to include features.MountExtensions
Future patches will use that field.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-09-13 16:44:54 +02:00
Rodrigo Campos
a81f80884b Revert "cri: Throw an error if idmap mounts is requested"
This reverts commit 7e6ab84884.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-09-13 16:44:54 +02:00
Rodrigo Campos
e832605a80 integration: Simplify WithVolumeMount()
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-09-13 16:44:54 +02:00
Rodrigo Campos
24aa808fe2 integration: Add userns test with volumes
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-09-13 16:44:54 +02:00
Rodrigo Campos
ab5b43fe80 cri/sbserver: Pass down UID/GID mappings to OCI runtime
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-09-13 16:44:54 +02:00
Sebastiaan van Stijn
e916d77c81
platforms: move ToProto, FromProto to api/types
These utilities caused the platforms package to depend on the containerd
API. As this package is used in many parts of the code, as
well as external consumers, we should try to keep it light on dependencies,
with the potential to make it a standalone module.

These utilities were added in f3b7436b61,
which has not yet been included in a release, so skipping deprecation
and aliases for these.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-13 16:44:52 +02:00
Sebastiaan van Stijn
381442945b
platforms: remove errdefs dependency
This cleans up the platforms package from dependencies that are not strictly
needed. This is in preparation of making this package a separate module, which
can be shared by plugins and containerd versions (as well as external consumers).

- Remove dependency on the errdefs package: most uses of these error
  definitions were used internally, and other errors may not be useful
  for external consumers as sentinel errors.
- ErrInvalidArgument may be a potential exception, although a look at
  current uses of this package shows that there's no special handling of
  invalid parameters vs other errors (all would boil down to "the passed
  platform is invalid", either because of the format or because parsing
  is not implemented on a specific platform).
- Remove uses of the convenience "Platform" alias in favor of using the
  upstream (from OCI spec). Consumers of this package can still use the
  convenience alias, but make sure that function signatures do not imply
  that it's a different type (which can cause confusion).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-09-13 16:44:48 +02:00