containerd

Author	SHA1	Message	Date
Eric Ernst	8b9571e348	containerd-stress: start task ctr before starting execs For some runtimes, the container is not ready for exec until the initial container task has been started (as opposed to just having the task created). More specifically, running containerd-stress with --exec would break with Kata Container shim, since the sandbox is not created until a start is issued. By starting the container's primary task before adding exec's, we can avoid: ``` error="cannot enter container exec-container-1, with err Sandbox not running, impossible to enter the container: unknown" ``` Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-02-04 16:08:44 -08:00
Gabriel Adrian Samfira	b63000c65d	[Windows][Integration] Enable TestRestartMonitor With the release of hcsshim v0.9.2, this test should pass without issues on Windows. Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>	2022-02-04 17:27:14 +02:00
Markus Lehtonen	9b1fb82584	cri: fix handling of ignore_rdt_not_enabled_errors config option We were not properly ignoring errors from goresctrl.rdt.ContainerClassFromAnnotations(), which made the config option ineffective in practice. Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>	2022-02-04 13:54:03 +02:00
Akihiro Suda	4f5ce5615a	Merge pull request #6501 from henry118/issue6499 Document fs_type and fs_options in snapshots/devmapper/README.md	2022-02-04 18:04:29 +09:00
Maksym Pavlenko	a5d093991a	Merge pull request #6510 from smira/adoption-talos	2022-02-03 12:36:49 -08:00
Andrey Smirnov	dcbe3e4713	docs: add Talos Linux to the list of adopters Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-02-03 21:10:28 +03:00
Derek McGowan	943ca856ad	Merge pull request #6502 from dmcgowan/prepare-1.6.0-rc.2 Prepare 1.6.0-rc.2	2022-02-03 08:54:18 -08:00
Jeremi Piotrowski	7275411ec8	cgroup2: monitor OOMKill instead of OOM to prevent missing container OOM events With the cgroupv2 configuration employed by Kubernetes, the pod cgroup (slice) and container cgroup (scope) will both have the same memory limit applied. In that situation, the kernel will consider an OOM event to be triggered by the parent cgroup (slice), and increment 'oom' there. The child cgroup (scope) only sees an oom_kill increment. Since we monitor child cgroups for oom events, check the OOMKill field so that we don't miss events. This is not visible when running containers through docker or ctr, because they set the limits differently (only container level). An alternative would be to not configure limits at the pod level - that way the container limit will be hit and the OOM will be correctly generated. An interesting consequence is that when spawning a pod with multiple containers, the oom events also work correctly, because: a) if one of the containers has no limit, the pod has no limit so OOM events in another container report correctly. b) if all of the containers have limits then the pod limit will be a sum of container limits, so a container will be able to hit its limit first. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>	2022-02-03 13:39:16 +01:00
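
The distinction between the `oom` and `oom_kill` counters in commit 7275411ec8 can be illustrated with a minimal Go sketch. It assumes the cgroup v2 `memory.events` file format of one `key value` pair per line; the type and function names (`memoryEvents`, `parseMemoryEvents`, `oomKilled`) are hypothetical and are not containerd's implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// memoryEvents holds the two counters of a cgroup v2 memory.events file
// that matter for OOM detection.
type memoryEvents struct {
	OOM     uint64
	OOMKill uint64
}

// parseMemoryEvents parses "key value" lines and keeps only oom and oom_kill.
func parseMemoryEvents(content string) (memoryEvents, error) {
	var ev memoryEvents
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue // skip blank or malformed lines
		}
		n, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return ev, fmt.Errorf("parse %q: %w", sc.Text(), err)
		}
		switch fields[0] {
		case "oom":
			ev.OOM = n
		case "oom_kill":
			ev.OOMKill = n
		}
	}
	return ev, sc.Err()
}

// oomKilled reports whether oom_kill grew since the previous observation.
// Comparing OOM instead would miss the case described in the commit message.
func oomKilled(prev, cur memoryEvents) bool {
	return cur.OOMKill > prev.OOMKill
}

func main() {
	// Container (scope) cgroup in the Kubernetes setup described above:
	// oom stays at 0 while oom_kill is incremented.
	child := "low 0\nhigh 0\nmax 3\noom 0\noom_kill 1\n"
	cur, err := parseMemoryEvents(child)
	if err != nil {
		panic(err)
	}
	fmt.Println("oom counter changed:", cur.OOM > 0)
	fmt.Println("oom_kill counter changed:", oomKilled(memoryEvents{}, cur))
}
```

A monitor that watched only `oom` would report nothing for this input, which is the missed event the commit addresses.
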
Jeremi Piotrowski	821c961c86	pkg/oom/v2: handle EventChan routine shutdown quietly When the cgroup is removed, EventChan is closed (this was pulled in by `8d69c041c5`). This results in a nil error being received. Don't log an error in that case but instead return. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>	2022-02-03 13:20:46 +01:00
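
The quiet-shutdown behaviour in commit 821c961c86 relies on a Go property: a receive from a closed channel yields the zero value, which for an error channel is `nil`. The sketch below shows that pattern in isolation. The `watch` function, its signature, and `errWatchFailed` are hypothetical stand-ins, not the code in pkg/oom/v2.

```go
package main

import (
	"errors"
	"fmt"
)

// errWatchFailed is a hypothetical error used only to exercise the failure path.
var errWatchFailed = errors.New("watch failed")

// watch counts events until the error channel yields a value and returns the
// number of events seen. A receive from a closed channel yields the zero
// value, so a nil error is treated as a normal shutdown and is not logged.
func watch(events <-chan struct{}, errs <-chan error, logf func(format string, args ...interface{})) int {
	seen := 0
	for {
		select {
		case _, ok := <-events:
			if !ok {
				events = nil // a nil channel disables this case
				continue
			}
			seen++
		case err := <-errs:
			if err != nil {
				logf("failed to read cgroup events: %v", err)
			}
			return seen
		}
	}
}

func main() {
	logged := 0
	logf := func(format string, args ...interface{}) {
		logged++
		fmt.Printf(format+"\n", args...)
	}

	// Case 1: the channel is closed, as when the cgroup is removed.
	closed := make(chan error)
	close(closed)
	watch(make(chan struct{}), closed, logf)

	// Case 2: a real error is delivered and gets logged.
	failing := make(chan error, 1)
	failing <- errWatchFailed
	watch(make(chan struct{}), failing, logf)

	fmt.Println("errors logged:", logged)
}
```
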
Henry Wang	2d9d5fddbd	Document fs_type and fs_options in snapshots/devmapper/README.md Resolves: #6499 Signed-off-by: Henry Wang <henwang@amazon.com>	2022-02-02 21:57:44 +00:00
Derek McGowan	604c462d7b	Merge pull request #6497 from thaJeztah/platform_keep_osversion_osfeatures platforms.Normalize(): do not reset OSVersion and OSFeatures	2022-02-02 12:06:09 -08:00
Michael Crosby	9a08d6fcde	Merge pull request #6457 from kzys/otel-http tracing: use OTLP/HTTP in addition to OTLP/gRPC	2022-02-02 14:24:15 -05:00
Derek McGowan	a31e28e2c2	Prepare release notes for v1.6.0-rc.2 Signed-off-by: Derek McGowan <derek@mcg.dev>	2022-02-02 11:01:31 -08:00
Derek McGowan	8944c12f56	Update releases document Move 1.4 EOL after 1.6 release. Update latest 1.4 and 1.5 versions. Signed-off-by: Derek McGowan <derek@mcg.dev>	2022-02-02 11:00:45 -08:00
Phil Estes	75d594834d	Merge pull request #6498 from dmcgowan/update-cgroups-1_0_3 Update cgroups to v1.0.3	2022-02-02 08:55:40 -05:00
Derek McGowan	d6a576ae6e	Merge pull request #6494 from AkihiroSuda/seccomp-5.16 seccomp: kernel 5.11 -> 5.16	2022-02-01 18:13:36 -08:00
Derek McGowan	05177ab5cd	Merge pull request #6243 from ktock/pusher-abort remotes: fix dockerPusher to handle abort correctly	2022-02-01 18:07:46 -08:00
Derek McGowan	8d69c041c5	Update cgroups to v1.0.3 Pull in latest cgroups to pick up leak fixes Signed-off-by: Derek McGowan <derek@mcg.dev>	2022-02-01 16:57:51 -08:00
Andrew G. Morgan	6906b57c72	Fix the Inheritable capability defaults. The Linux kernel never sets the Inheritable capability flag to anything other than empty. Non-empty values are always exclusively set by userspace code. [The kernel stopped defaulting this set of capability values to the full set in 2000 after a privilege escalation with Capabilities affecting Sendmail and others.] Signed-off-by: Andrew G. Morgan <morgan@kernel.org>	2022-02-01 13:55:46 -08:00
Sebastiaan van Stijn	bec6e4dd67	platforms.Normalize(): do not reset OSVersion and OSFeatures Commit `fb0688362c` implemented the Normalize() function, but marked these fields as deprecated. It's unclear what the motivation was for this, as the fields are part of the OCI Image spec. On Windows, the OSVersion field specifically is important when matching images (as kernel versions may not be compatible). This patch updates platforms.Normalize() to preserve the OSVersion and OSFeatures fields. As a follow-up, we should look at defining an appropriate string-representation for these fields (possibly as part of the OCI Spec), and update platforms.Parse() accordingly. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-02-01 17:19:28 +01:00
Akihiro Suda	34f7173491	seccomp: kernel 5.16 (futex_waitv) Allow `futex_waitv` by default. See https://www.phoronix.com/scan.php?page=news_item&px=FUTEX2-futex-waiv-More-Archs Note: libseccomp does not cover kernel 5.16 at this moment: `51b50f95e1/src/syscalls.csv` Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2022-02-01 09:08:06 +09:00
Akihiro Suda	8632bdcb7b	seccomp: kernel 5.15 (process_mrelease) Allow `process_mrelease` by default. See https://lwn.net/Articles/864184/ Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2022-02-01 09:08:05 +09:00
Akihiro Suda	c013db6965	seccomp: kernel 5.14 (quotactl_fd, memfd_secret) - Allow `quotactl_fd` when `CAP_SYS_ADMIN` is granted. See https://lwn.net/Articles/859679/ - Allow `memfd_secret` by default. See https://lwn.net/Articles/865256/ Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2022-02-01 09:08:01 +09:00
Akihiro Suda	17a2831f70	seccomp: kernel 5.13 (landlock_{add_rule,create_ruleset,restrict_self}) Allow the following syscalls by default: - `landlock_add_rule` - `landlock_create_ruleset` - `landlock_restrict_self` See https://landlock.io/ Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2022-02-01 09:07:33 +09:00
Akihiro Suda	1329ea3716	seccomp: kernel 5.12 (mount_setattr) Allow `mount_setattr` when `CAP_SYS_ADMIN` is granted. See https://man7.org/linux/man-pages/man2/mount_setattr.2.html Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2022-02-01 09:06:41 +09:00
Michael Crosby	52b8ca5545	Merge pull request #6411 from nmeum/swapcontext seccomp: add support for "swapcontext" syscall in default policy	2022-01-31 16:11:55 -05:00
Sebastiaan van Stijn	fdbfde5d81	cmd/containerd-shim: add -v (version) flag Unlike the other shims, containerd-shim did not have a -v (version) flag: ./bin/containerd-shim-runc-v1 -v ./bin/containerd-shim-runc-v1: Version: v1.6.0-rc.1 Revision: ad771115b82a70cfd8018d72ae489c707e63de16.m Go version: go1.17.2 ./bin/containerd-shim -v flag provided but not defined: -v Usage of ./bin/containerd-shim: This patch adds a `-v` flag to be consistent with the other shims. The code was slightly refactored to match the implementation in the other shims, taking the same approach as `77d53d2d23/runtime/v2/shim/shim.go (L240-L256)` Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-01-31 21:09:50 +01:00
Sebastiaan van Stijn	e79aba10d4	integration/images/volume-ownership: strip path information from usage output The GNU Coding Standards state: https://www.gnu.org/prep/standards/html_node/_002d_002dversion.html#g_t_002d_002dversion > The program’s name should be a constant string; don’t compute it from argv[0]. > The idea is to state the standard or canonical name for the program, not its > file name. We don't have a const for this, but let's make a start and just remove the path info. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-01-31 21:07:00 +01:00
Sebastiaan van Stijn	b8cadf7539	runtime/v2/shim: strip path information from version output I noticed that path information showed up in the version output: ./bin/containerd-shim-runc-v1 -v ./bin/containerd-shim-runc-v1: Version: v1.6.0-rc.1 Revision: ad771115b82a70cfd8018d72ae489c707e63de16.m Go version: go1.17.2 The GNU Coding Standards state: https://www.gnu.org/prep/standards/html_node/_002d_002dversion.html#g_t_002d_002dversion > The program’s name should be a constant string; don’t compute it from argv[0]. > The idea is to state the standard or canonical name for the program, not its > file name. Unfortunately, this code is used by multiple binaries, so we can't fully remove the use of os.Args[0], but let's make a start and just remove the path info. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2022-01-31 21:01:01 +01:00
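
Removing the path information from `os.Args[0]`, as commits e79aba10d4 and b8cadf7539 describe, can be sketched with the standard library's `filepath.Base`. The helper name `programName` is hypothetical, and the sketch does not claim to match the exact change in either commit.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// programName strips directory components from an argv[0]-style value,
// so "./bin/containerd-shim-runc-v1" is reported as "containerd-shim-runc-v1".
func programName(arg0 string) string {
	return filepath.Base(arg0)
}

func main() {
	// Name of the running binary without its path.
	fmt.Printf("%s:\n", programName(os.Args[0]))

	// The invocation quoted in the commit message.
	fmt.Println(programName("./bin/containerd-shim-runc-v1"))
}
```

This keeps the binary-specific name, which matters because the code is shared by several binaries, while dropping the part that depends on how the program was invoked.
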
Derek McGowan	c2cb589221	Merge pull request #6478 from fuweid/enhance-no-sync-during-create oci: use readonly mount to read user/group info	2022-01-31 10:35:51 -08:00
Michael Crosby	e178d831ef	Merge pull request #6475 from estesp/import-correct-media-type Fix possibly incorrect media type default on import	2022-01-31 11:47:24 -05:00
Michael Crosby	82af36e59b	Merge pull request #5828 from cpuguy83/shimv2_exit_on_signals shimv2: handle sigint/sigterm	2022-01-31 10:47:39 -05:00
Kazuyoshi Kato	cc59ae4d98	tracing: return (ctx, span) from StartSpan OpenTelemetry's Tracer#Start() returns (ctx, span). We have no reasons to swap them. Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>	2022-01-29 00:41:21 +00:00
Kazuyoshi Kato	e751f1f44f	tracing: support OTLP/HTTP in addition to gRPC This change adds OTLP/HTTP, specifically http/protobuf support. http/protobuf is recommended in https://github.com/open-telemetry/opentelemetry-specification/blob/v1.8.0/specification/protocol/exporter.md. However, kube-apiserver and CRI-O use gRPC, and kubelet may support gRPC in the future, so we should support gRPC as well. Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>	2022-01-29 00:41:18 +00:00
Michael Crosby	9c676e98dd	Merge pull request #6481 from Junnplus/acr-400 Fix acr fetch token 400	2022-01-28 11:53:51 -05:00
Wei Fu	813a061fe1	oci: use readonly mount to read user/group info In the Linux kernel, unmounting a writable mountpoint triggers sync-fs to make sure that dirty pages are flushed to the underlying filesystem. A large number of umount actions at the same time may cause performance issues on IOPS-limited disks. When the CRI plugin creates a container, it temp-mounts the rootfs to read the UID/GID info for the entrypoint. The rootfs is a writable snapshot, so after the read, umount invokes the sync-fs action. For example, use overlayfs on ext4 and use bcc-tools to monitor ext4_sync_fs calls. ``` // uname -a Linux chaofan 5.13.0-27-generic #29~20.04.1-Ubuntu SMP Fri Jan 14 00:32:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux // open terminal 1 kubectl run --image=nginx --image-pull-policy=IfNotPresent nginx-pod // open terminal 2 /usr/share/bcc/tools/stackcount ext4_sync_fs -i 1 -v -P ext4_sync_fs sync_filesystem ovl_sync_fs __sync_filesystem sync_filesystem generic_shutdown_super kill_anon_super deactivate_locked_super deactivate_super cleanup_mnt __cleanup_mnt task_work_run exit_to_user_mode_prepare syscall_exit_to_user_mode do_syscall_64 entry_SYSCALL_64_after_hwframe syscall.Syscall.abi0 github.com/containerd/containerd/mount.unmount github.com/containerd/containerd/mount.UnmountAll github.com/containerd/containerd/mount.WithTempMount.func2 github.com/containerd/containerd/mount.WithTempMount github.com/containerd/containerd/oci.WithUserID.func1 github.com/containerd/containerd/oci.WithUser.func1 github.com/containerd/containerd/oci.ApplyOpts github.com/containerd/containerd.WithSpec.func1 github.com/containerd/containerd.(Client).NewContainer github.com/containerd/containerd/pkg/cri/server.(criService).CreateContainer github.com/containerd/containerd/pkg/cri/server.(instrumentedService).CreateContainer k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_CreateContainer_Handler.func1 github.com/containerd/containerd/services/server.unaryNamespaceInterceptor github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1 github.com/grpc-ecosystem/go-grpc-prometheus.(ServerMetrics).UnaryServerInterceptor.func1 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1 go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1 k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_CreateContainer_Handler google.golang.org/grpc.(Server).processUnaryRPC google.golang.org/grpc.(Server).handleStream google.golang.org/grpc.(Server).serveStreams.func1.2 runtime.goexit.abi0 containerd [34771] 1 ``` If several create requests arrive at once, the umount actions might put high IO pressure on the disk underlying /var/lib/containerd. After checking the kernel code[1], the kernel will not call __sync_filesystem if the mount is readonly. Based on this, containerd should use a readonly mount to get UID/GID information. Reference: [1] https://elixir.bootlin.com/linux/v5.13/source/fs/sync.c#L61 Closes: #4604 Signed-off-by: Wei Fu <fuweid89@gmail.com>	2022-01-28 23:36:04 +08:00
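
The idea in commit 813a061fe1, mounting the rootfs read-only when it is only going to be read, can be sketched as a transformation of the mount option list. This is a simplified illustration under stated assumptions: `readonlyOptions` is a hypothetical helper, it only rewrites the `rw`/`ro` flags, it issues no mount syscall, and it is not the change made in containerd's mount package.

```go
package main

import "fmt"

// readonlyOptions returns a copy of opts that requests a read-only mount:
// any "rw" entry is dropped and "ro" is appended if not already present.
// The input slice is left unmodified.
func readonlyOptions(opts []string) []string {
	out := make([]string, 0, len(opts)+1)
	hasRO := false
	for _, o := range opts {
		switch o {
		case "rw":
			continue // drop the writable flag
		case "ro":
			hasRO = true
		}
		out = append(out, o)
	}
	if !hasRO {
		out = append(out, "ro")
	}
	return out
}

func main() {
	// Example overlay-style options; the paths are placeholders.
	opts := []string{"rw", "lowerdir=/lower", "upperdir=/upper", "workdir=/work"}
	fmt.Println(readonlyOptions(opts))
}
```

According to the commit message, a mount made with such options is unmounted without the kernel calling `__sync_filesystem`, which is what avoids the IO pressure.
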
Phil Estes	a43703fcba	Merge pull request #6455 from tonistiigi/amd64-variants platforms: add support for matching amd64 variants	2022-01-27 10:07:49 -05:00
ye.sijun	c0e00f19ab	fix acr fetch token 400 Signed-off-by: ye.sijun <junnplus@gmail.com>	2022-01-27 17:34:45 +08:00
Derek McGowan	3f5d789dfb	Merge pull request #6476 from gabriel-samfira/various-periodic-fixes Fix windows periodic workflow	2022-01-25 16:43:11 -08:00
Gabriel Adrian Samfira	4cd9f37f56	Fix windows periodic workflow This change addresses the following issues: * Fix fetching the public IP of the windows instance. * Fix generation of repolist.toml. * Resource cleanup is now run even if tests fail. Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>	2022-01-25 21:54:16 +02:00
Phil Estes	4aff7431fe	Fix possibly incorrect media type default on import As reported, running import twice without using the compress import option means that the content store will have existing layers during the second import and the existing code hardcodes existing layer media type to compressed. This fixes the issue by actually reading the header bytes from the store and setting the media type appropriately. Signed-off-by: Phil Estes <estesp@amazon.com>	2022-01-25 14:11:20 -05:00
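
Choosing the media type from the layer's header bytes, as commit 4aff7431fe describes, can be sketched as follows. The sketch assumes only gzip needs to be recognised, using the gzip magic bytes `0x1f 0x8b`; `layerMediaType` is a hypothetical helper and the real import code may detect other compression formats as well.

```go
package main

import (
	"bytes"
	"fmt"
)

// Docker schema 2 layer media types for uncompressed and gzip-compressed tars.
const (
	mediaTypeLayer     = "application/vnd.docker.image.rootfs.diff.tar"
	mediaTypeLayerGzip = "application/vnd.docker.image.rootfs.diff.tar.gzip"
)

// gzipMagic is the two-byte signature at the start of every gzip stream.
var gzipMagic = []byte{0x1f, 0x8b}

// layerMediaType picks the media type from the first bytes of the layer
// blob instead of assuming that an existing layer is compressed.
func layerMediaType(header []byte) string {
	if bytes.HasPrefix(header, gzipMagic) {
		return mediaTypeLayerGzip
	}
	return mediaTypeLayer
}

func main() {
	gz := []byte{0x1f, 0x8b, 0x08, 0x00}
	tar := []byte("./etc/hostname")
	fmt.Println(layerMediaType(gz))
	fmt.Println(layerMediaType(tar))
}
```
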
Brian Goff	3ffb6a6113	shimv2: handle sigint/sigterm This causes sigint/sigterm to trigger a shutdown of the shim. It is needed because otherwise the v2 shim hangs system shutdown. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2022-01-25 17:57:28 +00:00
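
A minimal sketch of the behaviour in commit 3ffb6a6113, shutting down when SIGINT or SIGTERM arrives, using the standard `os/signal` package. The names `shutdownSignals`, `notifyChannel`, and `waitForShutdown` are hypothetical, and the shutdown callback stands in for whatever the shim does on exit.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

// shutdownSignals lists the signals named in the commit message.
var shutdownSignals = []os.Signal{syscall.SIGINT, syscall.SIGTERM}

// notifyChannel returns a buffered channel subscribed to shutdownSignals.
func notifyChannel() chan os.Signal {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, shutdownSignals...)
	return sigs
}

// waitForShutdown blocks until a signal arrives or done is closed.
// It calls shutdown and returns true only in the signal case.
func waitForShutdown(sigs <-chan os.Signal, done <-chan struct{}, shutdown func()) bool {
	select {
	case <-sigs:
		shutdown()
		return true
	case <-done:
		return false
	}
}

func main() {
	sigs := notifyChannel()
	defer signal.Stop(sigs)

	// Inject a signal directly so the example terminates without an external kill.
	sigs <- syscall.SIGTERM

	waitForShutdown(sigs, make(chan struct{}), func() {
		fmt.Println("shutting down shim")
	})
}
```

Without a handler like this, a process that ignores or never reacts to these signals can delay system shutdown, which is the hang the commit message mentions.
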
Fu Wei	2986d5b077	Merge pull request #6473 from kzys/gc-docs	2022-01-25 13:25:49 +08:00
Kazuyoshi Kato	f048a25938	docs: add doc-comments on GC-related methods Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>	2022-01-24 14:26:14 -08:00
Maksym Pavlenko	731518417e	Merge pull request #6465 from fuweid/fix-issue-6429 fix: should not send 137 code event if cmd is notfound	2022-01-21 14:50:46 -08:00
Wei Fu	31a710c492	fix: should not send 137 code event if cmd is notfound ShimV2 has a shim.Delete command to clean up a task's temporary resources, such as the bundle folder. Since the shim server has exited and there is no persistent store for the task's exit code, the result of shim.Delete is always exit code 137, as if the task had been killed. The result of shim.Delete can be used as a task event only when the shim server is killed somehow after the container is running. That way dockerd, which watches task exit events to update the status of a container, can report the correct status. Back to issue #6429: the container is not running because the entrypoint is not found. Based on this design, we should not send a 137 exit code event to subscribers. This commit removes the shim instance first, so that `cleanupAfterDeadShim` does not send the event. Similar Issue: #4769 Fix #6429 Signed-off-by: Wei Fu <fuweid89@gmail.com>	2022-01-22 00:58:33 +08:00
Phil Estes	ab8d99cf4b	Merge pull request #6463 from Junnplus/empty-scope Fix empty scopes return	2022-01-20 15:34:11 -05:00
Jeff Zvier	356ca75757	containerd-shim-runc-v2: return init pid when clean dead shim If the containerd-shim-runc-v2 process dies abnormally, for example after receiving a kill -9 signal, a panic, or for other unknown reasons, the containerd-shim-runc-v2 server cannot reap the runc container or forward the init process exit event. This leaks the container in dockerd. When the shim dies, containerd cleans up the dead shim; this change reads the init process pid at that point and forwards the exit event with the pid at the same time. Signed-off-by: Jeff Zvier <zvier20@gmail.com>	2022-01-20 17:06:55 +08:00
ye.sijun	936faf9c98	fix empty scopes return Signed-off-by: ye.sijun <junnplus@gmail.com>	2022-01-20 15:16:44 +08:00
Derek McGowan	ad771115b8	Merge pull request #6462 from dmcgowan/prepare-1.6.0-rc.1 Prepare release notes for v1.6.0-rc.1	2022-01-19 19:13:47 -08:00


11307 Commits