containerd

Author	SHA1	Message	Date
Kathryn Baldauf	95ba6e9f75	Add annotations to task update request api Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com>	2020-11-09 14:13:33 -08:00
Maksym Pavlenko	4da306e1e9	Fix panic in shim not logged Fix #4274 Carry #4298 Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>	2020-10-26 09:05:47 -07:00
Giuseppe Capizzi	8eda32e107	Check if a process exists before returning it Fixes #4632. Signed-off-by: Giuseppe Capizzi <gcapizzi@pivotal.io> Co-authored-by: Danail Branekov <danailster@gmail.com>	2020-10-22 16:50:14 +03:00
Akihiro Suda	915263f269	Merge pull request #4502 from akshat-kmr/master Add logging binary support when terminal is true	2020-10-08 12:14:39 +09:00
Maksym Pavlenko	c59d1cd5b0	Fix linter issues Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>	2020-10-07 15:42:01 -07:00
Phil Estes	68d97331be	Merge pull request #4538 from fuweid/update-shim-cleanup runtime/v2: cleanup dead shim before delete bundle	2020-09-21 13:32:40 -04:00
Wei Fu	4b05d03903	runtime/v2: cleanup dead shim before delete bundle The shim delete action needs bundle information to clean up resources created by the shim. If the dead-shim cleanup is called after the bundle is deleted, some of those resources may leak. The ttrpc client UserOnCloseWait() can make sure that resources are cleaned up before the bundle is deleted, which synchronizes task deletion and dead-shim cleanup. It might slow down task deletion, but it makes sure that resources can be cleaned up and avoids the EBUSY umount case. For example, a sandbox container like Kata/Firecracker might have mount points over the rootfs. If containerd handles task deletion and dead-shim cleanup in parallel, the task deletion will hit EBUSY during umount and fail to clean up the bundle, which makes the case worse. Also update cleanupAfterDeadshim so that it is only called after the shim has disconnected. In some cases, the shim fails to call runc-create for some reason, but runc-create has already put runc-init into the ready state. If containerd doesn't call shim deletion, the runc-init process will leak and hold the cgroup, which leaves the pod stuck terminating :(. Signed-off-by: Wei Fu <fuweid89@gmail.com>	2020-09-20 11:24:31 +08:00
Akshat Kumar	61da6986c0	Cleanup open pipes if logging binary fails to start Signed-off-by: Akshat Kumar <kshtku@amazon.com>	2020-09-10 20:06:51 -07:00
Brian Goff	dab7bd0c45	Always consume shim logs These fifos fill up if unconsumed, so always consume them. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2020-09-10 10:23:29 -07:00
Brian Goff	5f9d15eaac	shimv1: downgrade process missing log to debug This `Info` log shows up for all exec processes that use the v1 shim with Docker because Docker deletes the process once it receives the exit event from containerd. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2020-09-01 10:31:41 -07:00
Akshat Kumar	4cc99e57a7	Remove unnecessary logging binary helpers and add godoc Signed-off-by: Akshat Kumar <kshtku@amazon.com>	2020-08-26 09:15:02 -07:00
Akshat Kumar	7a9fbec5fb	Add logging binary support when terminal is true Currently the shims only support starting the logging binary process if the io.Creator Config does not specify Terminal: true. This means that the program using containerd will only be able to specify FIFO io when Terminal: true, rather than allowing the shim to fork the logging binary process. Hence, containerd consumers face an inconsistent behavior regarding logging binary management depending on the Terminal option. Allowing the shim to fork the logging binary process will introduce consistency between the running container and the logging process. Otherwise, the logging process may die if its parent process dies whereas the container will keep running, resulting in the loss of container logs. Signed-off-by: Akshat Kumar <kshtku@amazon.com>	2020-08-25 17:28:29 -07:00
Wei Fu	73b1449278	runtime: ignore ErrNotExist when remove rootfs Signed-off-by: Wei Fu <fuweid89@gmail.com>	2020-08-12 20:04:50 +08:00
Brian Goff	d7b9cb0019	shim: move event context timeout to publisher Before this change, if an event fails to send on the first attempt, subsequent attempts will fail with context.Canceled because the caller of publish passes a cancellable timeout, which the publisher uses to send the event. The publisher returns immediately if the send fails, but adds the event to an async queue to try again. Meanwhile the caller will return, cancelling the context. Additionally, subsequent attempts may fail to send because the timeout was expected to be for a single request but the queue sleeps for `attempt*time.Second`. In the shim service, the timeout was set to 5s, which means the send will fail with context.DeadlineExceeded before it reaches `maxRequeue` (which is currently 5). This change moves the timeout to the publisher so each send attempt gets its own timeout. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2020-07-20 17:51:10 -07:00
Akihiro Suda	fd99b6566b	decrease log level of cgroup2 ToggleController error when running in UserNS Fix #4312 Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-06-24 18:15:16 +09:00
Phil Estes	fb80a49ec1	Merge pull request #4327 from AkihiroSuda/fix-4326 shim v2 runc: propagate options.Root to Cleanup	2020-06-17 09:23:53 -04:00
Akihiro Suda	f1a469a035	shim v2 runc: propagate options.Root to Cleanup Previously shim v2 (`io.containerd.runc.{v1,v2}`) always used `/run/containerd/runc` as the runc root. Fix #4326 Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-06-17 19:06:36 +09:00
Wei Fu	d656fa38ca	restart plugin: support binary log uri Introduce LogURIGenerator helper function in cio package. It is used in the restart options, like WithBinaryLogURI and WithFileLogURI. And restart.LogPathLabel might be used in production and work well. In order to reduce breaking change, the LogPathLabel is still recognized if new LogURILabel is not set. In next release 1.5, the LogPathLabel will be removed. Signed-off-by: Wei Fu <fuweid89@gmail.com>	2020-06-10 00:09:24 +08:00
Michael Crosby	7ce8a9d7d3	Merge pull request #4204 from ashrayjain/aj/add-kill-retry Make killing shims more resilient	2020-06-03 11:10:43 -04:00
Ashray Jain	3e95727f39	Make killing shims more resilient Currently, we send a single SIGKILL to the shim process once and then we spin in a loop where we use kill(pid, 0) to detect when the pid has disappeared completely. Unfortunately, this has a race condition since pids can be reused causing us to spin in an infinite loop when that happens. This adds a timeout to this loop which logs a warning and exits the infinite loop. Signed-off-by: Ashray Jain <ashrayj@palantir.com>	2020-06-03 12:57:08 +01:00
Akihiro Suda	2f601013e6	cgroup2: implement `containerd.events.TaskOOM` event How to test (from https://github.com/opencontainers/runc/pull/2352#issuecomment-620834524): (host)$ sudo swapoff -a (host)$ sudo ctr run -t --rm --memory-limit $((1024*1024*32)) docker.io/library/alpine:latest foo (container)$ sh -c 'VAR=$(seq 1 100000000)' An event `/tasks/oom {"container_id":"foo"}` will be displayed in `ctr events`. Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-06-01 14:00:13 +09:00
Sebastiaan van Stijn	dc92ad6520	Replace errors.Cause() with errors.Is() Dependencies may be switching to use the new `%w` formatting option to wrap errors; switching to use `errors.Is()` makes sure that we are still able to unwrap the error and detect the underlying cause. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-05-08 14:36:45 +02:00
Sebastiaan van Stijn	1b66fecad3	Integrate sys.SetSubreaper, sys.GetSubreaper in sys/reaper package Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-05-04 08:44:02 +02:00
Wei Fu	9687ba6315	test: TestRuntimeWithEmptyMaxEnvProcs should cleanup TestRuntimeWithEmptyMaxEnvProcs should restore the GoMaxProcs after test so that the temporary change of GoMaxProcs will not impact other case, like TestRuntimeWithNonEmptyMaxEnvProcs. Signed-off-by: Wei Fu <fuweid89@gmail.com>	2020-04-23 22:09:10 +08:00
Wei Fu	0116352e1b	runtime: ignore ttrpc.ErrClosed when delete task For some reason, shimv2 process doesn't exist. The ttrpc doesn't detect the connection closed by server until delete task. For this case, we should ignore the ttrpc.ErrClosed and let task manager handle the cleanup. Signed-off-by: Wei Fu <fuweid89@gmail.com>	2020-04-20 23:34:49 +08:00
Michael Crosby	2ed8d12bb0	Merge pull request #3845 from fahedouch/v2_shim_test v2 runtime shim test	2020-04-13 12:26:05 -04:00
Maksym Pavlenko	0caa233158	Rework shim logger shutdown process Signed-off-by: Maksym Pavlenko <makpav@amazon.com>	2020-04-07 12:42:04 -07:00
Michael Crosby	649f2aac66	add -v to shim binaries Request came from a slack message that shims do not output their versions making it hard for users and operators to know what version of a shim they have on the system. This adds a `-v` flag to the shims so that users can see if a shim is in sync with containerd or what versions of shims that they are running. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2020-03-17 13:23:06 -04:00
Maksym Pavlenko	2532bdf43f	Merge pull request #4100 from lifubang/publisher fix dial error when clean up a dead shim	2020-03-14 15:19:48 -07:00
lifubang	488d6194f2	fix dial error when clean up a dead shim Signed-off-by: lifubang <lifubang@acmcoder.com>	2020-03-12 10:57:55 +08:00
Kir Kolyshkin	6e638ad27a	Nit: fix use of bufio.Scanner.Err The Err() method should be called after the Scan() loop, not inside it. Found by: git grep -A3 -F '.Scan()' Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-03-11 19:36:21 -07:00
Tobias Klauser	a9bd451ab4	Avoid duplicate imports of github.com/gogo/protobuf/types Re-use the import aliased as `ptypes`. Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2020-03-10 09:41:03 +01:00
zyu	e3ab8bda60	Avoid allocating slice for finding Process Signed-off-by: zyu <yuzhihong@gmail.com>	2020-03-06 09:51:26 -08:00
Ted Yu	a687d3a36d	Check error return from json.Unmarshal Signed-off-by: Ted Yu <yuzhihong@gmail.com> Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2020-03-05 13:38:08 -05:00
Maksym Pavlenko	4d242818bf	Merge pull request #4053 from AkihiroSuda/vendor-grpc-20200225 vendor protobuf & grpc (GoGoProtoPackageIsVersion3)	2020-02-27 11:59:59 -08:00
Phil Estes	669f516b0e	Merge pull request #4062 from tedyu/start-shim-defer Use named error return for service#StartShim	2020-02-27 13:23:31 -05:00
Ted Yu	f8ade8debd	Use named error return for service#StartShim Signed-off-by: Ted Yu <yuzhihong@gmail.com>	2020-02-27 06:18:05 -08:00
Derek McGowan	06b284026d	Merge pull request #4063 from tedyu/namespace-path fix killall when use pidnamespace	2020-02-26 23:08:31 -08:00
Ted Yu	4105135e36	fix killall when use pidnamespace Signed-off-by: Ted Yu <yuzhihong@gmail.com>	2020-02-26 20:56:49 -08:00
Phil Estes	ebec675a8d	Merge pull request #3802 from vladimiroff/unify-dialers Unify dialer implementations	2020-02-26 16:54:22 -05:00
Kiril Vladimiroff	4dd75be2b9	Unify dialer implementations Instead of having several dialer implementations, leave only one in `pkg/dialer` and call it from `pkg/ttrpcutil` and `runtime/v(1|2)/shim`, which had their own. Closes #3471. Signed-off-by: Kiril Vladimiroff <kiril@vladimiroff.org>	2020-02-26 23:29:04 +02:00
Akihiro Suda	8e448bb279	vendor protobuf & grpc Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-02-26 10:57:05 +09:00
Wei Fu	18e581dd91	bugfix: cleanup dangling shim by brand new context When there is timeout or cancel for create container, killShim will fail because of canceled context. The shim will be dangling and unmanageable. Need to use new context to do cleanup. Signed-off-by: Wei Fu <fuweid89@gmail.com>	2020-02-21 16:49:58 +08:00
Li Yuxuan	84464b801f	v2: Cancel shim log ctx when ttrpc is closed The background context avoids shim blocking when the ctx is cancelled unexpectedly during shim start. But if the shim exits unexpectedly before opening the pipe, the fd will never be closed. `onCloseWithShimLog` makes sure that the shim log fd is closed properly once the shim disconnects. Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com>	2020-02-20 23:20:10 +08:00
fahedouch	486d33631e	test runtime v2 CPU settings Signed-off-by: fahedouch <fahed.dorgaa@gmail.com>	2020-01-14 18:23:54 +01:00
Seth Pellegrino	66508589d3	fix: eventfd leak for v2 runtime with v1 cgroups There's no OOM monitoring for the v2 cgroups yet, so it seems unlikely that there was a leak in that case. Signed-off-by: Seth Pellegrino <spellegrino@newrelic.com>	2020-01-13 10:49:11 -08:00
Seth Pellegrino	9456040acb	fix: eventfd leak Only start watching the cgroup for OOMs when the first process starts instead of on every process. Signed-off-by: Seth Pellegrino <spellegrino@newrelic.com>	2020-01-13 10:39:54 -08:00
Li Yuxuan	1fb1d93212	v2: Fix missing ns when openShimLog on windows Related to https://github.com/containerd/containerd/pull/3921#discussion_r363046745 Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com>	2020-01-05 19:42:33 +08:00
Li Yuxuan	d82fa43193	v2: Call shim.Delete at first when create is failed If the context is cancelled during `shim.Create()`, for example because the client disconnects unexpectedly, the created shim will never be deleted. What's more, if the context is cancelled during `openShimLog()`, the fifo will be closed and block the shim output. Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com>	2019-12-28 00:02:11 +08:00
Erik Sipsma	fbd46d7094	runtime v2: Close platform in runc shim's Shutdown method. Previously, the platform was closed as part of the Delete method when the process was an init for a task and there were no more tasks after its deletion. This can create problems if another task is created within the shim right after the delete runs, which results in the platform being closed but the shim continuing to run. This change moves closing the platform to the Shutdown method after the shim's context is canceled, which ensures the platform is only closed once the shim is sure it's done servicing containers. Signed-off-by: Erik Sipsma <sipsma@amazon.com>	2019-12-19 09:47:40 -05:00


466 Commits