Commit Graph

74 Commits

Author SHA1 Message Date
Shiming Zhang
7966a6652a Cleanup code
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
2021-04-19 16:59:45 +08:00
Maksym Pavlenko
993b863993 Add shim start opts
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2021-04-15 11:55:24 -07:00
Sebastiaan van Stijn
708299ca40
Move RunningInUserNS() to its own package
This allows using the utility without bringing whole of "sys" with it.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-03-23 11:29:53 +01:00
Michael Crosby
e0c94bb269
Merge pull request #4708 from kzys/enable-criu
Re-enable CRIU tests by not using overlayfs snapshotter
2021-03-19 14:23:05 -04:00
Phil Estes
a0cc9b432d
Merge pull request #5195 from fuweid/fix-5173
runtime/v2/runc: fix leaking socket path
2021-03-17 09:33:41 -04:00
Kazuyoshi Kato
b520428b5a Fix CRIU
- process.Init#io could be nil
- Make sure CreateTaskRequest#Options is not empty before unmarshaling

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2021-03-16 16:46:45 -07:00
Iceber Gu
5e484c9613
runtime/v2/runc: fix the defer cleanup of the NewContainer
Signed-off-by: Iceber Gu <wei.cai-nat@daocloud.io>
2021-03-16 11:41:17 +08:00
Wei Fu
d895118c7c runtime/v2/runc: fix leaking socket path
When runC shimv2 starts, the StartShim interface will re-exec itself as
long-running process, which will read the `address` during initializing.

```happycase
Process

containerd-shim-runc-v1/v2 start             containerd-shim-runc-v1/v2

	initializing socket

	reexec containerd-shim-runc-v1/v2

	write address into file

						initializing

							read address

	write back to containerd daemon

						serving

						...

						remove address in Shutdown call
```

However, there is no synchronization after reexec. Then the data race is
like:

```leaking-case
Process

containerd-shim-runc-v1/v2 start             containerd-shim-runc-v1/v2

	initializing socket

	reexec containerd-shim-runc-v1/v2

						initializing

							read address

	write address into file

	write back to containerd daemon

						serving

						...

						fail to remove address
						because of empty address
```

The `address` should be writen into file first before reexec.

And if shutdown the whole service before cleanup temporary
resource (like socket file), the Shutdown caller will receive `ttrpc: closed`
sometime, which depends on go runtime scheduler. Then it also causes leaking
socket files.

Since the shimV2-Delete binary API must be called to cleanup shim temporary
resource and shimV2-runC-v1 doesn't support grouping multi containers in one,
it is safe to remove the socket file in the binary call for shimV2-runC-v1.
But for the shimV2-runC-v2 shim, we still cleanup socket in Shutdown.
Hopefully we can find a way to cleanup socket in shimV2-Delete binary
call.

Fix: #5173

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2021-03-15 18:32:00 +08:00
Shiming Zhang
05ef2fe2fb Fix missing close
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
2021-02-18 13:21:42 +08:00
IceberGu
b458583b76
runtime: fix shutdown runc v2 service
Signed-off-by: IceberGu <wei.cai-nat@daocloud.io>
2021-02-02 15:36:49 +08:00
Aditi Sharma
1423e9199d Update gogo/protobuf to v1.3.2
bump version 1.3.2 for gogo/protobuf due to CVE-2021-3121 discovered
in gogo/protobuf version 1.3.1, CVE has been fixed in 1.3.2

Signed-off-by: Aditi Sharma <adi.sky17@gmail.com>
2021-01-28 12:57:50 +00:00
Georgi Sabev
7451dd1ed1 Return GRPC not found error instead of plain one
When the shim returns a plain error when a process does not exist,
the server is unable to recognise its GRPC status code and assumes
UnknownError. This is awkward for containerd client users as they are
unable to recognise the actual reason for the error.

When the shim returns a NotFound GRPC error, it is properly translated
by the server and clients receive a proper NotFound error instead of
Unknown

Please note that we (CF Garden) would like to have the eventual fix backported to 1.4 as well.

Co-authored-by: Danail Branekov <danailster@gmail.com>

Signed-off-by: Danail Branekov <danailster@gmail.com>
Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>
2020-12-18 15:33:48 +02:00
Derek McGowan
4a4bb851f5
Merge pull request from GHSA-36xw-fx78-c5r4
Use path based unix socket for shims
2020-11-30 10:32:18 -08:00
Michael Crosby
bd908acabd
Use path based unix socket for shims
This allows filesystem based ACLs for configuring access to the socket of a
shim.

Co-authored-by: Samuel Karp <skarp@amazon.com>
Signed-off-by: Samuel Karp <skarp@amazon.com>
Signed-off-by: Michael Crosby <michael@thepasture.io>
Signed-off-by: Michael Crosby <michael.crosby@apple.com>
2020-11-11 11:47:46 -08:00
Akshat Kumar
61da6986c0 Cleanup open pipes if logging binary fails to start
Signed-off-by: Akshat Kumar <kshtku@amazon.com>
2020-09-10 20:06:51 -07:00
Akshat Kumar
4cc99e57a7 Remove unnecessary logging binary helpers and add godoc
Signed-off-by: Akshat Kumar <kshtku@amazon.com>
2020-08-26 09:15:02 -07:00
Akshat Kumar
7a9fbec5fb Add logging binary support when terminal is true
Currently the shims only support starting the logging binary process if the
io.Creator Config does not specify Terminal: true. This means that the program
using containerd will only be able to specify FIFO io when Terminal: true,
rather than allowing the shim to fork the logging binary process. Hence,
containerd consumers face an inconsistent behavior regarding logging binary
management depending on the Terminal option.

Allowing the shim to fork the logging binary process will introduce consistency
between the running container and the logging process. Otherwise, the logging
process may die if its parent process dies whereas the container will keep
running, resulting in the loss of container logs.

Signed-off-by: Akshat Kumar <kshtku@amazon.com>
2020-08-25 17:28:29 -07:00
Brian Goff
d7b9cb0019 shim: move event context timeout to publsher
Before this change, if an event fails to send on the first attempt,
subsequent attempts will fail with context.Cancelled because the the
caller of publish passes a cancellable timeout, which the publisher uses
to send the event.

The publisher returns immediately if the send fails, but adds the event
to an async queue to try again.
Meanwhile the caller will return cancelling the context.

Additionally, subsequent attempts may fail to send because the timeout
was expected to be for a single request but the queue sleeps for
`attempt*time.Second`.

In the shim service, the timeout was set to 5s, which means the send
will fail with context.DeadlineExceeded before it reaches `maxRequeue`
(which is currently 5).

This change moves the timeout to the publisher so each send attempt gets
its own timeout.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-07-20 17:51:10 -07:00
Akihiro Suda
fd99b6566b
decrease log level of cgroup2 ToggleController error when running in UserNS
Fix #4312

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-06-24 18:15:16 +09:00
Akihiro Suda
f1a469a035
shim v2 runc: propagate options.Root to Cleanup
Previously shim v2 (`io.containerd.runc.{v1,v2}`) always used `/run/containerd/runc` as the runc root.

Fix #4326

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-06-17 19:06:36 +09:00
Akihiro Suda
2f601013e6 cgroup2: implement containerd.events.TaskOOM event
How to test (from https://github.com/opencontainers/runc/pull/2352#issuecomment-620834524):
  (host)$ sudo swapoff -a
  (host)$ sudo ctr run -t --rm --memory-limit $((1024*1024*32)) docker.io/library/alpine:latest foo
  (container)$ sh -c 'VAR=$(seq 1 100000000)'

An event `/tasks/oom {"container_id":"foo"}` will be displayed in `ctr events`.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-06-01 14:00:13 +09:00
Tobias Klauser
a9bd451ab4 Avoid duplicate imports of github.com/gogo/protobuf/types
Re-use the import aliased as `ptypes`.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
2020-03-10 09:41:03 +01:00
Ted Yu
a687d3a36d Check error return from json.Unmarshal
Signed-off-by: Ted Yu <yuzhihong@gmail.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2020-03-05 13:38:08 -05:00
Maksym Pavlenko
4d242818bf
Merge pull request #4053 from AkihiroSuda/vendor-grpc-20200225
vendor protobuf & grpc (GoGoProtoPackageIsVersion3)
2020-02-27 11:59:59 -08:00
Phil Estes
669f516b0e
Merge pull request #4062 from tedyu/start-shim-defer
Use named error return for service#StartShim
2020-02-27 13:23:31 -05:00
Ted Yu
f8ade8debd Use named error return for service#StartShim
Signed-off-by: Ted Yu <yuzhihong@gmail.com>
2020-02-27 06:18:05 -08:00
Ted Yu
4105135e36 fix killall when use pidnamespace
Signed-off-by: Ted Yu <yuzhihong@gmail.com>
2020-02-26 20:56:49 -08:00
Akihiro Suda
8e448bb279 vendor protobuf & grpc
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-02-26 10:57:05 +09:00
Seth Pellegrino
66508589d3 fix: eventfd leak for v2 runtime with v1 cgroups
There's no OOM monitoring for the v2 cgroups yet, so it seems unlikely
that there was a leak in that case.

Signed-off-by: Seth Pellegrino <spellegrino@newrelic.com>
2020-01-13 10:49:11 -08:00
Seth Pellegrino
9456040acb fix: eventfd leak
Only start watching the cgroup for OOMs when the first process starts
instead of on every process.

Signed-off-by: Seth Pellegrino <spellegrino@newrelic.com>
2020-01-13 10:39:54 -08:00
Erik Sipsma
fbd46d7094
runtime v2: Close platform in runc shim's Shutdown method.
Previously, the platform was closed as part of the Delete method when the
process was an init for a task and there were no more tasks after its deletion.
This can create problems if another task is created within the shim right after
the delete runs, which results in the platform being closed but the shim
continuing to run.

This change moves closing the platform to the Shutdown method after the shim's
context is canceled, which ensures the platform is only closed once the shim
is sure its done servicing containers.

Signed-off-by: Erik Sipsma <sipsma@amazon.com>
2019-12-19 09:47:40 -05:00
Akihiro Suda
b02e20f12e cgroup2: enable controllers automatically
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2019-12-12 02:56:51 +09:00
Akihiro Suda
8f870c233f support cgroup2
* only shim v2 runc v2 ("io.containerd.runc.v2") is supported
* only PID metrics is implemented. Others should be implemented in separate PRs.
* lots of code duplication in v1 metrics and v2 metrics. Dedupe should be separate PR.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2019-12-12 02:56:51 +09:00
Michael Crosby
f8cca26f3c Handle large output in v2 shim with TTY
Reized the I/O buffers to align with the size of the kernel buffers with fifos
and move the close aspect of the console to key off of the stdin closing.

Fixes #3738

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-10-11 15:42:05 -04:00
Michael Crosby
6cf031e1e4 Pass ttrpc address to shim via env
Because of the way go handles flags, passing a flag that is not defined
will cause an error. In our case, if we kept this as a flag, then
third-party shims would break when they see this new flag.  To fix this,
I moved this new configuration option to an env var.  We should use env
vars from here on out to avoid breaking shim compat.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-08-22 20:37:49 +00:00
Kevin Parsons
d7e1b25384 Allow explicit configuration of TTRPC address
Previously the TTRPC address was generated as "<GRPC address>.ttrpc".
This change now allows explicit configuration of the TTRPC address, with
the default still being the old format if no value is specified.

As part of this change, a new configuration section is added for TTRPC
listener options.

Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
2019-08-22 00:56:27 -07:00
Phil Estes
640860a042
Merge pull request #3559 from fuweid/avoid-read-config
runtime: only check killall for init process
2019-08-20 13:08:55 -04:00
Wei Fu
1073868e5e runtime: only check killall for init process
When containerd-shim does reaper, the most processes are not init
process. Since json.Decode consumes more CPU resource, we should check
killall option for init process only.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2019-08-20 19:18:34 +08:00
Michael Crosby
0d27d8f4f2 Unifi reaper logic into package
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-08-16 13:55:05 +00:00
Maksym Pavlenko
ef7f46eb7b Fix linter errors
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-07-14 20:49:40 -07:00
Michael Crosby
6601b406b7 Refactor runtime code for code sharing
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-07-08 11:47:53 -04:00
Michael Crosby
7dfc605fc6 Set shim OOM scores to +1 containerd daemon score
This changes the shim's OOM score from a static max killable of -999 to
be +1 of the containerd daemon's score.  This should allow the shim's to
be killed first in an OOM condition but leave the daemon alone for a bit
to help cleanup and manage the containers during this situation.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-06-27 11:14:14 -04:00
Michael Crosby
1a8df3f237 Reserve exec id to prevent race
ref #2820

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-06-21 14:52:44 -04:00
Lantao Liu
48b81e872c Do not return error when rootfs already exists.
Signed-off-by: Lantao Liu <lantaol@google.com>
2019-05-22 15:57:19 -07:00
Michael Crosby
fe6a2b03ed Add shim cgroup support for v2 runtimes
Closes #3198

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-05-20 16:04:06 +00:00
Michael Crosby
57fbb16234
Merge pull request #3149 from lifubang/pidnamespace
fix killall when use pidnamespace
2019-05-09 14:28:44 -04:00
Michael Crosby
19af235051
Merge pull request #3148 from masters-of-cats/wip-rootless-containerd
Skip rootfs unmount when no mounts are provided
2019-05-07 10:39:02 -04:00
Michael Crosby
ae87730ad2 Improve shim shutdown logic
Shims no longer call `os.Exit` but close the context on shutdown so that
events and other resources have hit the `defer`s.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-04-10 18:17:07 -04:00
Michael Crosby
a6f587e4c4 Use ttrpc to publish runtime v2 events
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-04-09 14:38:50 -04:00
Georgi Sabev
c0f0b21314 Apply PR feedback
* Rootfs dir is created during container creation not during bundle
  creation
* Add support for v2
* UnmountAll is a no-op when the path to unmount (i.e. the rootfs dir)
  does not exist or is invalid

Co-authored-by: Danail Branekov <danailster@gmail.com>
Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>
2019-04-04 18:40:30 +03:00