542 Commits

Author SHA1 Message Date
Kevin Parsons
c9afc4250a Fix error checking when resolving shim binary path
Previously a typo was introduced that caused the wrong error to be
checked against when calling exec.LookPath. This had the effect that
containerd would never locate the shim binary if it was in the same
directory as containerd's binary, but not in PATH.

Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
2021-03-08 16:24:19 -08:00
Maksym Pavlenko
134f7a7370
Merge pull request #5007 from fidencio/wip/allow-shimv2-to-also-be-loaded-from-an-arbitrary-path
v2, util: Take the full binary path when starting the shimv2 process
2021-03-01 14:52:27 -08:00
Derek McGowan
10bbd1a462
Merge pull request #5051 from wzshiming/fix/missing-close
Fix missing close
2021-02-26 14:59:43 -08:00
Derek McGowan
9884730e5c
Merge pull request #5069 from AkihiroSuda/restart-fast
restart: skip Sleep() for the first iteration of the reconciliation
2021-02-25 16:37:53 -08:00
Akihiro Suda
6ab6eaa790
restart: skip Sleep() for the first iteration of the reconciliation
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-02-25 13:30:38 +09:00
Akihiro Suda
b23dc1131e
restart: parallelize reconcile()
The only shared variable `m.client` is thread-safe, so we can safely
parallelize the loops.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-02-25 13:30:00 +09:00
Shiming Zhang
05ef2fe2fb Fix missing close
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
2021-02-18 13:21:42 +08:00
Fabiano Fidêncio
d80dbdae68 v2, util: Take the full binary path when starting the shimv2 process
The current code simply ignores the full binary path when starting the
shimv2 process and instead falls back to a binary in the PATH. This
is problematic (and confusing) for those using CRI-O, which has these
bits vendored.

The reason it's problematic with CRI-O is that the user can simply
set the full binary path and, instead of having that executed, CRI-O
will fail to create the container unless that binary is in
the PATH, which may not be the case in a few different scenarios (testing
being the most common one).

Fixes: #5006

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-02-05 13:35:22 +01:00
IceberGu
b458583b76
runtime: fix shutdown runc v2 service
Signed-off-by: IceberGu <wei.cai-nat@daocloud.io>
2021-02-02 15:36:49 +08:00
Phil Estes
49c5c14879
Merge pull request #4906 from payall4u/bugfix/fix-open-shim-fifo
bugfix: change the flag of open log fifo to avoid containerd hang on syscall open
2021-02-01 09:01:38 -05:00
payall4u
957fa3379d change flag from RDONLY to RDWR and close the fifo correctly
Signed-off-by: Zhiyu Li <payall4u@qq.com>
2021-01-31 19:00:42 +08:00
Aditi Sharma
1423e9199d Update gogo/protobuf to v1.3.2
Bump gogo/protobuf to 1.3.2 because of CVE-2021-3121, which was discovered
in gogo/protobuf 1.3.1 and has been fixed in 1.3.2.

Signed-off-by: Aditi Sharma <adi.sky17@gmail.com>
2021-01-28 12:57:50 +00:00
Maksim An
ddb5e1651a Enhance logging driver and ctr tasks to support windows
Signed-off-by: Maksim An <maksiman@microsoft.com>
2021-01-21 12:17:32 -08:00
Wei Fu
846cb963cc runtime/v2: should use defer ctx to cleanup
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2021-01-11 23:22:38 +08:00
Maksym Pavlenko
c1b01eabc0 Add copyright header to proto files
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2021-01-05 10:44:07 -08:00
Michael Crosby
dc207b654d
Merge pull request #4860 from masters-of-cats/pr-process-not-found-err
Return GRPC not found error instead of plain one
2020-12-21 10:25:11 -05:00
Georgi Sabev
7451dd1ed1 Return GRPC not found error instead of plain one
When the shim returns a plain error when a process does not exist,
the server is unable to recognise its GRPC status code and assumes
UnknownError. This is awkward for containerd client users as they are
unable to recognise the actual reason for the error.

When the shim returns a NotFound GRPC error, it is properly translated
by the server and clients receive a proper NotFound error instead of
Unknown

Please note that we (CF Garden) would like to have the eventual fix backported to 1.4 as well.

Co-authored-by: Danail Branekov <danailster@gmail.com>

Signed-off-by: Danail Branekov <danailster@gmail.com>
Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>
2020-12-18 15:33:48 +02:00
Phil Estes
070b698449
Merge pull request #4845 from skaegi/oom_score-max
Add bounds on max oom_score_adj value for AdjustOOMScore
2020-12-17 16:22:46 -05:00
Simon Kaegi
da2fd657ab Add bounds on max oom_score_adj value for AdjustOOMScore
oom_score_adj must be in the range -1000 to 1000. In AdjustOOMScore if containerd's score is already at the maximum value we should set that value for the shim instead of trying to set 1001 which is invalid.

Signed-off-by: Simon Kaegi <simon_kaegi@ca.ibm.com>
2020-12-14 15:09:24 -05:00
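
The bound described in da2fd657ab amounts to a clamp. A minimal sketch, assuming the shim's score is the parent's score plus one; `shimOOMScore` is a hypothetical name, not the containerd function.

```go
package main

import "fmt"

// oomScoreAdjMax is the largest value the kernel accepts for oom_score_adj.
const oomScoreAdjMax = 1000

// shimOOMScore is a hypothetical helper: it returns the parent's score plus
// one, capped at oomScoreAdjMax so that 1001 is never written.
func shimOOMScore(parent int) int {
	score := parent + 1
	if score > oomScoreAdjMax {
		return oomScoreAdjMax
	}
	return score
}

func main() {
	fmt.Println(shimOOMScore(-999), shimOOMScore(0), shimOOMScore(1000)) // -998 1 1000
}
```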
Akihiro Suda
0356d5d4b2
restart: allow passing existing log URI object
The new function `WithLogURI(uri *url.URL)` replaces `WithBinaryLogURI(binary string, args map[string]string)`
so as to allow passing an existing URI object.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-12-12 05:11:03 +09:00
Akihiro Suda
7126310a09
Merge pull request #4784 from fuweid/fix-4769
runtime: should not send duplicate task exit event
2020-12-02 15:26:57 +09:00
Wei Fu
faec5d4ffd runtime: should not send duplicate task exit event
If the shim has been killed and ttrpc connection has been
closed, the shimErr will not be nil. For this case, the event
subscriber, like moby/moby, might have received the exit or delete
events. Just in case, we should allow ttrpc-callback-on-close to
send the exit and delete events again. And the exit status will
depend on result of shimV2.Delete.

If not, the shim has already delivered the exit and delete events.
So we should remove the task record and prevent duplicate events from
ttrpc-callback-on-close.

Fix: #4769

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2020-12-01 21:54:04 +08:00
Derek McGowan
4a4bb851f5
Merge pull request from GHSA-36xw-fx78-c5r4
Use path based unix socket for shims
2020-11-30 10:32:18 -08:00
Maksym Pavlenko
0d4734655f
Merge pull request #4647 from katiewasnothere/task_update_annotations_upstream
Add annotations to task update request api
2020-11-18 14:44:19 -08:00
Samuel Karp
126b35ca43
containerd-shim: use path-based unix socket
This allows filesystem-based ACLs for configuring access to the socket
of a shim.

Ported from Michael Crosby's similar patch for v2 shims.

Signed-off-by: Samuel Karp <skarp@amazon.com>
2020-11-11 11:47:47 -08:00
Michael Crosby
bd908acabd
Use path based unix socket for shims
This allows filesystem based ACLs for configuring access to the socket of a
shim.

Co-authored-by: Samuel Karp <skarp@amazon.com>
Signed-off-by: Samuel Karp <skarp@amazon.com>
Signed-off-by: Michael Crosby <michael@thepasture.io>
Signed-off-by: Michael Crosby <michael.crosby@apple.com>
2020-11-11 11:47:46 -08:00
Kathryn Baldauf
95ba6e9f75 Add annotations to task update request api
Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com>
2020-11-09 14:13:33 -08:00
Maksym Pavlenko
4da306e1e9 Fix panic in shim not logged
Fix #4274
Carry #4298

Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2020-10-26 09:05:47 -07:00
Giuseppe Capizzi
8eda32e107 Check if a process exists before returning it
Fixes #4632.

Signed-off-by: Giuseppe Capizzi <gcapizzi@pivotal.io>
Co-authored-by: Danail Branekov <danailster@gmail.com>
2020-10-22 16:50:14 +03:00
Akihiro Suda
915263f269
Merge pull request #4502 from akshat-kmr/master
Add logging binary support when terminal is true
2020-10-08 12:14:39 +09:00
Maksym Pavlenko
c59d1cd5b0 Fix linter issues
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2020-10-07 15:42:01 -07:00
Phil Estes
68d97331be
Merge pull request #4538 from fuweid/update-shim-cleanup
runtime/v2: cleanup dead shim before delete bundle
2020-09-21 13:32:40 -04:00
Wei Fu
4b05d03903 runtime/v2: cleanup dead shim before delete bundle
The shim delete action needs bundle information to clean up resources
created by the shim. If the dead-shim cleanup is called after the bundle
is deleted, some of those resources may leak.

The ttrpc client's UserOnCloseWait() can make sure that resources are
cleaned up before the bundle is deleted, which synchronizes task deletion
and dead-shim cleanup. It might slow down task deletion, but it ensures
that resources are cleaned up and avoids the EBUSY umount case. For
example, a sandbox container like Kata/Firecracker might have mount
points over the rootfs. If containerd handles task deletion and dead-shim
cleanup in parallel, task deletion will hit EBUSY during umount and
fail to clean up the bundle, which makes things worse.

Also update cleanupAfterDeadshim, which makes sure that
cleanupAfterDeadshim is called after the shim has disconnected. In some
cases, the shim fails to call runc-create for some reason, but runc-create
has already put runc-init into the ready state. If containerd doesn't call
shim deletion, the runc-init process will leak and hold the cgroup, which
leaves the pod stuck terminating :(.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2020-09-20 11:24:31 +08:00
Akshat Kumar
61da6986c0 Cleanup open pipes if logging binary fails to start
Signed-off-by: Akshat Kumar <kshtku@amazon.com>
2020-09-10 20:06:51 -07:00
Brian Goff
dab7bd0c45 Always consume shim logs
These fifos fill up if unconsumed, so always consume them.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-09-10 10:23:29 -07:00
Brian Goff
5f9d15eaac shimv1: downgrade process missing log to debug
This `Info` log shows up for all exec processes that use the v1 shim
with Docker because Docker deletes the process once it receives the exit
event from containerd.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-09-01 10:31:41 -07:00
Akshat Kumar
4cc99e57a7 Remove unnecessary logging binary helpers and add godoc
Signed-off-by: Akshat Kumar <kshtku@amazon.com>
2020-08-26 09:15:02 -07:00
Akshat Kumar
7a9fbec5fb Add logging binary support when terminal is true
Currently the shims only support starting the logging binary process if the
io.Creator Config does not specify Terminal: true. This means that the program
using containerd will only be able to specify FIFO io when Terminal: true,
rather than allowing the shim to fork the logging binary process. Hence,
containerd consumers face an inconsistent behavior regarding logging binary
management depending on the Terminal option.

Allowing the shim to fork the logging binary process will introduce consistency
between the running container and the logging process. Otherwise, the logging
process may die if its parent process dies whereas the container will keep
running, resulting in the loss of container logs.

Signed-off-by: Akshat Kumar <kshtku@amazon.com>
2020-08-25 17:28:29 -07:00
Wei Fu
73b1449278 runtime: ignore ErrNotExist when remove rootfs
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2020-08-12 20:04:50 +08:00
Brian Goff
d7b9cb0019 shim: move event context timeout to publisher
Before this change, if an event fails to send on the first attempt,
subsequent attempts will fail with context.Canceled because the
caller of publish passes a cancellable timeout, which the publisher uses
to send the event.

The publisher returns immediately if the send fails, but adds the event
to an async queue to try again.
Meanwhile the caller will return cancelling the context.

Additionally, subsequent attempts may fail to send because the timeout
was expected to be for a single request but the queue sleeps for
`attempt*time.Second`.

In the shim service, the timeout was set to 5s, which means the send
will fail with context.DeadlineExceeded before it reaches `maxRequeue`
(which is currently 5).

This change moves the timeout to the publisher so each send attempt gets
its own timeout.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-07-20 17:51:10 -07:00
Akihiro Suda
fd99b6566b
decrease log level of cgroup2 ToggleController error when running in UserNS
Fix #4312

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-06-24 18:15:16 +09:00
Phil Estes
fb80a49ec1
Merge pull request #4327 from AkihiroSuda/fix-4326
shim v2 runc: propagate options.Root to Cleanup
2020-06-17 09:23:53 -04:00
Akihiro Suda
f1a469a035
shim v2 runc: propagate options.Root to Cleanup
Previously shim v2 (`io.containerd.runc.{v1,v2}`) always used `/run/containerd/runc` as the runc root.

Fix #4326

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-06-17 19:06:36 +09:00
Wei Fu
d656fa38ca restart plugin: support binary log uri
Introduce LogURIGenerator helper function in cio package. It is used in
the restart options, like WithBinaryLogURI and WithFileLogURI.

restart.LogPathLabel may already be used in production and works well. To
avoid a breaking change, LogPathLabel is still recognized if the new
LogURILabel is not set. In the next release, 1.5, LogPathLabel will
be removed.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2020-06-10 00:09:24 +08:00
Michael Crosby
7ce8a9d7d3
Merge pull request #4204 from ashrayjain/aj/add-kill-retry
Make killing shims more resilient
2020-06-03 11:10:43 -04:00
Ashray Jain
3e95727f39 Make killing shims more resilient
Currently, we send a single SIGKILL to the shim process
once and then we spin in a loop where we use kill(pid, 0)
to detect when the pid has disappeared completely.

Unfortunately, this has a race condition since pids can be reused causing us
to spin in an infinite loop when that happens.

This adds a timeout to this loop which logs a warning and exits the
infinite loop.

Signed-off-by: Ashray Jain <ashrayj@palantir.com>
2020-06-03 12:57:08 +01:00
Akihiro Suda
2f601013e6 cgroup2: implement containerd.events.TaskOOM event
How to test (from https://github.com/opencontainers/runc/pull/2352#issuecomment-620834524):
  (host)$ sudo swapoff -a
  (host)$ sudo ctr run -t --rm --memory-limit $((1024*1024*32)) docker.io/library/alpine:latest foo
  (container)$ sh -c 'VAR=$(seq 1 100000000)'

An event `/tasks/oom {"container_id":"foo"}` will be displayed in `ctr events`.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-06-01 14:00:13 +09:00
Sebastiaan van Stijn
dc92ad6520
Replace errors.Cause() with errors.Is()
Dependencies may be switching to use the new `%w` formatting
option to wrap errors; switching to use `errors.Is()` makes
sure that we are still able to unwrap the error and detect the
underlying cause.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-05-08 14:36:45 +02:00
Sebastiaan van Stijn
1b66fecad3
Integrate sys.SetSubreaper, sys.GetSubreaper in sys/reaper package
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-05-04 08:44:02 +02:00
Wei Fu
9687ba6315 test: TestRuntimeWithEmptyMaxEnvProcs should cleanup
TestRuntimeWithEmptyMaxEnvProcs should restore GoMaxProcs after the
test so that the temporary change of GoMaxProcs does not affect other
cases, like TestRuntimeWithNonEmptyMaxEnvProcs.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2020-04-23 22:09:10 +08:00
Wei Fu
0116352e1b runtime: ignore ttrpc.ErrClosed when delete task
Sometimes the shimv2 process no longer exists, and ttrpc doesn't detect
that the server closed the connection until the task is deleted. In this
case, we should ignore ttrpc.ErrClosed and let the task manager handle the
cleanup.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2020-04-20 23:34:49 +08:00
Michael Crosby
2ed8d12bb0
Merge pull request #3845 from fahedouch/v2_shim_test
v2 runtime shim test
2020-04-13 12:26:05 -04:00
Maksym Pavlenko
0caa233158 Rework shim logger shutdown process
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2020-04-07 12:42:04 -07:00
Michael Crosby
649f2aac66 add -v to shim binaries
Request came from a slack message that shims do not output their versions making
it hard for users and operators to know what version of a shim they have on the
system.  This adds a `-v` flag to the shims so that users can see if a shim is
in sync with containerd or what versions of shims that they are running.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2020-03-17 13:23:06 -04:00
Maksym Pavlenko
2532bdf43f
Merge pull request #4100 from lifubang/publisher
fix dial error when clean up a dead shim
2020-03-14 15:19:48 -07:00
lifubang
488d6194f2 fix dial error when clean up a dead shim
Signed-off-by: lifubang <lifubang@acmcoder.com>
2020-03-12 10:57:55 +08:00
Kir Kolyshkin
6e638ad27a Nit: fix use of bufio.Scanner.Err
The Err() method should be called after the Scan() loop, not inside it.

Found by: git grep -A3 -F '.Scan()'

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-11 19:36:21 -07:00
Tobias Klauser
a9bd451ab4 Avoid duplicate imports of github.com/gogo/protobuf/types
Re-use the import aliased as `ptypes`.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
2020-03-10 09:41:03 +01:00
zyu
e3ab8bda60 Avoid allocating slice for finding Process
Signed-off-by: zyu <yuzhihong@gmail.com>
2020-03-06 09:51:26 -08:00
Ted Yu
a687d3a36d Check error return from json.Unmarshal
Signed-off-by: Ted Yu <yuzhihong@gmail.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2020-03-05 13:38:08 -05:00
Maksym Pavlenko
4d242818bf
Merge pull request #4053 from AkihiroSuda/vendor-grpc-20200225
vendor protobuf & grpc (GoGoProtoPackageIsVersion3)
2020-02-27 11:59:59 -08:00
Phil Estes
669f516b0e
Merge pull request #4062 from tedyu/start-shim-defer
Use named error return for service#StartShim
2020-02-27 13:23:31 -05:00
Ted Yu
f8ade8debd Use named error return for service#StartShim
Signed-off-by: Ted Yu <yuzhihong@gmail.com>
2020-02-27 06:18:05 -08:00
Derek McGowan
06b284026d
Merge pull request #4063 from tedyu/namespace-path
fix killall when use pidnamespace
2020-02-26 23:08:31 -08:00
Ted Yu
4105135e36 fix killall when use pidnamespace
Signed-off-by: Ted Yu <yuzhihong@gmail.com>
2020-02-26 20:56:49 -08:00
Phil Estes
ebec675a8d
Merge pull request #3802 from vladimiroff/unify-dialers
Unify dialer implementations
2020-02-26 16:54:22 -05:00
Kiril Vladimiroff
4dd75be2b9
Unify dialer implementations
Instead of having several dialer implementations, leave only one in
`pkg/dialer` and call it from `pkg/ttrpcutil`, `runtime/v(1|2)/shim`
which had their own

Closes #3471.

Signed-off-by: Kiril Vladimiroff <kiril@vladimiroff.org>
2020-02-26 23:29:04 +02:00
Akihiro Suda
8e448bb279 vendor protobuf & grpc
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-02-26 10:57:05 +09:00
Wei Fu
18e581dd91 bugfix: cleanup dangling shim by brand new context
When container creation times out or is canceled, killShim fails
because the context is canceled. The shim is left dangling and unmanageable.

A new context is needed to do the cleanup.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2020-02-21 16:49:58 +08:00
Li Yuxuan
84464b801f v2: Cancel shim log ctx when ttrpc is closed
The background context avoids shim blocking when the ctx is cancelled
unexpectedly during shim start. But if the shim exits unexpectedly
before opening the pipe, the fd will never be closed.
`onCloseWithShimLog` makes sure that the shim log fd is closed properly
once the shim disconnects.

Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com>
2020-02-20 23:20:10 +08:00
fahedouch
486d33631e test runtime v2 CPU settings
Signed-off-by: fahedouch <fahed.dorgaa@gmail.com>
2020-01-14 18:23:54 +01:00
Seth Pellegrino
66508589d3 fix: eventfd leak for v2 runtime with v1 cgroups
There's no OOM monitoring for the v2 cgroups yet, so it seems unlikely
that there was a leak in that case.

Signed-off-by: Seth Pellegrino <spellegrino@newrelic.com>
2020-01-13 10:49:11 -08:00
Seth Pellegrino
9456040acb fix: eventfd leak
Only start watching the cgroup for OOMs when the first process starts
instead of on every process.

Signed-off-by: Seth Pellegrino <spellegrino@newrelic.com>
2020-01-13 10:39:54 -08:00
Li Yuxuan
1fb1d93212 v2: Fix missing ns when openShimLog on windows
Related to
https://github.com/containerd/containerd/pull/3921#discussion_r363046745

Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com>
2020-01-05 19:42:33 +08:00
Li Yuxuan
d82fa43193 v2: Call shim.Delete at first when create is failed
If the context is cancelled during `shim.Create()`, for example because the
client disconnects unexpectedly, the created shim will never be deleted.
What's more, if the context is cancelled during `openShimLog()`, the
fifo will be closed and block the shim output.

Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com>
2019-12-28 00:02:11 +08:00
Erik Sipsma
fbd46d7094
runtime v2: Close platform in runc shim's Shutdown method.
Previously, the platform was closed as part of the Delete method when the
process was an init for a task and there were no more tasks after its deletion.
This can create problems if another task is created within the shim right after
the delete runs, which results in the platform being closed but the shim
continuing to run.

This change moves closing the platform to the Shutdown method after the shim's
context is canceled, which ensures the platform is only closed once the shim
is sure it's done servicing containers.

Signed-off-by: Erik Sipsma <sipsma@amazon.com>
2019-12-19 09:47:40 -05:00
Akihiro Suda
b02e20f12e cgroup2: enable controllers automatically
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2019-12-12 02:56:51 +09:00
Akihiro Suda
8f870c233f support cgroup2
* only shim v2 runc v2 ("io.containerd.runc.v2") is supported
* only PID metrics is implemented. Others should be implemented in separate PRs.
* lots of code duplication in v1 metrics and v2 metrics. Dedupe should be separate PR.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2019-12-12 02:56:51 +09:00
Michael Crosby
f8cca26f3c Handle large output in v2 shim with TTY
Resize the I/O buffers to align with the size of the kernel buffers for fifos
and move the close aspect of the console to key off of the stdin closing.

Fixes #3738

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-10-11 15:42:05 -04:00
Lantao Liu
ffcb1cc9be Fix delete error code on the containerd daemon side.
Signed-off-by: Lantao Liu <lantaol@google.com>
2019-10-09 00:28:51 -07:00
Lantao Liu
06be794cb2 Fix shim delete error code.
Signed-off-by: Lantao Liu <lantaol@google.com>
2019-10-07 23:21:57 -07:00
Derek McGowan
0b224ac7d6
Update metadata interfaces for containers and leases
Add more thorough dirty checking across all types which
may be deleted and hold references.

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2019-09-23 15:27:39 -07:00
Kathryn Baldauf
b4211d94e2 fail on file not found for shim reconnect on containerd restart
Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com>
2019-09-17 14:49:29 -07:00
Derek McGowan
b039c39186
Merge pull request #3564 from tiborvass/move-cgroups-dep-to-namespaces-pkg
runtime/opts: move WithNamespaceCgroupDeletion from containerd to its own package
2019-09-03 10:38:53 -07:00
Kathryn Baldauf
2d8a65b1b2 Export shim publisher functions
- Our out of tree shim would like to publish events with ttrpc. These
functions should be exposed so our shim doesn't need to reimplement
publisher logic.

Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com>
2019-08-27 17:15:15 -07:00
Tibor Vass
6624a70d92 runtime/opts: move WithNamespaceCgroupDeletion from containerd to its own package
The cgroup dependency brings in quite a lot only for WithNamespaceCgroupDeletion,
which is a namespaces.DeleteOpt.

Signed-off-by: Tibor Vass <tibor@docker.com>
2019-08-27 19:02:55 +00:00
chentanjun
8266a3c5e7 fix-up spelling mistake
Signed-off-by: chentanjun <2799194073@qq.com>
2019-08-27 13:45:41 +08:00
Michael Crosby
6cf031e1e4 Pass ttrpc address to shim via env
Because of the way Go handles flags, passing a flag that is not defined
will cause an error. In our case, if we kept this as a flag, then
third-party shims would break when they see this new flag.  To fix this,
I moved this new configuration option to an env var.  We should use env
vars from here on out to avoid breaking shim compat.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-08-22 20:37:49 +00:00
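
The compatibility concern in 6cf031e1e4 can be reproduced with the standard `flag` package. In this sketch, `parseShimArgs`, the `-ttrpc-address` flag, and the `EXAMPLE_TTRPC_ADDRESS` variable are hypothetical names chosen for illustration; the commit message does not name the actual variable.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// parseShimArgs mimics an older third-party shim that only knows -id.
func parseShimArgs(args []string) (string, error) {
	fs := flag.NewFlagSet("shim", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the demo output
	id := fs.String("id", "", "container id")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *id, nil
}

func main() {
	// An undefined flag makes Parse fail, which would break the old shim.
	_, err := parseShimArgs([]string{"-id", "foo", "-ttrpc-address", "/run/x.sock"})
	fmt.Println("unknown flag:", err)

	// An unknown environment variable is simply ignored by the old shim.
	os.Setenv("EXAMPLE_TTRPC_ADDRESS", "/run/x.sock") // hypothetical variable name
	id, err := parseShimArgs([]string{"-id", "foo"})
	fmt.Println(id, err, os.Getenv("EXAMPLE_TTRPC_ADDRESS"))
}
```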
Kevin Parsons
d7e1b25384 Allow explicit configuration of TTRPC address
Previously the TTRPC address was generated as "<GRPC address>.ttrpc".
This change now allows explicit configuration of the TTRPC address, with
the default still being the old format if no value is specified.

As part of this change, a new configuration section is added for TTRPC
listener options.

Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
2019-08-22 00:56:27 -07:00
Phil Estes
640860a042
Merge pull request #3559 from fuweid/avoid-read-config
runtime: only check killall for init process
2019-08-20 13:08:55 -04:00
Michael Crosby
08061c7c3c
Merge pull request #3540 from crosbymichael/shim-hang
Use non-blocking send and retry for exit events
2019-08-20 09:31:21 -04:00
Wei Fu
1073868e5e runtime: only check killall for init process
When containerd-shim reaps processes, most of them are not init
processes. Since json.Decode is CPU-intensive, we should check the
killall option for the init process only.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2019-08-20 19:18:34 +08:00
Phil Estes
fc9335d75c
Merge pull request #3459 from crosbymichael/timeout-config
Allow timeouts to be configured in config
2019-08-19 13:16:43 -04:00
Li Yuxuan
04caf1fc4e Ignore fifo error when using v2 multi-container shim
When using a multi-container shim, the fifo of the 2nd to Nth container
will not be opened when the ctx is done. This will cause an
`ErrReadClosed` that can be ignored.

Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com>
2019-08-17 09:40:08 +08:00
Michael Crosby
0d27d8f4f2 Unify reaper logic into package
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-08-16 13:55:05 +00:00
Shukui Yang
bb4c92c773 Fix shim hung
shim.Reap and shim.Default.Wait may deadlock, use Monitor.Notify
to fix this issue.

Signed-off-by: Shukui Yang <keloyangsk@gmail.com>
2019-08-16 13:55:05 +00:00
Michael Crosby
2e8ea9fd6b Allow timeouts to be configured in config
This adds a singleton `timeout` package that will allow services and users
to configure timeouts in the daemon.  When a service wants to use a
timeout, it should declare a const and register its default value
inside an `init()` function for that package.  When the default config
is generated, we can use the `timeout` package to provide the available
timeout keys so that a user knows what they can configure.

These show up in the config as follows:

```toml
[timeouts]
  "io.containerd.timeout.shim.cleanup" = 5
  "io.containerd.timeout.shim.load" = 5
  "io.containerd.timeout.shim.shutdown" = 3
  "io.containerd.timeout.task.state" = 2

```

Timeouts in the config are specified in seconds.

Timeouts are very hard to get right and giving this power to the user to
configure things is a huge improvement.  Machines can be faster and
slower and depending on the CPU or load of the machine, a timeout may
need to be adjusted.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-08-13 17:36:32 +00:00
Akihiro Suda
225cc7d5bd
Merge pull request #3494 from jterry75/remove_v2
Completely remove Windows v2 in-tree shim
2019-08-07 02:19:12 +09:00
Li Yuxuan
08483d18ad v2: Close ttrpc connection when Delete()
This avoids a potential socket leak when the connected v2 runtime shim is
serving multiple containers.

Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com>
2019-08-06 20:35:59 +08:00
Justin Terry (VM)
4b5dfaee13 Completely remove Windows v2 in-tree shim
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
2019-08-05 16:49:56 -07:00
Derek McGowan
ac1cb6d5d4
Merge pull request #3467 from kevpar/dial-pipe-err
Improve error return from AnonDialer on Windows
2019-08-01 15:41:54 -07:00
Kevin Parsons
daf12cd194 Improve error return from AnonDialer on Windows
AnonDialer will now return a "not found" error if the pipe is not found
before the timeout is reached. If the pipe exists but the timeout is
reached while attempting to connect, the timeout error will still be
returned.

This will allow the error handling logic to work properly when
connecting to the shim log pipe. An error message is only logged if the
error is not "not found", so now log noise from log pipes that were
never intended to be created by the shim will be hidden.

This change also cleans up the control flow for AnonDialer on Windows.
The new code should be more easily readable, but the only semantic
change is the error return value change.

Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
2019-07-30 17:20:37 -07:00
Michael Crosby
eb4b3e8772 Fast path getting pid from task
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-07-26 17:48:00 +00:00
dzzg
c27e48d666
fix mis-spelling in client.go
Signed-off-by: dzzg <zhengguang.zhu@daocloud.io>
2019-07-26 13:33:04 +08:00
Akihiro Suda
fab016c7a1 runtime/v1/linux: ignore ErrCgroupDeleted in Task.Start
Fix a Rootless Docker-in-Docker issue on Fedora 30: https://github.com/docker-library/docker/pull/165#issuecomment-511717143
Related: #1598

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2019-07-17 12:19:15 +09:00
Maksym Pavlenko
ef7f46eb7b Fix linter errors
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-07-14 20:49:40 -07:00
Michael Crosby
6601b406b7 Refactor runtime code for code sharing
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-07-08 11:47:53 -04:00
Phil Estes
2aa8780ce6
Merge pull request #3393 from lifupan/fix_deadshim
shimv2: remove the dead task from runtime task list
2019-07-08 11:42:55 -04:00
lifupan
ec8d9d3d7a shimv2: remove the dead task from runtime task list
When a shimv2 dies, the container is cleaned up, but
the corresponding runtime task still exists in the runtime
task list; it should be deleted too.

Signed-off-by: lifupan <lifupan@gmail.com>
2019-07-04 15:51:03 +08:00
Derek McGowan
041d8d7051
Merge pull request #3366 from crosbymichael/exec-pid
Robust pid locking for shim processes
2019-06-29 15:36:51 +08:00
Michael Crosby
7dfc605fc6 Set shim OOM scores to +1 containerd daemon score
This changes the shim's OOM score from a static max killable of -999 to
be +1 of the containerd daemon's score.  This should allow the shims to
be killed first in an OOM condition but leave the daemon alone for a bit
to help clean up and manage the containers during this situation.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-06-27 11:14:14 -04:00
Michael Crosby
719a2c594e Robust pid locking for shim processes
Closes #2832

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-06-26 11:43:57 -04:00
Phil Estes
287582585f
Merge pull request #3365 from crosbymichael/exec-lk
Reserve exec id to prevent race
2019-06-25 08:59:41 +08:00
Maksym Pavlenko
174c4907d0 Fix shim's file IO logging
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-06-24 13:21:41 -07:00
Michael Crosby
1a8df3f237 Reserve exec id to prevent race
ref #2820

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-06-21 14:52:44 -04:00
Michael Crosby
245052243d Add timeout for I/O waitgroups
Closes #3286

This and a combination of a couple Docker changes are needed to fully
resolve the issue on the Docker side.  However, this ensures that after
processes exit, we still leave some time for the I/O to fully flush
before closing.  Without this timeout, the delete methods would block
forever.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-06-20 16:13:51 -04:00
Wei Fu
111b082e20
Merge pull request #3356 from mxpv/binary-io-path
BinaryIO/LogFile creator bug fixing
2019-06-20 10:25:47 +08:00
Maksym Pavlenko
fbf96d302a Fix path in LogFile creator
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-06-19 16:53:33 -07:00
Maksym Pavlenko
5e0d793801 Fix bugs in BinaryIO creator
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-06-19 11:15:17 -07:00
Ace-Tang
95f9bbf18b Add timeout in load shim v2
Add a timeout when connecting to shim v2 to avoid hanging containerd on startup.

Signed-off-by: Ace-Tang <aceapril@126.com>
2019-06-19 13:10:18 +08:00
Maksym Pavlenko
bca5667362 Make newBinaryIO public
Allow third-party runtime implementations to reuse NewBinaryIO
in order to support the pluggable shim logging binary protocol.

Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-06-12 16:22:10 -07:00
Michael Crosby
42f4bb98ac
Merge pull request #3311 from jing-rui/shimlog
fix shim std logs not close after shim exit
2019-06-10 12:05:35 -04:00
Jing Rui
9e0cd529d3 fix shim std logs not close after shim exit
Signed-off-by: Jing Rui <jingrui@huawei.com>
2019-06-10 11:50:07 +08:00
Michael Crosby
ed308ea1e6 Unmount rootfs with separate Remove() in bundle
This ensures that a container does not have a mounted rootfs in the
bundle directory before RemoveAll is called. Removing the rootfs first
with a Remove ensures that the directory is both unmounted and empty
before the bundle directory is removed.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-06-05 20:37:35 +00:00
Michael Crosby
7531c66d5a Ensure that the rootfs dir is created in the bundle
This fixes issues running gvisor on top of containerd without docker.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-06-03 19:56:19 +00:00
Danni Xia
bf24fb0cad Close file r.log after used to release resources.
Signed-off-by: Danni Xia <xiadanni1@huawei.com>
2019-06-04 06:41:38 +08:00
Lantao Liu
48b81e872c Do not return error when rootfs already exists.
Signed-off-by: Lantao Liu <lantaol@google.com>
2019-05-22 15:57:19 -07:00
Derek McGowan
c9c555cd71
Merge pull request #3226 from Ace-Tang/kill_shim_in_clean
runtime-v1: kill shim in exit handler
2019-05-22 11:56:40 -07:00
Derek McGowan
ec0b722083
Merge pull request #3292 from crosbymichael/shim-cgroup
Add shim cgroup support for v2 runtimes
2019-05-22 10:32:47 -07:00
Derek McGowan
30082abed3
Merge pull request #3293 from crosbymichael/atomic-delete
Improve atomic delete
2019-05-21 13:54:47 -07:00
Michael Crosby
bcb6c8db47
Merge pull request #3279 from mxpv/ttrpc
Add TTRPC client
2019-05-21 12:24:31 -04:00
Maksym Pavlenko
7f79fbb245 Move ttrpc client to pkg/ttrpcutil
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-05-20 16:44:49 -07:00
Ace-Tang
5b7a327c47 Improve atomic delete
Skip hidden directories when loading tasks, and return early if the path
does not exist in atomicDelete.

carry of #3233

Closes #3233

Signed-off-by: Ace-Tang <aceapril@126.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-05-20 20:13:35 +00:00
Michael Crosby
fe6a2b03ed Add shim cgroup support for v2 runtimes
Closes #3198

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-05-20 16:04:06 +00:00
Michael Crosby
90c6c1af43 Pass options on shim create for v2
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-05-17 21:02:23 +00:00
Maksym Pavlenko
7b06c9a1ce Add TTRPC client
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-05-13 21:05:07 -07:00
Justin Terry (VM)
5e962dd8ba Remove unused Resize method from initState
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
2019-05-13 12:35:22 -07:00
Li Yuxuan
66036d9206 v1: Respect the shim_debug flag when load tasks
Currently, when containerd restarts, it loads all tasks with shim
logs regardless of whether `shim_debug` is set.

Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com>
2019-05-13 23:51:16 +08:00
Derek McGowan
bc944553a8
Merge pull request #3206 from Random-Liu/cleanup-after-deadshim-v2
Cleanup dead v2 shim.
2019-05-10 11:56:57 -07:00
Michael Crosby
57fbb16234
Merge pull request #3149 from lifubang/pidnamespace
fix killall when use pidnamespace
2019-05-09 14:28:44 -04:00
Li Yuxuan
cf6e008542 Fix fd leak of shim log
Opening the shim v2 log with the `O_RDWR` flag causes `Read()` to block
forever, even if the pipe has been closed on the shim side. `io.Copy()`
then never returns, which leads to an fd leak.
Fix a typo when closing the shim v1 log that causes the `stdouLog` leak.
Update the `numPipes` function in the test case to count the opened FIFOs
correctly.

Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com>
2019-05-09 20:21:57 +08:00
Lantao Liu
660554d671 Fix error handling for task deletion.
Signed-off-by: Lantao Liu <lantaol@google.com>
2019-05-07 08:48:54 -07:00
Michael Crosby
5cf1356c5c
Merge pull request #3255 from dvrkps/usecancel
Use cancel on errors
2019-05-07 11:40:35 -04:00
Phil Estes
836cf53e40
Merge pull request #3244 from Random-Liu/fix-container-cleanup
Return NotFound error for kill and delete in deleted state.
2019-05-07 16:49:45 +02:00
Michael Crosby
19af235051
Merge pull request #3148 from masters-of-cats/wip-rootless-containerd
Skip rootfs unmount when no mounts are provided
2019-05-07 10:39:02 -04:00
Lantao Liu
5c9811ded0 Cleanup dead v2 shim.
Signed-off-by: Lantao Liu <lantaol@google.com>
2019-04-30 13:45:01 -07:00
Davor Kapsa
38e3696574 Use cancel on errors
Signed-off-by: Davor Kapsa <davor.kapsa@gmail.com>
2019-04-30 21:11:34 +02:00
Justin Terry (VM)
969035bcbd Stop logging error on v2 multi shim log failure
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
2019-04-30 11:20:58 -07:00
Sebastiaan van Stijn
8c5779c32b
bump containerd/ttrpc 699c4e40d1e7416e08bf7019c7ce2e9beced4636
full diff: f02858b145...699c4e40d1

- containerd/ttrpc#33 Fix returns error message
- containerd/ttrpc#35 Make onclose an option

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-04-27 15:30:18 -07:00
Lantao Liu
dff7456804 Return NotFound error for kill and delete in deleted state.
Signed-off-by: Lantao Liu <lantaol@google.com>
2019-04-26 15:17:18 -07:00