Commit Graph

72 Commits

Author SHA1 Message Date
Danny Canter
0bc9633414 runtime/v2: net.Dial gRPC shim sockets before trying grpc
This is mostly to workaround an issue with gRPC based shims after containerd
restart. If a shim dies while containerd is also down/restarting, on reboot
grpc.DialContext with our current set of DialOptions will make us wait for
100 seconds per shim even if the socket no longer exists or has no listener.

Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-11-30 19:37:43 -08:00
Derek McGowan
5fdf55e493
Update go module to github.com/containerd/containerd/v2
Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-10-29 20:52:21 -07:00
Maksym Pavlenko
f515cd5c55
Reorder fields when writing bootstrap params
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-10-19 12:29:06 -07:00
Maksym Pavlenko
f76eaf5a6b
Fix 'not a directory' error when restoring bootstrap.json
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-10-19 12:29:05 -07:00
Maksym Pavlenko
cf75cfa32c
Add more logs around shim restore
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-10-19 12:29:04 -07:00
Maksym Pavlenko
8061cb0237
Save bootstrap.json instead of address file
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-10-19 12:29:03 -07:00
Maksym Pavlenko
7a2d801d62
Expose shim instance version
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-10-19 12:29:02 -07:00
Maksym Pavlenko
f66c46806a
Bridge task service v2
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-10-19 12:29:01 -07:00
Abel Feng
7bca70c0c3 sandbox: do not call Connect when loadShim
The ShimManager.Start() will call loadShim() to get the existing shim if SandboxID
is specified for a container, but shimTask.PID() is called in loadShim,
which will call Connect() of Task API with the ID of a task that is not
created yet(containerd is getting the shim and Task API address to call
Create, so the task is not created yet).
In this commit we change the logic of loadShim() to get the shim without calling
Connect() of the not created container ID.

Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-10-16 21:17:50 +08:00
Derek McGowan
508aa3a1ef
Move to use github.com/containerd/log
Add github.com/containerd/log to go.mod

Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-09-22 07:53:23 -07:00
Jin Dong
cd8c8ae4bc Remove hashicorp/go-multierror
Signed-off-by: Jin Dong <jin.dong@databricks.com>
2023-08-20 17:59:45 -07:00
Wei Fu
601699a184 integration: add ShouldRetryShutdown case based on #7496
Since the moby/moby can't handle duplicate exit event well, it's hard
for containerd to retry shutdown if there is error, like context
canceled.

In order to prevent from regression like #4769, I add skipped
integration case as TODO item and we should rethink about how to handle
the task/shim lifecycle.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-08-11 17:43:51 +08:00
Derek McGowan
dba6f9db18
Add version to shim protocol
Document environment variables and test shim start response parsing.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-02-27 22:58:47 -08:00
Danny Canter
4728800abc runtime/v2: Get rid of last logrus.Fields usage
https://github.com/containerd/containerd/pull/8143 added an alias for
logrus.Fields and moved over most usages to this alias, but there was
one straggler.

Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-02-20 18:29:56 -08:00
Danny Canter
4278fbbc7e runtime/v2: Call onCloseWithShimLog for grpc shims
We pass in a callback using the ttrpc.WithOnClose functionality
for shims that use ttrpc, but with the newly added ability to use
GRPC for shims this was left as a follow-up. It doesn't seem like
grpc-go has anything similar so some options (that I could see) are:

This change introduces a new grpcConn wrapper type for the connection
that exposes a method to get notified when the users callback has run,
the same in functionality as TTRPC's `UserOnCloseWait`. The callback
gets passed in in a new `grpcDialContext` function that will:

1. Dial the connection as normal
2. Spin off a goroutine that will monitor the connections state
until it transitions to idle or shutdown and will then run the
callback.

Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-02-20 18:25:53 -08:00
Danny Canter
979a744122 runtime/v2: Log BootstrapParams
Recent work added the ability to use grpc for shims, it'd be nice to
have a debug (or info perhaps) log to show what protocol and addr the
shim sent over.

Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-02-16 17:21:27 -08:00
Maksym Pavlenko
8ef298d863 Add transport credentials GRPC opt
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-02-10 22:01:35 -08:00
Maksym Pavlenko
a82e37a5a2 Add shim bootstrap params
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-02-10 22:01:35 -08:00
Maksym Pavlenko
fc2e761e26 Initial GRPC client support
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-02-10 22:01:35 -08:00
Maksym Pavlenko
9e5c207e4c Wire up client bridges
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-02-10 22:01:35 -08:00
Edgar Lee
34d5878185 Use mount.Target to specify subdirectory of rootfs mount
- Add Target to mount.Mount.
- Add UnmountMounts to unmount a list of mounts in reverse order.
- Add UnmountRecursive to unmount deepest mount first for a given target, using
moby/sys/mountinfo.

Signed-off-by: Edgar Lee <edgarhinshunlee@gmail.com>
2023-01-27 09:51:58 +08:00
Derek McGowan
b550526ccd
Use cleanup.Background instead of context.Background for cleanup
Use the cleanup context to re-use values from the original context

Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-01-04 11:22:24 -08:00
Maksym Pavlenko
1d8b1bc75b Cleanup shim manager
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-08-11 21:41:32 -07:00
Maksym Pavlenko
ff65fc2d0e Make TaskList generic
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-08-10 14:02:53 -07:00
Maksym Pavlenko
e2fd25f3d8 Move runtime v2 proto
Move runtime v2 protos to api/runtime package.

Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-04-19 17:59:33 -07:00
Kazuyoshi Kato
88c0c7201e Consolidate gogo/protobuf dependencies under our own protobuf package
This would make gogo/protobuf migration easier.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-04-19 15:53:36 +00:00
Kazuyoshi Kato
80b825ca2c Remove gogoproto.stdtime
This commit removes gogoproto.stdtime, since it is not supported by
Google's official toolchain
(see https://github.com/containerd/containerd/issues/6564).

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-04-19 13:39:30 +00:00
Maksym Pavlenko
b7a36950f6 [Sandbox] Add Wait and PID
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-04-08 13:33:48 -07:00
Maksym Pavlenko
0d165e6544 Restore sandboxes on daemon restart
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-04-08 13:33:48 -07:00
Maksym Pavlenko
6343fe3ea2 [sandbox] Implement sandbox controller
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-04-08 13:33:47 -07:00
Kazuyoshi Kato
96b16b447d Use typeurl.Any instead of github.com/gogo/protobuf/types.Any
This commit upgrades github.com/containerd/typeurl to use typeurl.Any.
The interface hides gogo/protobuf/types.Any from containerd's Go client.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-03-24 20:50:07 +00:00
Kazuyoshi Kato
067611fdea Remove enumvalue_customname, goproto_enum_prefix and enum_customname
This commit removes gogoproto.enumvalue_customname,
gogoproto.goproto_enum_prefix and gogoproto.enum_customname.

All of them make proto-generated Go code more idiomatic, but we already
don't use these enums in our external-surfacing types and they are anyway
not supported by Google's official toolchain (see #6564).

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-03-21 19:48:16 +00:00
haoyun
bbe46b8c43 feat: replace github.com/pkg/errors to errors
Signed-off-by: haoyun <yun.hao@daocloud.io>
Co-authored-by: zounengren <zouyee1989@gmail.com>
2022-01-07 10:27:03 +08:00
haoyun
c0d07094be feat: Errorf usage
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-12-13 14:31:53 +08:00
Maksym Pavlenko
8b788d9dfe Expose shim process interface
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2021-11-01 07:37:01 -07:00
Maksym Pavlenko
fb5f6ce3c9 Rework task create and cleanup flow
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2021-11-01 07:37:00 -07:00
Maksym Pavlenko
2d5d3541e6 Rename task manager to shim manager
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2021-11-01 07:36:34 -07:00
Eng Zer Jun
50da673592
refactor: move from io/ioutil to io and os package
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in io and os packages.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-09-21 09:50:38 +08:00
Maksym Pavlenko
d30d897ef9 Cleanup v2 shim
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2021-08-04 10:38:05 -07:00
yylt
0d45ac14e9 interface about shim build check
Signed-off-by: Yang Yang <yang8518296@163.com>
2021-07-22 09:03:12 +08:00
Wei Fu
9fdc96c095 runtime/v2: add comment for checkCopyShimLogError
After #4906, containerd opens fifo in read/write mode in linux platform
The original comment doesn't correct and is removed by #5174.

```
// original comment

// When using a multi-container shim, the fifo of the 2nd to Nth
// container will not be opened when the ctx is done. This will
// cause an ErrReadClosed that can be ignored.
```

However, we should add comment for checkCopyShimLogError to mention why
we call checkCopyShimLogError. The checkCopyShimLogError, it is to prevent
the flood of expected error messages after task die and the expected
errors depend on platform.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2021-03-18 13:02:28 +08:00
Wei Fu
eabd9b98b6 runtime: ignore file-already-closed error if dead shim
fix: #5130

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2021-03-15 12:18:26 +08:00
payall4u
957fa3379d change flag from RDONLY to RDWR and close the fifo correct
Signed-off-by: Zhiyu Li <payall4u@qq.com>
2021-01-31 19:00:42 +08:00
Wei Fu
faec5d4ffd runtime: should not send duplicate task exit event
If the shim has been killed and ttrpc connection has been
closed, the shimErr will not be nil. For this case, the event
subscriber, like moby/moby, might have received the exit or delete
events. Just in case, we should allow ttrpc-callback-on-close to
send the exit and delete events again. And the exit status will
depend on result of shimV2.Delete.

If not, the shim has been delivered the exit and delete events.
So we should remove the task record and prevent duplicate events from
ttrpc-callback-on-close.

Fix: #4769

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2020-12-01 21:54:04 +08:00
Maksym Pavlenko
0d4734655f
Merge pull request #4647 from katiewasnothere/task_update_annotations_upstream
Add annotations to task update request api
2020-11-18 14:44:19 -08:00
Kathryn Baldauf
95ba6e9f75 Add annotations to task update request api
Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com>
2020-11-09 14:13:33 -08:00
Giuseppe Capizzi
8eda32e107 Check if a process exists before returning it
Fixes #4632.

Signed-off-by: Giuseppe Capizzi <gcapizzi@pivotal.io>
Co-authored-by: Danail Branekov <danailster@gmail.com>
2020-10-22 16:50:14 +03:00
Wei Fu
4b05d03903 runtime/v2: cleanup dead shim before delete bundle
The shim delete action needs bundle information to cleanup resources
created by shim. If the cleanup dead shim is called after delete bundle,
the part of resources maybe leaky.

The ttrpc client UserOnCloseWait() can make sure that resources are
cleanup before delete bundle, which synchronizes task deletion and
cleanup deadshim. It might slow down the task deletion, but it can make
sure that resources can be cleanup and avoid EBUSY umount case. For
example, the sandbox container like Kata/Firecracker might have mount
points over the rootfs. If containerd handles task deletion and cleanup
deadshim parallelly, the task deletion will meet EBUSY during umount and
fail to cleanup bundle, which makes case worse.

And also update cleanupAfterDeadshim, which makes sure that
cleanupAfterDeadshim must be called after shim disconnected. In some
case, shim fails to call runc-create for some reason, but the runc-create
already makes runc-init into ready state. If containerd doesn't call shim
deletion, the runc-init process will be leaky and hold the cgroup, which
makes pod terminating :(.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2020-09-20 11:24:31 +08:00
Sebastiaan van Stijn
dc92ad6520
Replace errors.Cause() with errors.Is()
Dependencies may be switching to use the new `%w` formatting
option to wrap errors; switching to use `errors.Is()` makes
sure that we are still able to unwrap the error and detect the
underlying cause.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-05-08 14:36:45 +02:00
Wei Fu
0116352e1b runtime: ignore ttrpc.ErrClosed when delete task
For some reason, shimv2 process doesn't exist. The ttrpc doesn't detect
the connection closed by server until delete task. For this case, we
should ignore the ttrpc.ErrClosed and let task manager handle the
cleanup.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2020-04-20 23:34:49 +08:00