Commit Graph

1533 Commits

Author SHA1 Message Date
Samuel Karp
c8b095f3c2
Merge pull request #10651 from laurazard/shim-refactor-without-pending
runc-shim: fix races/prevent init exits from being dropped
2024-09-05 19:10:42 +00:00
Laura Brehm
421a4b568c
runc-shim: handle pending execs as running
This commit rewrites and simplifies a lot of this logic to reduce its
complexity, and also handle the case where the container doesn't have
its own pid-namespace, which means that we're not guaranteed to receive
the init exit last.

This is achieved by replacing `s.pendingExecs` with `s.runningExecs`,
for which both (previously) pending and de facto running execs are
considered.

The new exit handling logic can be summed up by:
- when we receive an init exit, stash it in `s.containerInitExit`,
  and if a container's init process has exited, refuse new execs.
- (if the container does not have its own pidns) kill all running
  processes (if the container has a private pid-namespace, then all
  processes will be dead already).
- wait for the container's running exec count (which includes execs
  which have been started but might still early exit) to get to 0.
- publish the stashed away init exit.
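
The steps above can be sketched roughly as follows. This is a minimal
illustration, not the shim's code: the `container` type, its fields and
its methods are hypothetical stand-ins for `s.runningExecs` and
`s.containerInitExit`, and the kill step for containers without a
private pid-namespace is only marked by a comment.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// exitEvent is a hypothetical stand-in for a process exit notification.
type exitEvent struct {
	ID     string
	Status int
}

// container is a simplified, hypothetical model of per-container state.
type container struct {
	mu            sync.Mutex
	runningExecs  int        // execs that are started or about to start
	initExit      *exitEvent // stashed init exit, nil while init is alive
	initPublished bool
	published     []exitEvent // stands in for the event publisher
}

var errInitExited = errors.New("container init process has exited")

// startExec refuses new execs once an init exit has been stashed.
func (c *container) startExec() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.initExit != nil {
		return errInitExited
	}
	c.runningExecs++
	return nil
}

// execExited publishes the exec exit, then the init exit if it was the last exec.
func (c *container) execExited(e exitEvent) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.runningExecs--
	c.published = append(c.published, e)
	c.maybePublishInit()
}

// initExited stashes the init exit instead of publishing it right away.
func (c *container) initExited(e exitEvent) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.initExit = &e
	// Without a private pid-namespace, remaining processes would be killed here.
	c.maybePublishInit()
}

// maybePublishInit publishes the stashed init exit once no execs remain.
func (c *container) maybePublishInit() {
	if c.initExit != nil && !c.initPublished && c.runningExecs == 0 {
		c.published = append(c.published, *c.initExit)
		c.initPublished = true
	}
}

func main() {
	c := &container{}
	_ = c.startExec()
	c.initExited(exitEvent{ID: "init"})
	fmt.Println("exec after init exit:", c.startExec())
	c.execExited(exitEvent{ID: "exec1"})
	for _, e := range c.published {
		fmt.Println("published:", e.ID)
	}
}
```

In this sketch the init exit is always the last event published for the
container, regardless of the order in which the exits were received.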

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
2024-09-04 11:47:12 +01:00
Cory Snider
e7357916bb
runc-shim: refuse to start execs after init exits
The runc task state machine prevents execs from being created after the
init process has exited, but there are no guards against starting a
created exec after the init process has exited. That leaves a small
window for starting an exec to race our handling of the init process
exiting. Normally this is not an issue in practice: the kernel will
atomically kill all processes in a PID namespace when its "init" process
terminates, and will not allow new processes to fork(2) into the PID
namespace afterwards. Therefore the racing exec is guaranteed by the
kernel to not be running after the init process terminates. On the other
hand, when the container does not have a private PID namespace (i.e. the
container's init process is not the "init" process of the container's
PID namespace), the kernel does not automatically kill other container
processes on init exit and will happily allow runc to start an exec
process at any time. It is the runc shim's responsibility to clean up
the container when the init process exits in this situation by killing
all the container's remaining processes. Block execs from being started
after the container's init process has exited to prevent the processes
from leaking, and to avoid violating the task service's assumption that
an exec can be running iff the init process is also running.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2024-09-02 10:43:53 +01:00
Laura Brehm
7f3bf993d6
runc-shim: remove misleading comment
It's not true that `s.mu` needs to be held when calling
`handleProcessExit`, and indeed hasn't been the case for a
while – see 892dc54bd2.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
2024-08-29 13:04:22 +01:00
Kir Kolyshkin
94c163209d TestNewBinaryIOCleanup: fix a comment, minor rewrite
The main reason is to improve the comment about pidfd in Go 1.23+.

While at it:
 - avoid slice manipulation as we only need count;
 - avoid repeating "/proc/self/fd".

Updates: #10345.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-08-27 23:15:05 -07:00
Phil Estes
569a8d6b60
Merge pull request #10626 from dmcgowan/content-local-plugin
Register local content plugin from separate package
2024-08-26 19:11:57 +00:00
Jin Dong
35b0292572 remove sha256-simd
Signed-off-by: Jin Dong <djdongjin95@gmail.com>
2024-08-25 04:46:04 +00:00
Derek McGowan
50b06182f8
Register local content plugin from separate package
Update the local content plugin to register itself in a consistent way
as other plugins. This also allows the separate package to define its
own configuration more cleanly.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2024-08-22 11:18:30 -07:00
Fu Wei
dd2a24cf0e
Merge pull request #10557 from tariq1890/cli-ctx-add
use ctx object from cliContext instead of creating a new one
2024-08-13 01:13:48 +00:00
Fu Wei
7403f91f1a
Merge pull request #10560 from samuelkarp/ctr-shim-state
ctr: shim state for secondary tasks & shim state query for old shims
2024-08-13 01:13:30 +00:00
Derek McGowan
268ae7fa02
Merge pull request #10562 from zhsj/pidfd
Fix TestNewBinaryIOCleanup on Go 1.23 and Linux 5.4
2024-08-09 13:13:58 +00:00
Sebastiaan van Stijn
9776047243
migrate to github.com/moby/sys/userns
Commit 8437c567d8 migrated the use of the
userns package to the github.com/moby/sys/user module.

After further discussion with maintainers, it was decided to move the
userns package to a separate module, as it has no direct relation with
"user" operations (other than having "user" in its name).

This patch migrates our code to use the new module.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-08-08 12:48:54 +02:00
Shengjing Zhu
8ef73c5dd5 Fix TestNewBinaryIOCleanup on Go 1.23 and Linux 5.4
When running the test on Ubuntu focal (kernel version 5.4), the
symlink for pidfd is anon_inode:[pidfd].
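
A minimal sketch of a descriptor count that skips pidfds, assuming a
Linux /proc filesystem. The helper names are hypothetical, and the
"pidfd:" prefix for newer kernels is an assumption that is not stated in
this commit message; only the anon_inode:[pidfd] form comes from it.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// isPidfdLink reports whether a /proc/self/fd link target looks like a pidfd.
// "anon_inode:[pidfd]" is the form seen on Linux 5.4; "pidfd:" is assumed
// for newer kernels.
func isPidfdLink(target string) bool {
	return strings.HasPrefix(target, "pidfd:") ||
		strings.HasPrefix(target, "anon_inode:[pidfd]")
}

// countNonPidfd counts the link targets that are not pidfds.
func countNonPidfd(targets []string) int {
	n := 0
	for _, t := range targets {
		if !isPidfdLink(t) {
			n++
		}
	}
	return n
}

// descriptorTargets returns the symlink target of every entry in dir.
func descriptorTargets(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	targets := make([]string, 0, len(entries))
	for _, e := range entries {
		target, err := os.Readlink(filepath.Join(dir, e.Name()))
		if err != nil {
			target = "" // unreadable links still count as open descriptors
		}
		targets = append(targets, target)
	}
	return targets, nil
}

func main() {
	targets, err := descriptorTargets("/proc/self/fd")
	if err != nil {
		fmt.Println("cannot list descriptors:", err)
		return
	}
	fmt.Println("open descriptors (excluding pidfds):", countNonPidfd(targets))
}
```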

Updates: #10345

Signed-off-by: Shengjing Zhu <zhsj@debian.org>
2024-08-08 17:20:19 +08:00
Samuel Karp
7d4da0cb28
ctr: shim state query for old shims
Old shims do not implement containerd.task.v3.Task, but it can be
useful to use a new ctr with an older shim especially during upgrade
scenarios.

Signed-off-by: Samuel Karp <samuelkarp@google.com>
2024-08-07 16:48:14 -07:00
Samuel Karp
d59e8a8404
ctr: shim state for secondary tasks
The v2 shim interface supports grouping, so a single shim can manage
multiple tasks.  Prior to this change, the `shim state` command could
only query the state of the primary task (task that shares the same ID
as the shim).

Signed-off-by: Samuel Karp <samuelkarp@google.com>
2024-08-07 16:48:08 -07:00
Tariq Ibrahim
32c2d14932
use ctx object from cliContext instead of creating a new one
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-08-06 13:42:22 -07:00
Erikson Tung
551ac0600a Ensure /run/containerd is created with correct perms
There are a couple directories that get created under the default
state directory ("/run/containerd") even when containerd is configured
to use a different location for its state directory. Create the default
state directory even if containerd is configured to use a different
state directory location. This ensures pkg/shim and pkg/fifo won't create
the default state directory with incorrect permissions when calling
os.MkdirAll for their respective subdirectories.
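
The idea can be sketched as below: create the parent directory with an
explicit mode first, so that a later os.MkdirAll for a subdirectory
finds it already present. The helper names, the 0o711 mode, the "fifo"
subdirectory and the temporary base directory (standing in for
"/run/containerd") are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// ensureDir creates dir (and any missing parents) and then sets its mode.
// os.MkdirAll is subject to the umask and leaves existing directories
// untouched, so the explicit Chmod makes the final mode predictable.
func ensureDir(dir string, perm os.FileMode) error {
	if err := os.MkdirAll(dir, perm); err != nil {
		return err
	}
	return os.Chmod(dir, perm)
}

// demo builds a throwaway state directory and returns its final mode.
func demo() (os.FileMode, error) {
	base, err := os.MkdirTemp("", "state-sketch")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(base)

	stateDir := filepath.Join(base, "run", "containerd")
	if err := ensureDir(stateDir, 0o711); err != nil {
		return 0, err
	}
	// A subdirectory created later no longer decides the parent's mode.
	if err := os.MkdirAll(filepath.Join(stateDir, "fifo"), 0o700); err != nil {
		return 0, err
	}
	info, err := os.Stat(stateDir)
	if err != nil {
		return 0, err
	}
	return info.Mode().Perm(), nil
}

func main() {
	mode, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("state directory mode:", mode)
}
```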

Signed-off-by: Erikson Tung <etung@netflix.com>
2024-07-30 17:55:01 -07:00
Sebastiaan van Stijn
8437c567d8
pkg/userns: deprecate and migrate to github.com/moby/sys/user/userns
The userns package in libcontainer was integrated into the moby/sys/user
module at commit [3778ae603c706494fd1e2c2faf83b406e38d687d][1].

This patch deprecates the containerd fork of that package, and adds it as
an alias for the moby/sys/user/userns package.

[1]: 3778ae603c

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-07-26 09:47:50 +02:00
Phil Estes
0fe79b6eac
Merge pull request #10346 from mauri870/hotfix/gotip-test
Fix TestNewBinaryIOCleanup failing with gotip
2024-07-23 02:31:47 +00:00
Samuel Karp
1e3c35bd0d
Merge pull request #10488 from dcantah/avoid-realloc
Avoid potential reallocs by pre-sizing some slices
2024-07-22 05:39:19 +00:00
Mauri de Souza Meneguzzo
f0aecaa2e2
Fix TestNewBinaryIOCleanup failing with gotip
This PR ignores a new pidfd file descriptor that is introduced in
gotip (future 1.23) and should not be considered when detecting fd leaks.

Fixes #10345

Signed-off-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
2024-07-19 18:49:40 -03:00
Danny Canter
b41bb6df73 Avoid potential reallocs by pre-sizing some slices
There are a couple of spots where we know exactly how large
the destination buffer should be, so pre-size these to
avoid any reallocs to a higher capacity.
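
A small sketch of the pattern, using a hypothetical helper: when the
final length is known, passing it as the capacity to make lets append
fill the slice without growing it.

```go
package main

import "fmt"

// doubled returns a new slice holding each value of src multiplied by two.
// The destination is pre-sized to len(src), so append never reallocates.
func doubled(src []int) []int {
	dst := make([]int, 0, len(src))
	for _, v := range src {
		dst = append(dst, v*2)
	}
	return dst
}

func main() {
	out := doubled([]int{1, 2, 3})
	fmt.Println(out, len(out), cap(out))
}
```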

Signed-off-by: Danny Canter <danny@dcantah.dev>
2024-07-19 13:05:49 -07:00
Maksym Pavlenko
63b4688175 Use grpc.NewClient instead of deprecated ones
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2024-07-18 15:26:02 -07:00
Akhil Mohan
300fd770a0
use typeurl funcs for marshalling anypb.Any
Signed-off-by: Akhil Mohan <akhilerm@gmail.com>
2024-07-10 22:26:27 +05:30
Henry Wang
243b803a19 Add pprof to runc-shim
Signed-off-by: Henry Wang <henwang@amazon.com>
2024-06-20 23:12:31 +00:00
Sebastiaan van Stijn
dd0542f7c1
cmd: don't alias context package, and use cliContext for cli.Context
Unfortunately, this is a rather large diff, but perhaps worth a one-time
"rip off the bandaid" for v2. This patch removes the use of "gocontext"
as an alias for stdlib's "context", and uses "cliContext" for uses of
cli.Context.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-06-20 02:15:13 +02:00
Kern Walster
5b8dfbd111 Allow proxy plugins to have capabilities
Signed-off-by: Kern Walster <walster@amazon.com>
2024-06-13 17:13:57 +00:00
Akihiro Suda
86b8a88241
Remove pkg/seed
Since Go 1.20, math/rand does not need explicit seeding:
https://go.dev/doc/go1.20#minor_library_changes

Go <= 1.19 is no longer supported due to EOL.
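
As a brief illustration (the helper name is hypothetical), code built
with Go 1.20 or newer can call the top-level math/rand functions
directly, with no rand.Seed call at startup:

```go
package main

import (
	"fmt"
	"math/rand"
)

// pick returns a pseudo-random element of choices.
// No explicit seeding is performed; since Go 1.20 the global source
// is seeded randomly by the runtime.
func pick(choices []string) string {
	return choices[rand.Intn(len(choices))]
}

func main() {
	fmt.Println(pick([]string{"a", "b", "c"}))
}
```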

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2024-06-13 08:50:28 +09:00
Kohei Tokunaga
df7f6ba5b9
ctr: return explicit errors for flags unsupported by transfer service
ctr currently silently ignores several flags by default (without --local) and
the user can't know which flags are supported until they see the code.
This commit fixes ctr to return an explicit error when it finds an unsupported
flag.
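
A rough sketch of such a check, using only the standard library. The
flag names "example-a" and "example-b" are placeholders rather than real
ctr flags, and checkFlags is a hypothetical helper, not the function
this commit adds.

```go
package main

import (
	"fmt"
	"strings"
)

// unsupportedWithoutLocal lists flags the non-local path cannot honor.
// The names are placeholders for illustration only.
var unsupportedWithoutLocal = []string{"example-a", "example-b"}

// checkFlags returns an explicit error naming every unsupported flag
// that was set, instead of silently ignoring it.
func checkFlags(isSet func(name string) bool, local bool) error {
	if local {
		return nil
	}
	var bad []string
	for _, name := range unsupportedWithoutLocal {
		if isSet(name) {
			bad = append(bad, "--"+name)
		}
	}
	if len(bad) > 0 {
		return fmt.Errorf("flags not supported without --local: %s", strings.Join(bad, ", "))
	}
	return nil
}

func main() {
	set := map[string]bool{"example-a": true}
	isSet := func(name string) bool { return set[name] }
	fmt.Println(checkFlags(isSet, false))
	fmt.Println(checkFlags(isSet, true))
}
```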

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
2024-06-11 12:08:47 +09:00
Maksym Pavlenko
34d3c17ae2
Merge pull request #10291 from ktock/push-platform-conf
Transfer: Push: Enable to specify platforms
2024-06-05 21:28:09 +00:00
Kohei Tokunaga
cde2527fce
ctr: pull: Do not ignore labels when transfer service is used
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
2024-06-05 12:26:00 +09:00
Kohei Tokunaga
5611fdd4af
Transfer: Push: Enable to specify platforms
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
2024-06-04 10:02:13 +09:00
Derek McGowan
2788604e49
Update ctr image pull all platforms
Allows supporting fetching of all platforms while unpacking for a subset
of platforms.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2024-05-09 20:48:38 -07:00
Maksym Pavlenko
b6ffa2794e
Merge pull request #10164 from henry118/shim-fix
Update ctr shim subcommand to task v3
2024-05-03 19:37:36 +00:00
Akihiro Suda
ef12da25e2
Merge pull request #9781 from kinvolk/rata/userns-use-pluginInfo
core/runtime: Check shim PluginInfo to enforce idmap support
2024-05-03 16:07:50 +00:00
Henry Wang
b8060d641d Update ctr shim subcommand to task v3
Signed-off-by: Henry Wang <henwang@amazon.com>
2024-05-03 15:25:48 +00:00
Rodrigo Campos
f1e265b138 core/runtime: Check shim PluginInfo to enforce idmap support
This commit gets rid of the TODO by moving the check to use the
pluginInfo() infrastructure.

The check is only enforced for shims that return info that can be read
as type runtime.Features. For shims that don't provide that, we just
ignore it, as those shims might not be affected by this.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2024-05-03 15:00:59 +02:00
Derek McGowan
2ac2b9c909
Make api a Go sub-module
Allow the api to stay at the same v1 go package name and keep using a
1.x version number. This indicates the API is still at 1.x and allows
sharing proto types with containerd 1.6 and 1.7 releases.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2024-05-02 11:03:00 -07:00
Derek McGowan
e1b94c0e7d
Move protobuf package under pkg
Signed-off-by: Derek McGowan <derek@mcg.dev>
2024-05-02 10:52:03 -07:00
Derek McGowan
3e9cace720
Move runtimeoptions to api directory
Signed-off-by: Derek McGowan <derek@mcg.dev>
2024-05-02 10:52:02 -07:00
Derek McGowan
4a45507772
Move runc options to api directory
Signed-off-by: Derek McGowan <derek@mcg.dev>
2024-05-02 10:52:00 -07:00
Maksym Pavlenko
444679c883
Merge pull request #10109 from dmcgowan/fix-fallback-explicit-tls
Update HTTP fallback to better account for TLS timeout and previous attempts
2024-04-23 04:10:39 +00:00
ChengenH
4a31bd606d chore: use errors.New instead of fmt.Errorf when there are no parameters
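
A short sketch of the distinction, with hypothetical names: errors.New
suits fixed messages, while fmt.Errorf remains useful when the message
is formatted or wraps another error.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound has a fixed message, so errors.New is sufficient.
var errNotFound = errors.New("resource not found")

// lookup reports whether id is known, returning a descriptive error if not.
func lookup(id string, known map[string]bool) error {
	if id == "" {
		// No formatting verbs are needed here, so fmt.Errorf adds nothing.
		return errors.New("id must not be empty")
	}
	if !known[id] {
		// Formatting and wrapping are what fmt.Errorf is for.
		return fmt.Errorf("lookup %q: %w", id, errNotFound)
	}
	return nil
}

func main() {
	fmt.Println(lookup("", nil))
	fmt.Println(lookup("x", nil))
	fmt.Println(lookup("x", map[string]bool{"x": true}))
}
```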
Signed-off-by: ChengenH <hce19970702@gmail.com>
2024-04-21 21:49:31 +08:00
Evan Lezar
1b62224181 Bump tags.cncf.io/container-device-interface to v0.7.1
This includes migrating from cdi.GetRegistry() to cdi.Configure() and
using top-level cdi Refresh and InjectDevices functions as applicable.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-10 15:25:11 +02:00
Derek McGowan
7c50784591
Remove empty default tls configuration in ctr
Signed-off-by: Derek McGowan <derek@mcg.dev>
2024-04-09 15:40:09 -07:00
Phil Estes
ac8f7698cf
Merge pull request #9999 from laurazard/fix-exec-concurrent-shim
runc-shim: only defer init process exits
2024-04-05 09:27:35 -04:00
Kohei Tokunaga
4332794384
Transfer: Registry: Enable plain HTTP
Currently the transfer service doesn't handle plain HTTP connections.
This commit fixes this issue by propagating
`(core/remotes/docker/config).HostOptions.DefaultScheme` from client to the
transfer service.
This commit also fixes ctr to use this feature for "--plain-http" flag.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
2024-04-03 10:46:10 +09:00
Derek McGowan
3a8c27dff8
Merge pull request #9908 from ktock/transfer-host-dir
Transfer: Registry: Enable to use registry configuration directory
2024-04-02 18:59:43 +00:00
baijia
ab2c569fb2 ctr: fix parsing mount options
Set 'DisableSliceFlagSeparator = true'

urfave/cli/v2 uses ',' as default string slice separator.
That means '--mount type=bind,src=/src,des=/des,options=rbind:rw'
will be taken as four bind mount options.
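
The effect can be reproduced with a standard-library sketch.
splitSliceValue is a hypothetical helper that only mimics separator
handling; it is not urfave/cli's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// splitSliceValue mimics a flag parser that splits a slice flag's value on
// sep. An empty sep models the separator being disabled.
func splitSliceValue(value, sep string) []string {
	if sep == "" {
		return []string{value}
	}
	return strings.Split(value, sep)
}

func main() {
	const mount = "type=bind,src=/src,des=/des,options=rbind:rw"
	// With ',' as separator the single mount spec falls apart into pieces.
	fmt.Println(len(splitSliceValue(mount, ",")), splitSliceValue(mount, ","))
	// With the separator disabled the spec stays intact.
	fmt.Println(len(splitSliceValue(mount, "")), splitSliceValue(mount, ""))
}
```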

Fixes: #10003

Signed-off-by: baijia <baijia.wr@antgroup.com>
2024-03-27 17:50:39 +08:00
Laura Brehm
6d00c3ada8
runc-shim: only defer init process exits
In order to make sure that we don't publish task exit events for init
processes before we do for execs in that container, we added logic to
`processExits` in 892dc54bd2 to skip these
and let the pending exec's `handleStarted` closure process them.

However, the conditional logic added to `processExits` was faulty - we
should only defer processing of exit events related to init processes,
not other execs. Due to this missing condition,
892dc54bd2 introduced a bug where, if
there are many concurrent execs for the same container/init pid, exec
exits are skipped and then never published, resulting in hanging
clients.

This commit adds the missing logic to `processExits`.
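
The missing condition can be illustrated with a simplified sketch. Both
functions are hypothetical and reduce the decision to plain integers;
the real `processExits` works on the shim's container and process types.

```go
package main

import "fmt"

// faultyShouldDefer models the bug: any exit is held back while execs are
// pending, including exits of other execs.
func faultyShouldDefer(pendingExecs int) bool {
	return pendingExecs > 0
}

// shouldDefer models the fix: only the init process exit is held back while
// execs are pending; exec exits are processed right away.
func shouldDefer(exitPid, initPid, pendingExecs int) bool {
	return exitPid == initPid && pendingExecs > 0
}

func main() {
	const initPid, execPid = 100, 101
	fmt.Println("faulty, exec exit deferred:", faultyShouldDefer(2))
	fmt.Println("fixed, exec exit deferred:", shouldDefer(execPid, initPid, 2))
	fmt.Println("fixed, init exit deferred:", shouldDefer(initPid, initPid, 2))
}
```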

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
2024-03-26 13:39:11 +00:00