containerd

Author	SHA1	Message	Date
Laura Brehm	421a4b568c	runc-shim: handle pending execs as running This commit rewrites and simplifies a lot of this logic to reduce its complexity, and also handle the case where the container doesn't have its own pid-namespace, which means that we're not guaranteed to receive the init exit last. This is achieved by replacing `s.pendingExecs` with `s.runningExecs`, for which both (previously) pending and de facto running execs are considered. The new exit handling logic can be summed up by: - when we receive an init exit, stash it in `s.containerInitExit`, and if a container's init process has exited, refuse new execs. - (if the container does not have its own pidns) kill all running processes (if the container has a private pid-namespace, then all processes will be dead already). - wait for the container's running exec count (which includes execs which have been started but might still early exit) to get to 0. - publish the stashed away init exit. Signed-off-by: Laura Brehm <laurabrehm@hey.com>	2024-09-04 11:47:12 +01:00
Cory Snider	e7357916bb	runc-shim: refuse to start execs after init exits The runc task state machine prevents execs from being created after the init process has exited, but there are no guards against starting a created exec after the init process has exited. That leaves a small window for starting an exec to race our handling of the init process exiting. Normally this is not an issue in practice: the kernel will atomically kill all processes in a PID namespace when its "init" process terminates, and will not allow new processes to fork(2) into the PID namespace afterwards. Therefore the racing exec is guaranteed by the kernel to not be running after the init process terminates. On the other hand, when the container does not have a private PID namespace (i.e. the container's init process is not the "init" process of the container's PID namespace), the kernel does not automatically kill other container processes on init exit and will happily allow runc to start an exec process at any time. It is the runc shim's responsibility to clean up the container when the init process exits in this situation by killing all the container's remaining processes. Block execs from being started after the container's init process has exited to prevent the processes from leaking, and to avoid violating the task service's assumption that an exec can be running iff the init process is also running. Signed-off-by: Cory Snider <csnider@mirantis.com>	2024-09-02 10:43:53 +01:00
Laura Brehm	7f3bf993d6	runc-shim: remove misleading comment It's not true that `s.mu` needs to be held when calling `handleProcessExit`, and indeed hasn't been the case for a while – see `892dc54bd2`. Signed-off-by: Laura Brehm <laurabrehm@hey.com>	2024-08-29 13:04:22 +01:00
Sebastiaan van Stijn	9776047243	migrate to github.com/moby/sys/userns Commit `8437c567d8` migrated the use of the userns package to the github.com/moby/sys/user module. After further discussion with maintainers, it was decided to move the userns package to a separate module, as it has no direct relation with "user" operations (other than having "user" in its name). This patch migrates our code to use the new module. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2024-08-08 12:48:54 +02:00
Sebastiaan van Stijn	8437c567d8	pkg/userns: deprecate and migrate to github.com/moby/sys/user/userns The userns package in libcontainer was integrated into the moby/sys/user module at commit [3778ae603c706494fd1e2c2faf83b406e38d687d][1]. This patch deprecates the containerd fork of that package, and adds it as an alias for the moby/sys/user/userns package. [1]: `3778ae603c` Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2024-07-26 09:47:50 +02:00
Akhil Mohan	300fd770a0	use typeurl funcs for marshalling anypb.Any Signed-off-by: Akhil Mohan <akhilerm@gmail.com>	2024-07-10 22:26:27 +05:30
Derek McGowan	2ac2b9c909	Make api a Go sub-module Allow the api to stay at the same v1 go package name and keep using a 1.x version number. This indicates the API is still at 1.x and allows sharing proto types with containerd 1.6 and 1.7 releases. Signed-off-by: Derek McGowan <derek@mcg.dev>	2024-05-02 11:03:00 -07:00
Derek McGowan	e1b94c0e7d	Move protobuf package under pkg Signed-off-by: Derek McGowan <derek@mcg.dev>	2024-05-02 10:52:03 -07:00
Derek McGowan	4a45507772	Move runc options to api directory Signed-off-by: Derek McGowan <derek@mcg.dev>	2024-05-02 10:52:00 -07:00
Laura Brehm	6d00c3ada8	runc-shim: only defer init process exits In order to make sure that we don't publish task exit events for init processes before we do for execs in that container, we added logic to `processExits` in `892dc54bd2` to skip these and let the pending exec's `handleStarted` closure process them. However, the conditional logic in `processExits` added was faulty - we should only defer processing of exit events related to init processes, not other execs. Due to this missing condition, `892dc54bd2` introduced a bug where, if there are many concurrent execs for the same container/init pid, exec exits are skipped and then never published, resulting in hanging clients. This commit adds the missing logic to `processExits`. Signed-off-by: Laura Brehm <laurabrehm@hey.com>	2024-03-26 13:39:11 +00:00
Maksym Pavlenko	6a96e45012	Move shim package to pkg Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>	2024-03-07 10:05:26 -08:00
Laura Brehm	892dc54bd2	runc-shim: process exec exits before init For a given container, as long as the init process is the init process of that PID namespace, we always receive the exits for execs before we receive them for the init process. It's important that we uphold this invariant for the outside world by always emitting a TaskExit event for a container's exec before we emit one for the init process because this is the expected behavior from callers, and changing this creates issues - such as Docker, which will delete the container after receiving a TaskExit for the init process, and then not be able to handle the exec's exit after having deleted the container (see: https://github.com/containerd/containerd/issues/9719). Since `5cd6210ad0`, if an exec is starting at the same time that an init exits, if the exec is an "early exit" i.e. we haven't emitted a TaskStart for it/put it in `s.running` by the time we receive its exit, we notify concurrent calls to `s.Start()` of the exit and continue processing exits, which will cause us to process the Init's exit before the exec, and emit it, which we don't want to do. This commit introduces a map `s.pendingExecs` to keep track of the number of pending execs keyed by container, which allows us to skip processing exits for inits if there are pending execs, and instead have the closure returned by `s.preStart` handle the init exit after emitting the exec's exit. Signed-off-by: Laura Brehm <laurabrehm@hey.com>	2024-03-01 16:43:19 +00:00
Derek McGowan	fb9b59a843	Switch to new errdefs package Signed-off-by: Derek McGowan <derek@mcg.dev>	2024-01-25 22:18:45 -08:00
Derek McGowan	dbc74db6a1	Move runtime to core/runtime Signed-off-by: Derek McGowan <derek@mcg.dev>	2024-01-17 09:58:04 -08:00
Derek McGowan	6be90158cd	Move sys to pkg/sys Signed-off-by: Derek McGowan <derek@mcg.dev>	2024-01-17 09:56:16 -08:00
Derek McGowan	fa8cae99d1	Move namespaces to pkg/namespaces Signed-off-by: Derek McGowan <derek@mcg.dev>	2024-01-17 09:55:39 -08:00
Derek McGowan	44a836c9b5	Move errdefs to pkg/errdefs Signed-off-by: Derek McGowan <derek@mcg.dev>	2024-01-17 09:54:45 -08:00
Akihiro Suda	8e567aa581	mv pkg/process cmd/containerd-shim-runc-v2/process The package is quite specific to runc and only imported by containerd-shim-runc-v2 Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2023-11-30 21:50:04 +09:00
Maksym Pavlenko	7d65a45639	Move runc shim implementation to cmd Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>	2023-11-14 10:13:32 -08:00

19 Commits