This is mostly to workaround an issue with gRPC based shims after containerd
restart. If a shim dies while containerd is also down/restarting, on reboot
grpc.DialContext with our current set of DialOptions will make us wait for
100 seconds per shim even if the socket no longer exists or has no listener.
Signed-off-by: Danny Canter <danny@dcantah.dev>
These logs were already using structured logs, so include "id" as a field,
which also prevents the id being quoted (and escaped when printing);
time="2023-11-15T11:30:23.745574884Z" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
time="2023-11-15T11:30:23.745612425Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.pause\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2023-11-15T11:30:23.745620884Z" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
time="2023-11-15T11:30:23.745625925Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
Also updated some changed `WithError().WithField()` calls, to prevent some
overhead.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This is effectively a revert of 2ac9968401, which
switched from os/exec to the golang.org/x/sys/execabs package to mitigate
security issues (mainly on Windows) with lookups resolving to binaries in the
current directory.
from the go1.19 release notes https://go.dev/doc/go1.19#os-exec-path
> ## PATH lookups
>
> Command and LookPath no longer allow results from a PATH search to be found
> relative to the current directory. This removes a common source of security
> problems but may also break existing programs that depend on using, say,
> exec.Command("prog") to run a binary named prog (or, on Windows, prog.exe) in
> the current directory. See the os/exec package documentation for information
> about how best to update such programs.
>
> On Windows, Command and LookPath now respect the NoDefaultCurrentDirectoryInExePath
> environment variable, making it possible to disable the default implicit search
> of “.” in PATH lookups on Windows systems.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The ShimManager.Start() will call loadShim() to get the existing shim if SandboxID
is specified for a container, but shimTask.PID() is called in loadShim,
which will call Connect() of Task API with the ID of a task that is not
created yet(containerd is getting the shim and Task API address to call
Create, so the task is not created yet).
In this commit we change the logic of loadShim() to get the shim without calling
Connect() of the not created container ID.
Signed-off-by: Abel Feng <fshb1988@gmail.com>
Remove containerd specific parts of the plugin package to prepare its
move out of the main repository. Separate the plugin registration
singleton into a separate package.
Separating out the plugin package and registration makes it easier to
implement external plugins without creating a dependency loop.
Signed-off-by: Derek McGowan <derek@mcg.dev>
The plugins packages defines the plugins used by containerd.
Move all the types and properties to this package.
Signed-off-by: Derek McGowan <derek@mcg.dev>
After pr #8617, create handler of containerd-shim-runc-v2 will
call handleStarted() to record the init process and handle its exit.
Init process wouldn't quit so early in normal circumstances. But if
this screnario occurs, handleStarted() will call
handleProcessExit(), which will cause deadlock because create() had
acquired s.mu, and handleProcessExit() will try to lock it again.
So, I added a parameter muLocked to handleStarted to indicate whether
or not s.mu is currently locked, and thus deciding whether or not to
lock it when calling handleProcessExit.
Fix: #9103
Signed-off-by: Chen Yiyang <cyyzero@qq.com>
Previous code has already called `getContainer()`, just pass it into
`s.getContainerPids` to reduce unnecessary lock and map lookup.
Signed-off-by: Chen Yiyang <cyyzero@qq.com>
Since the moby/moby can't handle duplicate exit event well, it's hard
for containerd to retry shutdown if there is error, like context
canceled.
In order to prevent from regression like #4769, I add skipped
integration case as TODO item and we should rethink about how to handle
the task/shim lifecycle.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
In commit 4b35c3829d, example shim erroneously started to depend on runc, fix that back.
Also, build example shim on all supported platforms to prevent such situations in the future.
Signed-off-by: Marat Radchenko <marat@slonopotamus.org>
Every shim implementation needs to select a correct publisher topic when posting events, so move it out of Linux-only runc code to the place where other shims can also use it
Otherwise, shims have to copy-paste this code. For example, see runj: 8158e558a3/containerd/shim.go (L144-L172)
Signed-off-by: Marat Radchenko <marat@slonopotamus.org>
To gather metrics/stats about a specific sandbox instance, it'd be nice to
have a dedicated rpc for this. Due to the same "what kind of stats are going
to be returned" dilemma exists for sandboxes as well, I've re-used the metrics
type we have as the data field is just an `any`, leaving the metrics returned
entirely up to the shim author. For CRI usecases this will just be cgroup and
windows stats as that's all that's supported right now.
Signed-off-by: Danny Canter <danny@dcantah.dev>
eventSendMu is causing severe lock contention when multiple processes
start and exit concurrently. Replace it with a different scheme for
maintaining causality w.r.t. start and exit events for a process which
does not rely on big locks for synchronization.
Keep track of all processes for which a Task(Exec)Start event has been
published and have not yet exited in a map, keyed by their PID.
Processing exits then is as simple as looking up which process
corresponds to the PID. If there are no started processes known with
that PID, the PID must either belong to a process which was started by
s.Start() and before the s.Start() call has added the process to the map
of running processes, or a reparented process which we don't care about.
Handle the former case by having each s.Start() call subscribe to exit
events before starting the process. It checks if the PID has exited in
the time between it starting the process and publishing the TaskStart
event, handling the exit if it has. Exit events for reparented processes
received when no s.Start() calls are in flight are immediately
discarded, and events received during an s.Start() call are discarded
when the s.Start() call returns.
Co-authored-by: Laura Brehm <laurabrehm@hey.com>
Signed-off-by: Cory Snider <csnider@mirantis.com>