In linux platform, the shim server always listens on the socket before
the containerd task manager dial it. It is unlikely that containerd task
manager should handle reconnect because the shim can't restart. For this
case, the containerd task manager should fail fast if there is ENOENT or
ECONNREFUSED error.
And if the socket file is deleted during cleanup the exited task, it
maybe cause that containerd task manager takes long time to reload the
dead shim. For that task.v2 manager, the race case is like:
```
TaskService.Delete
TaskManager.Delete(runtime/v2/manager.go)
shim.delete(runtime/v2/shim.go)
shimv2api.Shutdown(runtime/v2/task/shim.pb.go)
<- containerd has been killed or restarted somehow
bundle.Delete
```
The shimv2api.Shutdown will cause that the shim deletes socket file
(containerd-shim-runc-v2 does). But the bundle is still there. During
reloading, the containerd will wait for the socket file appears again
in 100 seconds. It is not reasonable. The Reconnect should prevent this
case by fast fail.
Closes: #5648.
Fixes: #5597.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Since the /run directory on macOS is read-only, darwin containerd should
use a different directory. Use the pre-defined default values instead
to avoid this issue.
Fixes: bd908acab ("Use path based unix socket for shims")
Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
For the abstract socket adress there's no need to chmod
the address's file, cause the file didn't exist actually.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
The shim.SetScore() utility was no longer used since 7dfc605fc6.
Checking for uses outside of this repository, I found only one external use of
this in gVisor; a9441aea27/pkg/shim/service.go (L262-L264)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
oom_score_adj must be in the range -1000 to 1000. In AdjustOOMScore if containerd's score is already at the maximum value we should set that value for the shim instead of trying to set 1001 which is invalid.
Signed-off-by: Simon Kaegi <simon_kaegi@ca.ibm.com>
This allows filesystem based ACLs for configuring access to the socket of a
shim.
Co-authored-by: Samuel Karp <skarp@amazon.com>
Signed-off-by: Samuel Karp <skarp@amazon.com>
Signed-off-by: Michael Crosby <michael@thepasture.io>
Signed-off-by: Michael Crosby <michael.crosby@apple.com>
Instead of having several dialer implementations, leave only one in
`pkg/dialer` and call it from `pkg/ttrpcutil`, `runtime/v(1|2)/shim`
which had their own
Closes#3471.
Signed-off-by: Kiril Vladimiroff <kiril@vladimiroff.org>
This changes the shim's OOM score from a static max killable of -999 to
be +1 of the containerd daemon's score. This should allow the shim's to
be killed first in an OOM condition but leave the daemon alone for a bit
to help cleanup and manage the containers during this situation.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Use sha256 hash to shorten the unix socket path to satisfy the
length limitation of abstract socket path
This commit also backports the feature storing address path to
a file from v2 to keep compatibility
Fixes#3032
Signed-off-by: Eric Lin <linxiulei@gmail.com>
Use full name including extension for shim binary format on Windows in order to
match any stat path faster without a fallback.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>