1. kill shim in cleanupAfterDeadShim avoid shim leak
2. refactor cleanupAfterDeadShim, get pid from bundle
path instead of make pid as a parameter
Signed-off-by: Ace-Tang <aceapril@126.com>
Shims no longer call `os.Exit` but close the context on shutdown so that
events and other resources have hit the `defer`s.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
* Rootfs dir is created during container creation not during bundle
creation
* Add support for v2
* UnmountAll is a no-op when the path to unmount (i.e. the rootfs dir)
does not exist or is invalid
Co-authored-by: Danail Branekov <danailster@gmail.com>
Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>
Use sha256 hash to shorten the unix socket path to satisfy the
length limitation of abstract socket path
This commit also backports the feature storing address path to
a file from v2 to keep compatibility
Fixes#3032
Signed-off-by: Eric Lin <linxiulei@gmail.com>
Closes#603
This adds logging facilities at the shim level to provide minimal I/O
overhead and pluggable logging options. Log handling is done within the
shim so that all I/O, cpu, and memory can be charged to the container.
A sample logging driver setting up logging for a container the systemd
journal looks like this:
```go
package main
import (
"bufio"
"context"
"fmt"
"io"
"sync"
"github.com/containerd/containerd/runtime/v2/logging"
"github.com/coreos/go-systemd/journal"
)
func main() {
logging.Run(log)
}
func log(ctx context.Context, config *logging.Config, ready func() error) error {
// construct any log metadata for the container
vars := map[string]string{
"SYSLOG_IDENTIFIER": fmt.Sprintf("%s:%s", config.Namespace, config.ID),
}
var wg sync.WaitGroup
wg.Add(2)
// forward both stdout and stderr to the journal
go copy(&wg, config.Stdout, journal.PriInfo, vars)
go copy(&wg, config.Stderr, journal.PriErr, vars)
// signal that we are ready and setup for the container to be started
if err := ready(); err != nil {
return err
}
wg.Wait()
return nil
}
func copy(wg *sync.WaitGroup, r io.Reader, pri journal.Priority, vars map[string]string) {
defer wg.Done()
s := bufio.NewScanner(r)
for s.Scan() {
if s.Err() != nil {
return
}
journal.Send(s.Text(), pri, vars)
}
}
```
A `logging` package has been created to assist log developers create
logging plugins for containerd.
This uses a URI based approach for logging drivers that can be expanded
in the future.
Supported URI scheme's are:
* binary
* fifo
* file
You can pass the log url via ctr on the command line:
```bash
> ctr run --rm --runtime io.containerd.runc.v2 --log-uri binary://shim-journald docker.io/library/redis:alpine redis
```
```bash
> journalctl -f -t default:redis
-- Logs begin at Tue 2018-12-11 16:29:51 EST. --
Mar 08 16:08:22 deathstar default:redis[120760]: 1:C 08 Mar 2019 21:08:22.703 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
Mar 08 16:08:22 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:22.704 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
Mar 08 16:08:22 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:22.704 # Server can't set maximum open files to 10032 because of OS error: Operation not permitted.
Mar 08 16:08:22 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:22.704 # Current maximum open files is 1024. maxclients has been reduced to 992 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
Mar 08 16:08:22 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:22.705 * Running mode=standalone, port=6379.
Mar 08 16:08:22 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:22.705 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
Mar 08 16:08:22 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:22.705 # Server initialized
Mar 08 16:08:22 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:22.705 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
Mar 08 16:08:22 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:22.705 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
Mar 08 16:08:22 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:22.705 * Ready to accept connections
Mar 08 16:08:50 deathstar default:redis[120760]: 1:signal-handler (1552079330) Received SIGINT scheduling shutdown...
Mar 08 16:08:50 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:50.405 # User requested shutdown...
Mar 08 16:08:50 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:50.406 * Saving the final RDB snapshot before exiting.
Mar 08 16:08:50 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:50.452 * DB saved on disk
Mar 08 16:08:50 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:50.453 # Redis is now ready to exit, bye bye...
```
The following client side Opts are added:
```go
// LogURI provides the raw logging URI
func LogURI(uri *url.URL) Creator { }
// BinaryIO forwards contianer STDOUT|STDERR directly to a logging binary
func BinaryIO(binary string, args map[string]string) Creator {}
```
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
With the change to unified shims (ie: 1 shim per multiple tasks) the shimLog
on Windows for the 2nd-Nth worload containers will not have an associated
named pipe listener.
Due to a subtle bug in errors.Wrap passing a nil error we would unblock the
disconnected listener and return 0 byte successfull reads which would cause go
to continually read and cap the CPU.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Due to an error in the OCI specf for layerFolders, the runhcs shim was
passing the layers for LCOW in reverse order. This fixes the ordering
by simply removing the code which reversed the layers for LCOW.
Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
Use full name including extension for shim binary format on Windows in order to
match any stat path faster without a fallback.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
megacheck, gosimple and unused has been deprecated and subsumed by
staticcheck. And staticcheck also has been upgraded. we need to update
code for the linter issue.
close: #2945
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Changes the requirement of a Runtime v2 shim in order to avoid race conditions
between shim and shim client sending async events. Places a requirement of what
events and what order a shim must comply to.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
Before this change, the shim was only publishing a non-zero exit status
(exit code) in the case that the process.Wait() call failed. This
grabs the exit status correctly when process.Wait() succeeds too.
The call was closing all upstream IO when a shim.CloseIO call was made rather
than just the Stdin as it is supposed to.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Rootfs length can be set to zero if the upstream caller fully manages storage
and mounts on their own. In this case just treat the bundle as a fully complete
OCI spec and run it without doing any storage work in the shim.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
When an exec occurs the pid was not properly updated on the in memory state
value causing many queries to see a 0.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
The two new method Add/Delete can allow custom plugin to add or migrate
existing task into major Runtime plugin.
close: #2888
Signed-off-by: Wei Fu <fuweid89@gmail.com>
This makes bundle removal atomic by first renaming the bundle and
working directories to a hidden path before removing the underlying
directories.
Closes#2567Closes#2327
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This brings freebsd in line with Darwin, ie it builds, but some parts may not yet
be fully functional. There is now a WIP `runc` port for FreeBSD at
https://github.com/clovertrail/runc/tree/1501-SupportOnFreeBSD so should be able
to test further.
Signed-off-by: Justin Cormack <justin@specialbusservice.com>
There is still a special case where the client side fails to open or
load causes things to be slow and the shim can lock up when this
happens. This adds a timeout to the context for this case to abort fifo
creation.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
support checkpoint without committing a checkpoint dir into a
checkpoint image and restore without untar image into checkpoint
directory. support for both v1 and v2 runtime
Signed-off-by: Ace-Tang <aceapril@126.com>
add ImagePath and WorkPath for checkpoint process, add CriuImagePath
and CriuWorkPath for create process in runtime v2 protobuf
Signed-off-by: Ace-Tang <aceapril@126.com>
add ImagePath and WorkPath for checkpoint process, add CriuImagePath
and CriuWorkPath for create process in runtime v1 protobuf
Signed-off-by: Ace-Tang <aceapril@126.com>
logrus v1.0.3 was the first release that include the change in
terminal_windows.go that stops exec'ing "cmd ver" to obtain the version
information and rather uses the x/sys/crypto/terminal.IsTerminal on the
console fd. On Windows this is a significant performance difference to
avoid the additional process activation of the "cmd ver" for each
invocation of the shim/runhcs executables.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
There were races with the way process states. This displayed in ways,
especially around pausing the container for atomic operations. Users
would get errors like, cannnot delete container in paused state and
such.
This can be eaisly reproduced with `docker` and the following command:
```bash
> (for i in `seq 1 25`; do id=$(docker create alpine usleep 50000);docker start $id;docker commit $id;docker wait $id;docker rm $id; done)
```
This two issues that this fixes are:
* locks must be held by the owning process, not the state operations.
* If a container ends up being paused but before the operation
completes, the process exists, make sure we resume the container before
setting the the process as exited.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
add '-id' flag when start container with io.containerd.runc.v1 shim, or user
can not get container-shim relation from 'ps -ef',like
```
/usr/bin/containerd-shim-runc-v1 -namespace default -address
/run/containerd/containerd.sock -publish-binary /usr/bin/containerd
```
Signed-off-by: Ace-Tang <aceapril@126.com>
1. avoid dead lock during kill, fetch allProcesses before handle events
2. use argu's ctx instead of context.Backgroud() in openlog
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Fixes#2709
This increases the buffer size for process exit subscribers. It also
implements a non-blocking send on the subscriber channel. It is better
to drop an exit even than it is to block a shim for one slow subscriber.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
On Windows because of the way the log pipe is forwarded to the shim there is a
condition where the pipe listener may not yet be active when a client tries to
connect. To handle this case we allow polling on the file and rety on pipe not
found. This limits the pipe not found retry to 5 seconds but leaves the connect
timeout alone as if there is a listener we want to connect to it normally.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Revendors to Microsoft/hcsshim v0.7.5 that added support for logging all
runhcs.exe commands via Windows named pipes. This now launches all runhcs.exe
commands and forwards debug logging to the containerd-shim-runhcs log when
with --debug.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
1. Fixes bugs in ctr run that were introduced by 1d9b969
2. Adds support for the --isolated flag that runs Windows HyperV
cotainers instead of process isolated containers on Windows.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
func (e *execProcess) delete(ctx context.Context) error {
e.wg.Wait()
...
}
delete exec process will wait for io copy finish, if wait here,
other process can not get lock of shim service.
1. apply lock around s.transition() calls in the Delete methods.
2. put lock after wait io copy in exec Delete.
Signed-off-by: Ace-Tang <aceapril@126.com>
This makes sure that runc does not get any valid IO for the pipe. Some
builds and other containers will be stuck if they inspect stdin
expecially and its a pipe but not connected to any user input.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
When creating a default OCI spec on Windows that is targeting the LCOW
platform it needs to contain a Windows section as well. This adds the
Windows section by default. It also protects against this case for all
OCI creation that doesnt use the OCI package in the runhcs-shim.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
- Still KillAll if the task uses the hosts pid namespace
- Test for both host pid namespace and normal cases
Co-authored-by: Oliver Stenbom <ostenbom@pivotal.io>
Co-authored-by: Georgi Sabev <georgethebeatle@gmail.com>
Signed-off-by: Oliver Stenbom <ostenbom@pivotal.io>
Implements the Windows lcow differ/snapshotter responsible for managing
the creation and lifetime of lcow containers on Windows.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
This is the case where the work dir could still exist if a machine
reboots, reseting the state dir. On container creation, we should just
clear out the work dir.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Sometimes the wrong ID was being used because its not correct to assume
that ExecID is always set. The assumption was that for API's where it is
not an exec ID == ExecID but thats not true. ExecID == "" if it is not
an exec. This uses the correct ID in all cases.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
This was found testing other runtime shims that are faster than runc(no
containerization). This is a race that can cause the shim to block
forever. It's not an issue for out/err because we open both sides of
the pipe, but for stdin, it expects the client to have it opened.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Switches the client/server direction of the shim-log pipe on Windows so
that the shim is the listener. This allows the containerd client to
reconnect as needed to the log streams.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
A fifo on unix or named pipe on Windows will be provided to the shim.
It can be located inside the `cwd` of the shim named "log".
The shims can use the existing `github.com/containerd/containerd/log` package to log debug messages.
Messages will automatically be output in the containerd's daemon logs with the correct fiels and runtime set.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Adds retry support to AnonDialer if the pipe does not exist. This will
retry up to the timeout for the pipe to exist and connect. This solves
the race between the containerd-shim-* start command and the
reinvocation.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Fixes an issue where the runtime v2 was not using an absolute path to
the executable but setting the .Dir field on the exec.Cmd. This causes
the executable to need to be relative to .Dir but no shim is actually
copied to the bundle directory that its work dir is set to.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
This cleans up persistent work dirs on TaskManager boot. These dirs can
be left behind in a machine reboot. The state in /run will not exist
but the work dir in the root does, we should cleanup work dirs when
tasks are not loaded.
This also improves error handling that would prevent the task manager
from loading when a single task fails to load or cleanup.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
1. Moves the log message for each socket to the appropriate _unix and
_windows.go
2. Replaces all reference to Abstract Socket for Windows.
3. Adds support for ctrl+c on Windows to exit a shim.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Since windows does not require a signal handler, we just block on the
channel forever so that it does not exit.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Implements the various requirements for the runtime v2 code to abstract
away the unix/linux code into the appropriate platform level
abstractions to use the runtime v2 on Windows as well.
Adds support in the Makefile.windows to actually build the runtime v2
code for Windows by setting a shell environment BUILD_WINDOWS_V2=1
before calling make. (Note this disables the compilation of the Windows
runtime v1)
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
This sets the shim's max procs to 2, like we already have hard coded in
the shim, with the env var so that it is set at go runtime boot.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>