Previously shim v2 (`io.containerd.runc.{v1,v2}`) always used `/run/containerd/runc` as the runc root.
Fix#4326
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
How to test (from https://github.com/opencontainers/runc/pull/2352#issuecomment-620834524):
(host)$ sudo swapoff -a
(host)$ sudo ctr run -t --rm --memory-limit $((1024*1024*32)) docker.io/library/alpine:latest foo
(container)$ sh -c 'VAR=$(seq 1 100000000)'
An event `/tasks/oom {"container_id":"foo"}` will be displayed in `ctr events`.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Dependencies may be switching to use the new `%w` formatting
option to wrap errors; switching to use `errors.Is()` makes
sure that we are still able to unwrap the error and detect the
underlying cause.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
TestRuntimeWithEmptyMaxEnvProcs should restore the GoMaxProcs after
test so that the temporary change of GoMaxProcs will not impact other
case, like TestRuntimeWithNonEmptyMaxEnvProcs.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
For some reason, shimv2 process doesn't exist. The ttrpc doesn't detect
the connection closed by server until delete task. For this case, we
should ignore the ttrpc.ErrClosed and let task manager handle the
cleanup.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Request came from a slack message that shims do not output their versions making
it hard for users and operators to know what version of a shim they have on the
system. This adds a `-v` flag to the shims so that users can see if a shim is
in sync with containerd or what versions of shims that they are running.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
The Err() method should be called after the Scan() loop, not inside it.
Found by: git grep -A3 -F '.Scan()'
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Instead of having several dialer implementations, leave only one in
`pkg/dialer` and call it from `pkg/ttrpcutil`, `runtime/v(1|2)/shim`
which had their own
Closes#3471.
Signed-off-by: Kiril Vladimiroff <kiril@vladimiroff.org>
The background context aovids shim blocking when the ctx is cancelled
unexpectedly during shim start. But if the shim exits unexpectedly
before opening the pipe, the fd will never be closed.
`onCloseWithShimLog` makes sure that the shim log fd is closed properly
once the shim disconnects.
Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com>
There's no OOM monitoring for the v2 cgroups yet, so it seems unlikely
that there was a leak in that case.
Signed-off-by: Seth Pellegrino <spellegrino@newrelic.com>
Only start watching the cgroup for OOMs when the first process starts
instead of on every process.
Signed-off-by: Seth Pellegrino <spellegrino@newrelic.com>
If the context is cancelled during `shim.Create()`, such as the client
disconnects unexpectedly. The created shim will never be deleted.
What's more, if the context is cancelled during `openShimLog()`, the
fifo will be closed and block the shim output.
Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com>
Previously, the platform was closed as part of the Delete method when the
process was an init for a task and there were no more tasks after its deletion.
This can create problems if another task is created within the shim right after
the delete runs, which results in the platform being closed but the shim
continuing to run.
This change moves closing the platform to the Shutdown method after the shim's
context is canceled, which ensures the platform is only closed once the shim
is sure its done servicing containers.
Signed-off-by: Erik Sipsma <sipsma@amazon.com>
* only shim v2 runc v2 ("io.containerd.runc.v2") is supported
* only PID metrics is implemented. Others should be implemented in separate PRs.
* lots of code duplication in v1 metrics and v2 metrics. Dedupe should be separate PR.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Reized the I/O buffers to align with the size of the kernel buffers with fifos
and move the close aspect of the console to key off of the stdin closing.
Fixes#3738
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
- Our out of tree shim would like to publish events with ttrpc. These
functions should be exposed so our shim doesn't need to reimplement
publisher logic.
Signed-off-by: Kathryn Baldauf <kabaldau@microsoft.com>
Because of the way go handles flags, passing a flag that is not defined
will cause an error. In our case, if we kept this as a flag, then
third-party shims would break when they see this new flag. To fix this,
I moved this new configuration option to an env var. We should use env
vars from here on out to avoid breaking shim compat.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Previously the TTRPC address was generated as "<GRPC address>.ttrpc".
This change now allows explicit configuration of the TTRPC address, with
the default still being the old format if no value is specified.
As part of this change, a new configuration section is added for TTRPC
listener options.
Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
When containerd-shim does reaper, the most processes are not init
process. Since json.Decode consumes more CPU resource, we should check
killall option for init process only.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
When using a multi-container shim, the fifo of the 2nd to Nth container
will not be opened when the ctx is done. This will cause an
`ErrReadClosed` that can be ignored.
Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com>
This adds a singleton `timeout` package that will allow services and user
to configure timeouts in the daemon. When a service wants to use a
timeout, it should declare a const and register it's default value
inside an `init()` function for that package. When the default config
is generated, we can use the `timeout` package to provide the available
timeout keys so that a user knows that they can configure.
These show up in the config as follows:
```toml
[timeouts]
"io.containerd.timeout.shim.cleanup" = 5
"io.containerd.timeout.shim.load" = 5
"io.containerd.timeout.shim.shutdown" = 3
"io.containerd.timeout.task.state" = 2
```
Timeouts in the config are specified in seconds.
Timeouts are very hard to get right and giving this power to the user to
configure things is a huge improvement. Machines can be faster and
slower and depending on the CPU or load of the machine, a timeout may
need to be adjusted.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
AnonDialer will now return a "not found" error if the pipe is not found
before the timeout is reached. If the pipe exists but the timeout is
reached while attempting to connect, the timeout error will still be
returned.
This will allow the error handling logic to work properly when
connecting to the shim log pipe. An error message is only logged if the
error is not "not found", so now log noise from log pipes that were
never intended to be created by the shim will be hidden.
This change also cleans up the control flow for AnonDialer on Windows.
The new code should be more easily readable, but the only semantic
change is the error return value change.
Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
When shimv2 dead, the container would be cleanup, but
the corresponding runtime task still existed in runtime
task lists, it should be deleted too.
Signed-off-by: lifupan <lifupan@gmail.com>
This changes the shim's OOM score from a static max killable of -999 to
be +1 of the containerd daemon's score. This should allow the shim's to
be killed first in an OOM condition but leave the daemon alone for a bit
to help cleanup and manage the containers during this situation.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This ensures that a container does not have a mounted rootfs in the
bundle directory before RemoveAll is called. Having the rootfs removed
first with a Remove ensures that the directory is not mounted and empty
before the bundle directory is removed.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
skip hidden directories in load task, and return soon if path not exist
in atomicDelete
carry of #3233Closes#3233
Signed-off-by: Ace-Tang <aceapril@126.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Open shim v2 log with the flag `O_RDWR` will cause the `Read()` block
forever even if the pipe has been closed on the shim side. Then the
`io.Copy()` would never return and lead to a fd leak.
Fix typo when closing shim v1 log which causes the `stdouLog` leak.
Update `numPipes` function in test case to get the opened FIFO
correctly.
Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com>
Before this change, the v2 runtime shim setup code was hardcoded to always
configure logrus to write logs to the "log" FIFO present in the current working
directory. This only happens in the "default" action codepath
(i.e. not shim start or shim delete).
This is problematic for shims that execute outside the current working
directory of a bundle. For example, it often doesn't make sense for shims that
manage multiple containers to execute in a single bundle directory. Additionally,
shim processes that require being pre-created, i.e. spun up before tasks they
will handle are actually created, won't have a log FIFO to write to until a task
is created.
This change leaves the default behavior as is but introduces a Binary Config
field that will optionally disable automatic configuration of logrus to use the
"log" FIFO. This allows shims to configure their own logger if necessary while
still re-using the rest of the shim helper code in containerd.
Signed-off-by: Erik Sipsma <sipsma@amazon.com>
Shims no longer call `os.Exit` but close the context on shutdown so that
events and other resources have hit the `defer`s.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
* Rootfs dir is created during container creation not during bundle
creation
* Add support for v2
* UnmountAll is a no-op when the path to unmount (i.e. the rootfs dir)
does not exist or is invalid
Co-authored-by: Danail Branekov <danailster@gmail.com>
Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>
Use sha256 hash to shorten the unix socket path to satisfy the
length limitation of abstract socket path
This commit also backports the feature storing address path to
a file from v2 to keep compatibility
Fixes#3032
Signed-off-by: Eric Lin <linxiulei@gmail.com>
Closes#603
This adds logging facilities at the shim level to provide minimal I/O
overhead and pluggable logging options. Log handling is done within the
shim so that all I/O, cpu, and memory can be charged to the container.
A sample logging driver setting up logging for a container the systemd
journal looks like this:
```go
package main
import (
"bufio"
"context"
"fmt"
"io"
"sync"
"github.com/containerd/containerd/runtime/v2/logging"
"github.com/coreos/go-systemd/journal"
)
func main() {
logging.Run(log)
}
func log(ctx context.Context, config *logging.Config, ready func() error) error {
// construct any log metadata for the container
vars := map[string]string{
"SYSLOG_IDENTIFIER": fmt.Sprintf("%s:%s", config.Namespace, config.ID),
}
var wg sync.WaitGroup
wg.Add(2)
// forward both stdout and stderr to the journal
go copy(&wg, config.Stdout, journal.PriInfo, vars)
go copy(&wg, config.Stderr, journal.PriErr, vars)
// signal that we are ready and setup for the container to be started
if err := ready(); err != nil {
return err
}
wg.Wait()
return nil
}
func copy(wg *sync.WaitGroup, r io.Reader, pri journal.Priority, vars map[string]string) {
defer wg.Done()
s := bufio.NewScanner(r)
for s.Scan() {
if s.Err() != nil {
return
}
journal.Send(s.Text(), pri, vars)
}
}
```
A `logging` package has been created to assist log developers create
logging plugins for containerd.
This uses a URI based approach for logging drivers that can be expanded
in the future.
Supported URI scheme's are:
* binary
* fifo
* file
You can pass the log url via ctr on the command line:
```bash
> ctr run --rm --runtime io.containerd.runc.v2 --log-uri binary://shim-journald docker.io/library/redis:alpine redis
```
```bash
> journalctl -f -t default:redis
-- Logs begin at Tue 2018-12-11 16:29:51 EST. --
Mar 08 16:08:22 deathstar default:redis[120760]: 1:C 08 Mar 2019 21:08:22.703 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
Mar 08 16:08:22 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:22.704 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
Mar 08 16:08:22 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:22.704 # Server can't set maximum open files to 10032 because of OS error: Operation not permitted.
Mar 08 16:08:22 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:22.704 # Current maximum open files is 1024. maxclients has been reduced to 992 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
Mar 08 16:08:22 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:22.705 * Running mode=standalone, port=6379.
Mar 08 16:08:22 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:22.705 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
Mar 08 16:08:22 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:22.705 # Server initialized
Mar 08 16:08:22 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:22.705 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
Mar 08 16:08:22 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:22.705 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
Mar 08 16:08:22 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:22.705 * Ready to accept connections
Mar 08 16:08:50 deathstar default:redis[120760]: 1:signal-handler (1552079330) Received SIGINT scheduling shutdown...
Mar 08 16:08:50 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:50.405 # User requested shutdown...
Mar 08 16:08:50 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:50.406 * Saving the final RDB snapshot before exiting.
Mar 08 16:08:50 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:50.452 * DB saved on disk
Mar 08 16:08:50 deathstar default:redis[120760]: 1:M 08 Mar 2019 21:08:50.453 # Redis is now ready to exit, bye bye...
```
The following client side Opts are added:
```go
// LogURI provides the raw logging URI
func LogURI(uri *url.URL) Creator { }
// BinaryIO forwards contianer STDOUT|STDERR directly to a logging binary
func BinaryIO(binary string, args map[string]string) Creator {}
```
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
With the change to unified shims (ie: 1 shim per multiple tasks) the shimLog
on Windows for the 2nd-Nth worload containers will not have an associated
named pipe listener.
Due to a subtle bug in errors.Wrap passing a nil error we would unblock the
disconnected listener and return 0 byte successfull reads which would cause go
to continually read and cap the CPU.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Due to an error in the OCI specf for layerFolders, the runhcs shim was
passing the layers for LCOW in reverse order. This fixes the ordering
by simply removing the code which reversed the layers for LCOW.
Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
Use full name including extension for shim binary format on Windows in order to
match any stat path faster without a fallback.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
megacheck, gosimple and unused has been deprecated and subsumed by
staticcheck. And staticcheck also has been upgraded. we need to update
code for the linter issue.
close: #2945
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Changes the requirement of a Runtime v2 shim in order to avoid race conditions
between shim and shim client sending async events. Places a requirement of what
events and what order a shim must comply to.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
Before this change, the shim was only publishing a non-zero exit status
(exit code) in the case that the process.Wait() call failed. This
grabs the exit status correctly when process.Wait() succeeds too.
The call was closing all upstream IO when a shim.CloseIO call was made rather
than just the Stdin as it is supposed to.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Rootfs length can be set to zero if the upstream caller fully manages storage
and mounts on their own. In this case just treat the bundle as a fully complete
OCI spec and run it without doing any storage work in the shim.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
When an exec occurs the pid was not properly updated on the in memory state
value causing many queries to see a 0.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
The two new method Add/Delete can allow custom plugin to add or migrate
existing task into major Runtime plugin.
close: #2888
Signed-off-by: Wei Fu <fuweid89@gmail.com>
This makes bundle removal atomic by first renaming the bundle and
working directories to a hidden path before removing the underlying
directories.
Closes#2567Closes#2327
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This brings freebsd in line with Darwin, ie it builds, but some parts may not yet
be fully functional. There is now a WIP `runc` port for FreeBSD at
https://github.com/clovertrail/runc/tree/1501-SupportOnFreeBSD so should be able
to test further.
Signed-off-by: Justin Cormack <justin@specialbusservice.com>
support checkpoint without committing a checkpoint dir into a
checkpoint image and restore without untar image into checkpoint
directory. support for both v1 and v2 runtime
Signed-off-by: Ace-Tang <aceapril@126.com>
add ImagePath and WorkPath for checkpoint process, add CriuImagePath
and CriuWorkPath for create process in runtime v2 protobuf
Signed-off-by: Ace-Tang <aceapril@126.com>
logrus v1.0.3 was the first release that include the change in
terminal_windows.go that stops exec'ing "cmd ver" to obtain the version
information and rather uses the x/sys/crypto/terminal.IsTerminal on the
console fd. On Windows this is a significant performance difference to
avoid the additional process activation of the "cmd ver" for each
invocation of the shim/runhcs executables.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
add '-id' flag when start container with io.containerd.runc.v1 shim, or user
can not get container-shim relation from 'ps -ef',like
```
/usr/bin/containerd-shim-runc-v1 -namespace default -address
/run/containerd/containerd.sock -publish-binary /usr/bin/containerd
```
Signed-off-by: Ace-Tang <aceapril@126.com>
1. avoid dead lock during kill, fetch allProcesses before handle events
2. use argu's ctx instead of context.Backgroud() in openlog
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Fixes#2709
This increases the buffer size for process exit subscribers. It also
implements a non-blocking send on the subscriber channel. It is better
to drop an exit even than it is to block a shim for one slow subscriber.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
On Windows because of the way the log pipe is forwarded to the shim there is a
condition where the pipe listener may not yet be active when a client tries to
connect. To handle this case we allow polling on the file and rety on pipe not
found. This limits the pipe not found retry to 5 seconds but leaves the connect
timeout alone as if there is a listener we want to connect to it normally.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>