Due to an error in the OCI spec for layerFolders, the runhcs shim was
passing the layers for LCOW in reverse order. This fixes the ordering
by simply removing the code which reversed the layers for LCOW.
Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
Use full name including extension for shim binary format on Windows in order to
match any stat path faster without a fallback.
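As a rough illustration of the naming change, the sketch below derives a shim binary name from a runtime name and appends `.exe` on Windows. The `binaryName` helper and the `containerd-shim-<name>-<version>` layout are assumptions made for this example, not a copy of the actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// binaryName is a hypothetical helper: it maps a runtime name such as
// "io.containerd.runhcs.v1" to a shim binary name. On Windows the ".exe"
// extension is part of the name, so a stat of the resulting path can match
// directly instead of needing a second attempt with the extension added.
func binaryName(runtimeName, goos string) string {
	parts := strings.Split(runtimeName, ".")
	if len(parts) < 2 {
		return ""
	}
	name := fmt.Sprintf("containerd-shim-%s-%s", parts[len(parts)-2], parts[len(parts)-1])
	if goos == "windows" {
		name += ".exe"
	}
	return name
}

func main() {
	fmt.Println(binaryName("io.containerd.runhcs.v1", "windows"))
	fmt.Println(binaryName("io.containerd.runc.v1", "linux"))
}
```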
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
megacheck, gosimple and unused have been deprecated and subsumed by
staticcheck, and staticcheck itself has been upgraded. Update the code
to fix the issues reported by the new linter.
close: #2945
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Changes the requirements of a Runtime v2 shim in order to avoid race conditions
between the shim and the shim client sending async events. Specifies which
events a shim must send and the order in which it must send them.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
Before this change, the shim was only publishing a non-zero exit status
(exit code) in the case that the process.Wait() call failed. This
grabs the exit status correctly when process.Wait() succeeds too.
The call was closing all upstream IO when a shim.CloseIO call was made rather
than just the Stdin as it is supposed to.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Rootfs length can be set to zero if the upstream caller fully manages storage
and mounts on their own. In this case just treat the bundle as a fully complete
OCI spec and run it without doing any storage work in the shim.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
When an exec occurs the pid was not properly updated on the in memory state
value causing many queries to see a 0.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
The two new methods, Add and Delete, allow a custom plugin to add or migrate
an existing task into the main Runtime plugin.
close: #2888
Signed-off-by: Wei Fu <fuweid89@gmail.com>
This makes bundle removal atomic by first renaming the bundle and
working directories to a hidden path before removing the underlying
directories.
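A minimal sketch of the rename-then-remove idea, using only the standard library. The `atomicDelete` name and the `.<name>` hidden-path convention are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// atomicDelete renames path to a hidden sibling (".<name>") and then removes
// the renamed tree. The rename is a single filesystem operation, so other
// code never observes a half-deleted directory at the original path.
func atomicDelete(path string) error {
	hidden := filepath.Join(filepath.Dir(path), "."+filepath.Base(path))
	if err := os.Rename(path, hidden); err != nil {
		if os.IsNotExist(err) {
			return nil // nothing to remove
		}
		return err
	}
	return os.RemoveAll(hidden)
}

// demo creates a throwaway bundle directory, deletes it, and reports whether
// the original path is gone afterwards.
func demo() (bool, error) {
	root, err := os.MkdirTemp("", "bundles")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(root)

	bundle := filepath.Join(root, "task1")
	if err := os.MkdirAll(filepath.Join(bundle, "rootfs"), 0o755); err != nil {
		return false, err
	}
	if err := atomicDelete(bundle); err != nil {
		return false, err
	}
	_, err = os.Stat(bundle)
	return os.IsNotExist(err), nil
}

func main() {
	gone, err := demo()
	fmt.Println(gone, err)
}
```

If the removal step is interrupted, only the hidden directory is left behind, which a later cleanup pass can recognise by its leading dot.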
Closes #2567
Closes #2327
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This brings freebsd in line with Darwin, i.e. it builds, but some parts may not yet
be fully functional. There is now a WIP `runc` port for FreeBSD at
https://github.com/clovertrail/runc/tree/1501-SupportOnFreeBSD so it should be possible
to test further.
Signed-off-by: Justin Cormack <justin@specialbusservice.com>
There is still a special case where the client side fails to open or
load, which causes things to be slow, and the shim can lock up when this
happens. This adds a timeout to the context for this case to abort fifo
creation.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Support checkpoint without committing the checkpoint dir into a
checkpoint image, and restore without untarring an image into a checkpoint
directory. Supported for both the v1 and v2 runtimes.
Signed-off-by: Ace-Tang <aceapril@126.com>
Add ImagePath and WorkPath for the checkpoint process, and add CriuImagePath
and CriuWorkPath for the create process in the runtime v2 protobuf.
Signed-off-by: Ace-Tang <aceapril@126.com>
Add ImagePath and WorkPath for the checkpoint process, and add CriuImagePath
and CriuWorkPath for the create process in the runtime v1 protobuf.
Signed-off-by: Ace-Tang <aceapril@126.com>
logrus v1.0.3 was the first release that included the change in
terminal_windows.go that stops exec'ing "cmd ver" to obtain the version
information and rather uses the x/sys/crypto/terminal.IsTerminal on the
console fd. On Windows this is a significant performance difference to
avoid the additional process activation of the "cmd ver" for each
invocation of the shim/runhcs executables.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
There were races in the way process states were handled. These showed up in
several ways, especially around pausing the container for atomic operations.
Users would get errors like "cannot delete container in paused state" and
such.
This can be easily reproduced with `docker` and the following command:
```bash
> (for i in `seq 1 25`; do id=$(docker create alpine usleep 50000);docker start $id;docker commit $id;docker wait $id;docker rm $id; done)
```
The two issues that this fixes are:
* Locks must be held by the owning process, not the state operations.
* If a container ends up being paused but the process exits before the
operation completes, make sure we resume the container before
setting the process as exited.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add the '-id' flag when starting a container with the io.containerd.runc.v1 shim;
otherwise the user cannot get the container-shim relation from 'ps -ef' output like
```
/usr/bin/containerd-shim-runc-v1 -namespace default -address
/run/containerd/containerd.sock -publish-binary /usr/bin/containerd
```
Signed-off-by: Ace-Tang <aceapril@126.com>
1. Avoid a deadlock during kill by fetching allProcesses before handling events
2. Use the caller's ctx instead of context.Background() in openlog
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Fixes #2709
This increases the buffer size for process exit subscribers. It also
implements a non-blocking send on the subscriber channel. It is better
to drop an exit event than it is to block a shim for one slow subscriber.
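The non-blocking send can be sketched with a `select` that has a `default` case. The `exit` type, the `notify` helper and the buffer sizes below are illustrative assumptions, not the actual shim code.

```go
package main

import "fmt"

// exit is a stand-in for a process exit notification.
type exit struct {
	Pid    int
	Status int
}

// notify sends e to every subscriber without blocking and returns how many
// sends were dropped because a subscriber's buffer was full.
func notify(subscribers []chan exit, e exit) (dropped int) {
	for _, s := range subscribers {
		select {
		case s <- e:
			// delivered into the subscriber's buffer
		default:
			// buffer full: drop the event rather than block the sender
			dropped++
		}
	}
	return dropped
}

func main() {
	fast := make(chan exit, 32) // roomy buffer
	slow := make(chan exit, 1)  // nobody is draining this one
	subs := []chan exit{fast, slow}

	fmt.Println(notify(subs, exit{Pid: 10})) // both buffers have room
	fmt.Println(notify(subs, exit{Pid: 11})) // slow is full, one drop
}
```

A larger buffer makes drops less likely, while the `default` case bounds the damage when a subscriber stops reading altogether.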
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
On Windows, because of the way the log pipe is forwarded to the shim, there is a
condition where the pipe listener may not yet be active when a client tries to
connect. To handle this case we allow polling on the file and retry on pipe not
found. This limits the pipe-not-found retry to 5 seconds but leaves the connect
timeout alone, because if there is a listener we want to connect to it normally.
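A platform-neutral sketch of the retry policy: keep retrying only while the error is "not found", and give up on that condition after a bounded time. `dialRetry` and the polling interval are hypothetical; the real code dials a Windows named pipe, which is replaced here by a plain function value.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// dialRetry calls dial until it succeeds, fails with an error other than
// "not found", or notFoundTimeout elapses. Any other error is returned
// immediately so the normal connect timeout handling is left untouched.
func dialRetry(dial func() error, notFoundTimeout, interval time.Duration) error {
	deadline := time.Now().Add(notFoundTimeout)
	for {
		err := dial()
		if err == nil || !errors.Is(err, os.ErrNotExist) {
			return err
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("pipe not found before timeout: %w", err)
		}
		time.Sleep(interval)
	}
}

// demo simulates a listener that only becomes available on the third attempt.
func demo() (attempts int, err error) {
	err = dialRetry(func() error {
		attempts++
		if attempts < 3 {
			return os.ErrNotExist
		}
		return nil
	}, 5*time.Second, 10*time.Millisecond)
	return attempts, err
}

func main() {
	fmt.Println(demo())
}
```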
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Revendors to Microsoft/hcsshim v0.7.5 that added support for logging all
runhcs.exe commands via Windows named pipes. This now launches all runhcs.exe
commands and forwards debug logging to the containerd-shim-runhcs log when
run with --debug.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
1. Fixes bugs in ctr run that were introduced by 1d9b969
2. Adds support for the --isolated flag that runs Windows HyperV
containers instead of process isolated containers on Windows.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
func (e *execProcess) delete(ctx context.Context) error {
e.wg.Wait()
...
}
Deleting an exec process waits for the IO copy to finish. If the wait happens
while the lock is held, other processes cannot get the lock of the shim service.
1. Apply the lock around s.transition() calls in the Delete methods.
2. Take the lock after waiting for the IO copy in exec Delete.
Signed-off-by: Ace-Tang <aceapril@126.com>
This makes sure that runc does not get any valid IO for the pipe. Some
builds and other containers will get stuck if they inspect stdin in
particular and it is a pipe that is not connected to any user input.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
When creating a default OCI spec on Windows that is targeting the LCOW
platform it needs to contain a Windows section as well. This adds the
Windows section by default. It also protects against this case for all
OCI creation that doesn't use the OCI package in the runhcs-shim.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
- Still KillAll if the task uses the host's pid namespace
- Test for both host pid namespace and normal cases
Co-authored-by: Oliver Stenbom <ostenbom@pivotal.io>
Co-authored-by: Georgi Sabev <georgethebeatle@gmail.com>
Signed-off-by: Oliver Stenbom <ostenbom@pivotal.io>
Implements the Windows lcow differ/snapshotter responsible for managing
the creation and lifetime of lcow containers on Windows.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
This is the case where the work dir could still exist if a machine
reboots, resetting the state dir. On container creation, we should just
clear out the work dir.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Sometimes the wrong ID was being used because it's not correct to assume
that ExecID is always set. The assumption was that for APIs where it is
not an exec, ID == ExecID, but that's not true. ExecID == "" if it is not
an exec. This uses the correct ID in all cases.
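The selection rule can be written as a small helper. `processID` is a hypothetical name used only for this sketch; the rule it encodes is the one described above, where an empty ExecID means the request targets the task's init process.

```go
package main

import "fmt"

// processID returns the identifier a request refers to. An empty execID
// means the request is for the task itself, so the task ID is used; otherwise
// the exec ID identifies the exec'd process.
func processID(taskID, execID string) string {
	if execID == "" {
		return taskID
	}
	return execID
}

func main() {
	fmt.Println(processID("task-1", ""))       // request for the init process
	fmt.Println(processID("task-1", "exec-7")) // request for an exec
}
```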
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
This was found testing other runtime shims that are faster than runc (no
containerization). This is a race that can cause the shim to block
forever. It's not an issue for out/err because we open both sides of
the pipe, but for stdin, it expects the client to have it opened.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Switches the client/server direction of the shim-log pipe on Windows so
that the shim is the listener. This allows the containerd client to
reconnect as needed to the log streams.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
A fifo on unix or named pipe on Windows will be provided to the shim.
It can be located inside the `cwd` of the shim named "log".
The shims can use the existing `github.com/containerd/containerd/log` package to log debug messages.
Messages will automatically be output in the containerd daemon's logs with the correct fields and runtime set.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Adds retry support to AnonDialer if the pipe does not exist. This will
retry up to the timeout for the pipe to exist and connect. This solves
the race between the containerd-shim-* start command and the
reinvocation.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Fixes an issue where the runtime v2 was not using an absolute path to
the executable but setting the .Dir field on the exec.Cmd. This causes
the executable to need to be relative to .Dir but no shim is actually
copied to the bundle directory that its work dir is set to.
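A sketch of the fix using `os/exec`: resolve the shim to an absolute path first, then set `.Dir`, so the working directory no longer affects which executable is started. `shimCommand` and the example paths are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
	"strings"
)

// shimCommand builds a command whose executable path is absolute, so setting
// cmd.Dir to the bundle directory cannot change which binary gets executed.
func shimCommand(binary, bundle string, args ...string) (*exec.Cmd, error) {
	path := binary
	if !strings.ContainsRune(binary, filepath.Separator) {
		// A bare name is resolved through PATH, as a shell would do.
		p, err := exec.LookPath(binary)
		if err != nil {
			return nil, err
		}
		path = p
	}
	abs, err := filepath.Abs(path)
	if err != nil {
		return nil, err
	}
	cmd := exec.Command(abs, args...)
	cmd.Dir = bundle
	return cmd, nil
}

// demo builds a command from a relative path and reports whether the
// resulting executable path is absolute and the work dir is preserved.
func demo() (bool, error) {
	bundle := filepath.Join("bundles", "task1") // hypothetical bundle dir
	cmd, err := shimCommand(filepath.Join("bin", "containerd-shim-example"), bundle, "start")
	if err != nil {
		return false, err
	}
	return filepath.IsAbs(cmd.Path) && cmd.Dir == bundle, nil
}

func main() {
	fmt.Println(demo())
}
```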
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
This cleans up persistent work dirs on TaskManager boot. These dirs can
be left behind after a machine reboot. The state in /run will not exist
but the work dir in the root does, so we should clean up work dirs when
tasks are not loaded.
This also improves error handling that would prevent the task manager
from loading when a single task fails to load or clean up.
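A rough sketch of the cleanup pass, assuming one work directory per task ID under a common root. `cleanupWorkDirs` and the directory layout are hypothetical; the point is that a failure on one entry is recorded and the loop carries on with the rest.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cleanupWorkDirs removes every directory under root whose name is not in
// loaded. It keeps going when one removal fails and returns the names it
// removed together with the first error it saw, if any.
func cleanupWorkDirs(root string, loaded map[string]bool) ([]string, error) {
	entries, err := os.ReadDir(root)
	if err != nil {
		return nil, err
	}
	var (
		removed  []string
		firstErr error
	)
	for _, e := range entries {
		if !e.IsDir() || loaded[e.Name()] {
			continue
		}
		if err := os.RemoveAll(filepath.Join(root, e.Name())); err != nil {
			if firstErr == nil {
				firstErr = err
			}
			continue // one bad entry must not block the others
		}
		removed = append(removed, e.Name())
	}
	return removed, firstErr
}

// demo creates three work dirs, marks "b" as loaded, and runs the cleanup.
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "workdirs")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	for _, id := range []string{"a", "b", "c"} {
		if err := os.Mkdir(filepath.Join(root, id), 0o755); err != nil {
			return nil, err
		}
	}
	return cleanupWorkDirs(root, map[string]bool{"b": true})
}

func main() {
	fmt.Println(demo())
}
```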
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
1. Moves the log message for each socket to the appropriate _unix and
_windows.go
2. Replaces all references to Abstract Socket for Windows.
3. Adds support for ctrl+c on Windows to exit a shim.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Since windows does not require a signal handler, we just block on the
channel forever so that it does not exit.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Implements the various requirements for the runtime v2 code to abstract
away the unix/linux code into the appropriate platform level
abstractions to use the runtime v2 on Windows as well.
Adds support in the Makefile.windows to actually build the runtime v2
code for Windows by setting a shell environment BUILD_WINDOWS_V2=1
before calling make. (Note this disables the compilation of the Windows
runtime v1)
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
This sets the shim's max procs to 2, like we already have hard coded in
the shim, with the env var so that it is set at go runtime boot.
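Setting the variable on the child's environment might look like the sketch below. `shimCommand`, `hasEnv` and the binary path are assumptions; `GOMAXPROCS` itself is the standard Go runtime environment variable, read when the child's runtime starts.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// shimCommand prepares a shim command with GOMAXPROCS=2 in its environment,
// so the limit applies from the moment the child's Go runtime boots.
func shimCommand(path string, args ...string) *exec.Cmd {
	cmd := exec.Command(path, args...)
	// Appended last so it wins over any inherited GOMAXPROCS value.
	cmd.Env = append(os.Environ(), "GOMAXPROCS=2")
	return cmd
}

// hasEnv reports whether kv is the last entry for its key-value form in
// the command's environment list.
func hasEnv(cmd *exec.Cmd, kv string) bool {
	return len(cmd.Env) > 0 && cmd.Env[len(cmd.Env)-1] == kv
}

func main() {
	cmd := shimCommand("/usr/bin/containerd-shim") // hypothetical path, not executed
	fmt.Println(hasEnv(cmd, "GOMAXPROCS=2"))
}
```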
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This renames the runtime interface to PlatformRuntime to denote the
layer at which the runtime is being abstracted. This should be used to
abstract different platforms that vary greatly and do not have full
compat with OCI based binary runtimes.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This reverts commit 06dc87ae59.
Revert "Change oom metric to const"
This reverts commit e800f08f9f.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This removes the metric vec that was holding onto all task id and
namespace combinations forever, until containerd was restarted. This
was causing a memory leak with many tasks.
This also removes the shim cmd where the `Args` is quite large from the
reaper after the shim has been started cutting down on another leak.
This is the first pass through the reaper but more code is required to
fix all the issues when commands are added.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This changes Wait() from returning an error whenever you call wait on a
stopped process/task to returning the exit status from the process.
This also adds the exit status to the Status() call on a process/task so
that a user can Wait(), check status, then cancel the wait to avoid
races in event handling.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Since we now have a common set of error definitions, mapped to existing
error codes, we no longer need the specialized error codes used for
interaction with linux processes. The main issue was that string
matching was being used to map these to useful error codes. With this
change, we use errors defined in the `errdefs` package, which map
cleanly to GRPC error codes and are recoverable on either side of the
request.
The main focus of this PR was in removing these from the shim. We may
need follow ups to ensure error codes are preserved by the `Tasks`
service.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
These rpcs only return pids []uint32 so should be named that way in
order to have other rpcs that list Processes such as Exec'd processes.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
* Sync process.State() with the matching events
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
* Allow requesting events for a specific container
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
* Sync container state retrieval with other events
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
* Let containerd take care of calling runtime delete on exit
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
* Take care of possible race in TestBusyboxTopExecTopKillInit
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
This fixes a sync issue when the containerd api returns after a
container has started. It fixes it by calling the runtime start inside
containerd after the oom handler has been setup.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
* Micro benchmarks: use container.Runtime to kill container
Signed-off-by: Julio Montes <julio.montes@intel.com>
* Micro benchmarks: add support for multiple runtimes
Signed-off-by: Julio Montes <julio.montes@intel.com>
* Vendor in runc afaa21f79ade3b2e99a68f3f15e7219155aa4662
This updates the Dockerfile to use go 1.6.2 and install pkg-config, as
both are now needed by runc.
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
* Add support for runc create/start operation
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
* Remove dependency on runc state directory for OOM handler
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
* Add OOM test
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
This is the first in a series of micro benchmarks for containerd.
Performance measurement will use containerd objects and methods
that are not dependent on the grpc API and don't require the daemon
to be running. The tests will require containerd-shim and runc.
The motivation is to understand the baseline performance at the lowest
containerd layer. A natural extension to this effort would be to write
macro benchmarks which would include API and daemon.
Note:
- Currently measures only one workload (busybox sh) start times. Will
add other bundles and args soon.
- Can use integration-test utils for bundle processing. However, json
marshal/unmarshal is currently timing out standard benchmark times. So
going with default spec for now.
Sample run:
BenchmarkBusyboxSh-4 / # / # / # 2 576013841 ns/op
ok github.com/docker/containerd/runtime 1.800s
Signed-off-by: Anusha Ragunathan <anusha@docker.com>
* containerd build clean on Solaris
Signed-off-by: Amit Krishnan <krish.amit@gmail.com>
* Vendor golang.org/x/sys
Signed-off-by: Amit Krishnan <krish.amit@gmail.com>
See https://github.com/docker/docker/issues/22643 for an example
where we get an error running a cmd but there's no output so `b`
is an empty string, which means the user doesn't see any interesting
error message to help them.
This PR will send back the `err` and `b` so that between those two
bits of info they should get something more than a blank string.
Signed-off-by: Doug Davis <dug@us.ibm.com>
Because we are launching a lot of different runc commands to do
operations there is a race between doing a `cmd.Wait()` and getting the
sigchld and reaping it. We can remove the sigchild reaper from
containerd as long as we make sure we reap the shim process if we are
the parent, i.e. not restored.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
runc `events --stats` now has stable output so we don't need to bind to
libcontainer directly to get stats output for the containers.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
runc now has a `ps` command with json output to support listing all the
processes inside a container. We no longer need to use libcontainer
directly for doing this.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Windows will not use containerd, so it's just unused code and unneeded
complexity to keep it all around.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Windows is not going to use containerd because there is already a
similar implementation on windows. This removes all the windows files
because there is no reason to keep this overhead when it's not going to
be used.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Prior to this patch, listing containers with "ctr containers" or
"ctr containers xxx" did not show the proper status of the container(s).
This was caused by the wrong implementation of State() for the process structure:
it only sent signal "0" to ping the "init" process and did nothing else.
Since OCI/runc has implemented a Status() interface, we can use that.
I think this is also more compatible with the design for containerd:
- containerd -> runtime -> fun()
Signed-off-by: Hu Keping <hukeping@huawei.com>
This currently depends on a runc PR:
https://github.com/opencontainers/runc/pull/703
We need this PR because we have to SIGKILL runc and the container root
dir will still be left around.
As for the containerd changes this adds a flag to containerd so that you
can configure the timeout without any more code changes. It also adds
better handling in the error cases and will kill the containerd-shim and
runc ( as well as the user process if it exists ) if the timeout is hit.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This allows you to pass options like:
```bash
containerd --debug --runtime-args "--debug" --runtime-args
"--systemd-cgroup"
```
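A repeatable flag like this can be modelled with a custom `flag.Value` from the standard library. This is a sketch under that assumption and may differ from the CLI package the daemon actually uses; `stringSlice` and `parseRuntimeArgs` are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// stringSlice collects every occurrence of a repeated flag.
type stringSlice []string

func (s *stringSlice) String() string { return strings.Join(*s, " ") }

func (s *stringSlice) Set(v string) error {
	*s = append(*s, v)
	return nil
}

// parseRuntimeArgs parses daemon-style arguments and returns the values
// passed through --runtime-args, in the order they were given.
func parseRuntimeArgs(args []string) ([]string, error) {
	var runtimeArgs stringSlice
	fs := flag.NewFlagSet("containerd", flag.ContinueOnError)
	fs.Bool("debug", false, "enable debug output")
	fs.Var(&runtimeArgs, "runtime-args", "extra argument for the runtime (repeatable)")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return runtimeArgs, nil
}

func main() {
	got, err := parseRuntimeArgs([]string{
		"--debug",
		"--runtime-args", "--debug",
		"--runtime-args", "--systemd-cgroup",
	})
	fmt.Println(got, err)
}
```

Each collected value would then be appended to the runtime's command line ahead of the subcommand.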
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
If containerd binary is renamed to docker-containerd, then it should
invoke the docker-containerd-shim binary.
Signed-off-by: Tibor Vass <tibor@docker.com>
If we fail to exec a process make sure that it is cleaned up within the
container's information and on disk state.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
If the shim gets sigkilled while containerd is down we need to be able
to remove the container correctly so that it does not stay in a stopped
state forever.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
Move process sorter to new file
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Sort containers by id
This will not be the most accurate sorting but at least the list will be
consistent between calls.
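Sorting by ID for a stable listing could be sketched as follows. The `container` type is a stand-in for this example, not the real structure.

```go
package main

import (
	"fmt"
	"sort"
)

// container is a minimal stand-in carrying only the field used for ordering.
type container struct {
	ID string
}

// sortByID orders containers lexicographically by ID, so repeated listings
// return the same order regardless of map or directory iteration order.
func sortByID(cs []container) {
	sort.Slice(cs, func(i, j int) bool { return cs[i].ID < cs[j].ID })
}

func main() {
	cs := []container{{ID: "web"}, {ID: "db"}, {ID: "cache"}}
	sortByID(cs)
	fmt.Println(cs)
}
```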
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Allow runtime to be configurable via daemon start
This allows people to pass an alternate name or location to the runtime
binary to start containers.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Fix state output for containers
Return the proper state/status for a container by checking if the pid is
still alive. Also fix the cleanup handling in the shim to make sure
containers are not left behind.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Properly wait for container start
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Remove runtime files from containerd
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Update supervisor for orphaned containers
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Remove ctr/container.go back to rpc calls
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add attach to loaded container
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add monitor based on epoll for process exits
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Convert pids in containerd to string
This is so that we no longer care about linux or system level pids, and
processes in containerd have a user-defined process id (pid), kind of like the
exec process ids that docker has today.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add reaper back to containerd
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Implement list containers with new process model
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Implement restore of processes
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add NONBLOCK to exit fifo open
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Implement tty reattach
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Fix race in exit pipe creation
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add delete to shim
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Update shim to use pid-file and not stdout
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>