Commit Graph

218 Commits

Author SHA1 Message Date
Lantao Liu
28ca8f05d3 Fix task load.
Signed-off-by: Lantao Liu <lantaol@google.com>
2017-10-05 21:03:24 +00:00
Kenfe-Mickael Laventure
26d4c2c217
Add an option to prevent putting the shim in a new mount namespace
This is needed for users on kernel older than 3.18 so they can avoid EBUSY
errors when trying to unlink, rename or remove a mountpoint that is present in
a shim namespace.

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-10-04 09:16:02 -07:00
Ace-Tang
2231de3760 Remove a blank line
Signed-off-by: Ace-Tang <aceapril@126.com>
2017-10-03 20:58:51 +08:00
Michael Crosby
451421b615 Comment more packages to pass go lint
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-10-02 13:54:56 -04:00
Michael Crosby
d67763d922 Add wait API endpoint for waiting on process exit
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-09-21 15:03:58 -04:00
Michael Crosby
7030a4add0 Close epoller on task stop
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-09-20 11:29:13 -04:00
Michael Crosby
d22160c28e Vendor typeurl package
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-09-19 09:43:55 -04:00
Kenfe-Mickael Laventure
f2d1459929
Remove omitempty from toml tags
The encoder only support changing the name of the key.

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-09-14 20:49:22 -07:00
Michael Crosby
951c129bf1 Handle locking and errors for process state
ref: #1464

This tries to solve issues with races around process state.  First it
adds the process mutex around the state call so that any state changes,
deletions, etc will be handled in order.

Second, for IsNoExist errors from the runtime, return a stopped state if
a process has been removed from the underlying OCI runtime but not from
the shim yet.  This shouldn't happen with the lock from above but its
hare to verify this issue.

Third, handle shim disconnections and return an ErrNotFound.

Forth, don't abort returning all tasks if one task is unable to return
its state.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-09-07 16:22:00 -04:00
Michael Crosby
4c5ed9c068 Move metrics requests to services
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-09-05 17:41:30 -04:00
Michael Crosby
ed45952826 Use cgroups proto for prom metrics
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-09-05 17:26:26 -04:00
Michael Crosby
697dcdd407 Refactor task service metrics
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-09-05 17:26:26 -04:00
Michael Crosby
f5d81a631e Return grpc errs from task service
Closes #1201

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-09-05 16:10:19 -04:00
Kenfe-Mickaël Laventure
b4cc42d028 Merge pull request #1460 from mlaventure/pid-host-kill-init
Ensure all init children are dead when it exits
2017-09-01 15:17:40 -07:00
Kenfe-Mickael Laventure
92772bd471
linux: Ensure all init children are dead when it exits
This ensure that when using the host pid, we don't let process alive,
preventing Wait() to return until they all die.

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-09-01 14:50:56 -07:00
Kenfe-Mickael Laventure
9d251cbd1b
Delete bundle dir on restore if we're not debugging the shim
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-09-01 14:50:56 -07:00
Michael Crosby
b04e408a4b Convert OOM Metric to Const
This converts the oom metric to be a const metric so that deleted tasks
do not fill up the metric labels.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-09-01 16:43:30 -04:00
Kenfe-Mickael Laventure
1b79170849
linux: Add RuntimeRoot to RuncOptions
This allow specifying wher the OCI runtime should store its state data.

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-08-31 14:35:05 -07:00
Kenfe-Mickael Laventure
ab0cb4e756
linux: Honor RuncOptions if set on container
This also fix the type used for RuncOptions.SystemCgroup, hence introducing
an API break.

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-08-31 14:35:05 -07:00
Michael Crosby
6b4c4a2937 Update reaper for multipe subscribers
Depends on https://github.com/containerd/go-runc/pull/24

The is currently a race with the reaper where you could miss some exit
events from processes.

The problem before and why the reaper was so complex was because
processes could fork, getting a pid, and then fail on an execve before
we would have time to register the process with the reaper.  This could
cause pids to fill up in a map as a way to reduce the race.

This changes makes the reaper handle multiple subscribers so that the
caller can handle locking, for when they want to wait for a specific
pid, without affecting other callers using the reaper code.

Exit events are broadcast to multiple subscribers, in the case, the runc
commands and container pids that we get from a pid-file.  Locking while
the entire container stats no longs affects runc commands where you want
to call `runc create` and wait until that has been completed.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-08-31 14:29:47 -04:00
Michael Crosby
c3711c3866 Merge pull request #1319 from mlaventure/handle-sigkilled-shim
Handle sigkilled shim
2017-08-29 14:06:17 -04:00
Kenfe-Mickael Laventure
1c92c0ecbf
Fix panic in CloseIO when not Stdin was allocated for a process
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-08-29 09:58:48 -07:00
Kenfe-Mickael Laventure
edd1da8591
Use configured runtime when cleaning up after dead shim
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-08-29 08:27:44 -07:00
Kenfe-Mickael Laventure
3f34c421d3
Add missing "/tasks/exec-started" event topic
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-08-29 08:27:44 -07:00
Kenfe-Mickael Laventure
d541567119
Handle SIGKILL'ed shim while daemon is running
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-08-29 08:27:44 -07:00
Kenfe-Mickael Laventure
c23f29ebce
containerd-shim: Don't try to delete container twice
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-08-29 08:27:44 -07:00
Kenfe-Mickael Laventure
eb4abac9f7
linux: Prevent deadlock in reaper.WaitPid()
A deadlock can occurs if `WaitPid()` is called twice before the process
dies.

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-08-29 08:27:44 -07:00
Michael Crosby
967497097a Add procesStates for shim processes
Use the state pattern to handle process transitions from one state to
another and what actions can be performed on a process in a specific
state.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-08-25 14:03:55 -04:00
Kenfe-Mickael Laventure
8a1b03e525
Add ExitedAt to process proto definition
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-08-21 08:18:02 -07:00
Michael Crosby
4950c26757 Revert "Wait for client side copy goroutines to start"
This reverts commit 06dc87ae59.

Revert "Change oom metric to const"

This reverts commit e800f08f9f.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-08-18 16:20:02 -04:00
Michael Crosby
e800f08f9f Change oom metric to const
This removes the metric vec that was holding onto all task id and
namespace combinations forever, until containerd was restarted.  This
was causing a memory leak with many task.

This also removes the shim cmd where the `Args` is quite large from the
reaper after the shim has been started cutting down on another leak.

This is the first pass through the reaper but more code is required to
fix all the issues when commands are added.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-08-17 16:23:20 -04:00
Michael Crosby
eb58ecab7c Add null io option
This adds null IO option for efficient handling of IO.
It provides a container directly with `/dev/null` and does not require
any io.Copy within the shim whenever a user does not want the IO of the
container.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-08-14 13:09:16 -04:00
Michael Crosby
fa3454e54d Update go-runc to b85ac701de5065a66918203dd18f05
This includes fixes for pipe ownership and NullIO options.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-08-11 17:15:25 -04:00
Kenfe-Mickael Laventure
e661be6a9c
Fix failure to connect to shim on daemon restart
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-08-11 09:34:29 -07:00
Michael Crosby
d513dd2bfd Fix race with task checkpoint
Because runc will delete a container after a successful checkpoint we
need to handle a NotFound error from runc on delete.

There is also a race between SIGKILL'ing the shim and it actually
exiting to unmount the tasks rootfs, we need to loop and wait for the
task to actually be reaped before trying to delete the rootfs+bundle.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-08-10 16:35:03 -04:00
Phil Estes
c8cd3d9080
Fix bundle removal with the reappearance of state dir
Bundle removal now requires removing both workdir and path locations on
delete in shim.

Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
2017-08-07 17:50:22 -04:00
Phil Estes
e9b86af848 Merge pull request #1283 from mlaventure/resurrect-state-dir
Resurrect State directory
2017-08-07 10:41:21 -04:00
Stephen J Day
a73eb2b2ce
api: generate merged descriptors when building protobufs
When we generate protobufs, descriptors outlining all messages and
services are merged into a single file that can be used to identify
unexpected changes to the API that may affect stability. We follow a
similar process to Go's stability guarantees using the protobuf
descriptors to identify changes before they become a problem.

Please see README.md for details.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-08-04 16:50:28 -07:00
Phil Estes
ff15d18f1f Merge pull request #1287 from crosbymichael/exitstatus
Return exit status from Wait of stopped process
2017-08-04 11:49:51 -04:00
Kenfe-Mickael Laventure
8700e23a10
Use root dir when storing temporary checkpoint data
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-08-03 14:38:18 -07:00
Michael Crosby
9f13b414b9 Return exit status from Wait of stopped process
This changes Wait() from returning an error whenever you call wait on a
stopped process/task to returning the exit status from the process.

This also adds the exit status to the Status() call on a process/task so
that a user can Wait(), check status, then cancel the wait to avoid
races in event handling.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-08-03 17:22:33 -04:00
Kenfe-Mickael Laventure
fbe4751b83
Use a temp socket to receive the console from runc
This greatly reduce the risk that we will hit the unix socket maximum path
length.

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-08-03 10:44:10 -07:00
Kenfe-Mickael Laventure
642620cae3
Resurrect State directory
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-08-03 09:15:53 -07:00
Michael Crosby
bf4838eb71 Shutdown console after process exits
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-08-02 17:09:50 -04:00
Michael Crosby
d18af8699c Update for epoll console handling
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-08-02 13:50:08 -04:00
Michael Crosby
504033e373 Add Get of task and process state
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-08-02 13:50:08 -04:00
Michael Crosby
9f08965699 Change Exited/Status to SetExited/ExitStatus
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-08-02 13:50:08 -04:00
Michael Crosby
a2a3451925 Implement Exec + Start for tasks service
This splits up the exec creation and start in the tasks service

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-08-02 13:50:08 -04:00
Michael Crosby
63878d14ea Add create/start to exec processes in shim
This splits up the create and start of an exec process in the shim to
have two separate steps like the initial process.  This will allow
better state reporting for individual process along with a more robust
wait for execs.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-08-02 13:50:08 -04:00
Stephen J Day
7ed88c1e36
linux/shim: use events.Publisher interface
Signed-off-by: Stephen J Day <stephen.day@docker.com>
2017-07-31 14:23:51 -07:00