Bump cadvisor & libcontainer

Fixes #15805
This commit is contained in:
Jimmi Dyson
2015-10-22 21:41:54 +01:00
parent f0e6ae4b8f
commit 38ad0738bf
141 changed files with 15832 additions and 673 deletions

82
Godeps/Godeps.json generated
View File

@@ -333,6 +333,11 @@
"Comment": "v1.4.1-4045-g2b27fe1",
"Rev": "2b27fe17a1b3fb8472fde96d768fa70996adf201"
},
{
"ImportPath": "github.com/docker/docker/pkg/symlink",
"Comment": "v1.4.1-4045-g2b27fe1",
"Rev": "2b27fe17a1b3fb8472fde96d768fa70996adf201"
},
{
"ImportPath": "github.com/docker/docker/pkg/term",
"Comment": "v1.4.1-4045-g2b27fe1",
@@ -350,8 +355,8 @@
},
{
"ImportPath": "github.com/docker/libcontainer",
"Comment": "v1.4.0-501-ga1fe3f1",
"Rev": "a1fe3f1c7ad2e8eebe6d59e573f04d2b10961cf6"
"Comment": "v2.2.1",
"Rev": "5dc7ba0f24332273461e45bc49edcb4d5aa6c44c"
},
{
"ImportPath": "github.com/docker/spdystream",
@@ -377,6 +382,7 @@
},
{
"ImportPath": "github.com/fsouza/go-dockerclient",
"Comment": "0.2.1-728-g1399676",
"Rev": "1399676f53e6ccf46e0bf00751b21bed329bc60e"
},
{
@@ -419,93 +425,93 @@
},
{
"ImportPath": "github.com/google/cadvisor/api",
"Comment": "0.19.0",
"Rev": "0d6015c74114de9503d68c3c91efceee4dc41bd2"
"Comment": "v0.19.2",
"Rev": "aa6f80814bc6fdb43a0ed12719658225420ffb7d"
},
{
"ImportPath": "github.com/google/cadvisor/cache/memory",
"Comment": "0.19.0",
"Rev": "0d6015c74114de9503d68c3c91efceee4dc41bd2"
"Comment": "v0.19.2",
"Rev": "aa6f80814bc6fdb43a0ed12719658225420ffb7d"
},
{
"ImportPath": "github.com/google/cadvisor/collector",
"Comment": "0.19.0",
"Rev": "0d6015c74114de9503d68c3c91efceee4dc41bd2"
"Comment": "v0.19.2",
"Rev": "aa6f80814bc6fdb43a0ed12719658225420ffb7d"
},
{
"ImportPath": "github.com/google/cadvisor/container",
"Comment": "0.19.0",
"Rev": "0d6015c74114de9503d68c3c91efceee4dc41bd2"
"Comment": "v0.19.2",
"Rev": "aa6f80814bc6fdb43a0ed12719658225420ffb7d"
},
{
"ImportPath": "github.com/google/cadvisor/events",
"Comment": "0.19.0",
"Rev": "0d6015c74114de9503d68c3c91efceee4dc41bd2"
"Comment": "v0.19.2",
"Rev": "aa6f80814bc6fdb43a0ed12719658225420ffb7d"
},
{
"ImportPath": "github.com/google/cadvisor/fs",
"Comment": "0.19.0",
"Rev": "0d6015c74114de9503d68c3c91efceee4dc41bd2"
"Comment": "v0.19.2",
"Rev": "aa6f80814bc6fdb43a0ed12719658225420ffb7d"
},
{
"ImportPath": "github.com/google/cadvisor/healthz",
"Comment": "0.19.0",
"Rev": "0d6015c74114de9503d68c3c91efceee4dc41bd2"
"Comment": "v0.19.2",
"Rev": "aa6f80814bc6fdb43a0ed12719658225420ffb7d"
},
{
"ImportPath": "github.com/google/cadvisor/http",
"Comment": "0.19.0",
"Rev": "0d6015c74114de9503d68c3c91efceee4dc41bd2"
"Comment": "v0.19.2",
"Rev": "aa6f80814bc6fdb43a0ed12719658225420ffb7d"
},
{
"ImportPath": "github.com/google/cadvisor/info/v1",
"Comment": "0.19.0",
"Rev": "0d6015c74114de9503d68c3c91efceee4dc41bd2"
"Comment": "v0.19.2",
"Rev": "aa6f80814bc6fdb43a0ed12719658225420ffb7d"
},
{
"ImportPath": "github.com/google/cadvisor/info/v2",
"Comment": "0.19.0",
"Rev": "0d6015c74114de9503d68c3c91efceee4dc41bd2"
"Comment": "v0.19.2",
"Rev": "aa6f80814bc6fdb43a0ed12719658225420ffb7d"
},
{
"ImportPath": "github.com/google/cadvisor/manager",
"Comment": "0.19.0",
"Rev": "0d6015c74114de9503d68c3c91efceee4dc41bd2"
"Comment": "v0.19.2",
"Rev": "aa6f80814bc6fdb43a0ed12719658225420ffb7d"
},
{
"ImportPath": "github.com/google/cadvisor/metrics",
"Comment": "0.19.0",
"Rev": "0d6015c74114de9503d68c3c91efceee4dc41bd2"
"Comment": "v0.19.2",
"Rev": "aa6f80814bc6fdb43a0ed12719658225420ffb7d"
},
{
"ImportPath": "github.com/google/cadvisor/pages",
"Comment": "0.19.0",
"Rev": "0d6015c74114de9503d68c3c91efceee4dc41bd2"
"Comment": "v0.19.2",
"Rev": "aa6f80814bc6fdb43a0ed12719658225420ffb7d"
},
{
"ImportPath": "github.com/google/cadvisor/storage",
"Comment": "0.19.0",
"Rev": "0d6015c74114de9503d68c3c91efceee4dc41bd2"
"Comment": "v0.19.2",
"Rev": "aa6f80814bc6fdb43a0ed12719658225420ffb7d"
},
{
"ImportPath": "github.com/google/cadvisor/summary",
"Comment": "0.19.0",
"Rev": "0d6015c74114de9503d68c3c91efceee4dc41bd2"
"Comment": "v0.19.2",
"Rev": "aa6f80814bc6fdb43a0ed12719658225420ffb7d"
},
{
"ImportPath": "github.com/google/cadvisor/utils",
"Comment": "0.19.0",
"Rev": "0d6015c74114de9503d68c3c91efceee4dc41bd2"
"Comment": "v0.19.2",
"Rev": "aa6f80814bc6fdb43a0ed12719658225420ffb7d"
},
{
"ImportPath": "github.com/google/cadvisor/validate",
"Comment": "0.19.0",
"Rev": "0d6015c74114de9503d68c3c91efceee4dc41bd2"
"Comment": "v0.19.2",
"Rev": "aa6f80814bc6fdb43a0ed12719658225420ffb7d"
},
{
"ImportPath": "github.com/google/cadvisor/version",
"Comment": "0.19.0",
"Rev": "0d6015c74114de9503d68c3c91efceee4dc41bd2"
"Comment": "v0.19.2",
"Rev": "aa6f80814bc6fdb43a0ed12719658225420ffb7d"
},
{
"ImportPath": "github.com/google/gofuzz",

View File

@@ -0,0 +1,191 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
Copyright 2014-2015 Docker, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

View File

@@ -0,0 +1,27 @@
Copyright (c) 2014-2015 The Docker & Go Authors. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
* Neither the name of Google Inc. nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

View File

@@ -0,0 +1,5 @@
Package symlink implements EvalSymlinksInScope which is an extension of filepath.EvalSymlinks
from the [Go standard library](https://golang.org/pkg/path/filepath).
The code from filepath.EvalSymlinks has been adapted in fs.go.
Please read the LICENSE.BSD file that governs fs.go and LICENSE.APACHE for fs_test.go.

View File

@@ -0,0 +1,131 @@
// Copyright 2012 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE.BSD file.
// This code is a modified version of path/filepath/symlink.go from the Go standard library.
package symlink
import (
"bytes"
"errors"
"os"
"path/filepath"
"strings"
)
// FollowSymlinkInScope is a wrapper around evalSymlinksInScope that returns an absolute path
func FollowSymlinkInScope(path, root string) (string, error) {
path, err := filepath.Abs(path)
if err != nil {
return "", err
}
root, err = filepath.Abs(root)
if err != nil {
return "", err
}
return evalSymlinksInScope(path, root)
}
// evalSymlinksInScope will evaluate symlinks in `path` within a scope `root` and return
// a result guaranteed to be contained within the scope `root`, at the time of the call.
// Symlinks in `root` are not evaluated and left as-is.
// Errors encountered while attempting to evaluate symlinks in path will be returned.
// Non-existing paths are valid and do not constitute an error.
// `path` has to contain `root` as a prefix, or else an error will be returned.
// Trying to break out from `root` does not constitute an error.
//
// Example:
// If /foo/bar -> /outside,
// FollowSymlinkInScope("/foo/bar", "/foo") == "/foo/outside" instead of "/oustide"
//
// IMPORTANT: it is the caller's responsibility to call evalSymlinksInScope *after* relevant symlinks
// are created and not to create subsequently, additional symlinks that could potentially make a
// previously-safe path, unsafe. Example: if /foo/bar does not exist, evalSymlinksInScope("/foo/bar", "/foo")
// would return "/foo/bar". If one makes /foo/bar a symlink to /baz subsequently, then "/foo/bar" should
// no longer be considered safely contained in "/foo".
func evalSymlinksInScope(path, root string) (string, error) {
root = filepath.Clean(root)
if path == root {
return path, nil
}
if !strings.HasPrefix(path, root) {
return "", errors.New("evalSymlinksInScope: " + path + " is not in " + root)
}
const maxIter = 255
originalPath := path
// given root of "/a" and path of "/a/b/../../c" we want path to be "/b/../../c"
path = path[len(root):]
if root == string(filepath.Separator) {
path = string(filepath.Separator) + path
}
if !strings.HasPrefix(path, string(filepath.Separator)) {
return "", errors.New("evalSymlinksInScope: " + path + " is not in " + root)
}
path = filepath.Clean(path)
// consume path by taking each frontmost path element,
// expanding it if it's a symlink, and appending it to b
var b bytes.Buffer
// b here will always be considered to be the "current absolute path inside
// root" when we append paths to it, we also append a slash and use
// filepath.Clean after the loop to trim the trailing slash
for n := 0; path != ""; n++ {
if n > maxIter {
return "", errors.New("evalSymlinksInScope: too many links in " + originalPath)
}
// find next path component, p
i := strings.IndexRune(path, filepath.Separator)
var p string
if i == -1 {
p, path = path, ""
} else {
p, path = path[:i], path[i+1:]
}
if p == "" {
continue
}
// this takes a b.String() like "b/../" and a p like "c" and turns it
// into "/b/../c" which then gets filepath.Cleaned into "/c" and then
// root gets prepended and we Clean again (to remove any trailing slash
// if the first Clean gave us just "/")
cleanP := filepath.Clean(string(filepath.Separator) + b.String() + p)
if cleanP == string(filepath.Separator) {
// never Lstat "/" itself
b.Reset()
continue
}
fullP := filepath.Clean(root + cleanP)
fi, err := os.Lstat(fullP)
if os.IsNotExist(err) {
// if p does not exist, accept it
b.WriteString(p)
b.WriteRune(filepath.Separator)
continue
}
if err != nil {
return "", err
}
if fi.Mode()&os.ModeSymlink == 0 {
b.WriteString(p + string(filepath.Separator))
continue
}
// it's a symlink, put it at the front of path
dest, err := os.Readlink(fullP)
if err != nil {
return "", err
}
if filepath.IsAbs(dest) {
b.Reset()
}
path = dest + string(filepath.Separator) + path
}
// see note above on "fullP := ..." for why this is double-cleaned and
// what's happening here
return filepath.Clean(root + filepath.Clean(string(filepath.Separator)+b.String())), nil
}

View File

@@ -1,2 +1,3 @@
bundles
nsinit/nsinit
vendor/pkg

View File

@@ -1,5 +1,8 @@
FROM golang:1.4
RUN echo "deb http://ftp.us.debian.org/debian testing main contrib" >> /etc/apt/sources.list
RUN apt-get update && apt-get install -y iptables criu=1.5.2-1 && rm -rf /var/lib/apt/lists/*
RUN go get golang.org/x/tools/cmd/cover
ENV GOPATH $GOPATH:/go/src/github.com/docker/libcontainer/vendor
@@ -16,7 +19,6 @@ COPY . /go/src/github.com/docker/libcontainer
WORKDIR /go/src/github.com/docker/libcontainer
RUN cp sample_configs/minimal.json /busybox/container.json
RUN go get -d -v ./...
RUN make direct-install
ENTRYPOINT ["/dind"]

View File

@@ -1,13 +1,13 @@
## libcontainer - reference implementation for containers [![Build Status](https://jenkins.dockerproject.com/buildStatus/icon?job=Libcontainer Master)](https://jenkins.dockerproject.com/job/Libcontainer%20Master/)
## libcontainer - reference implementation for containers [![Build Status](https://jenkins.dockerproject.org/buildStatus/icon?job=Libcontainer%20Master)](https://jenkins.dockerproject.org/job/Libcontainer%20Master/)
Libcontainer provides a native Go implementation for creating containers
Libcontainer provides a native Go implementation for creating containers
with namespaces, cgroups, capabilities, and filesystem access controls.
It allows you to manage the lifecycle of the container performing additional operations
after the container is created.
#### Container
A container is a self contained execution environment that shares the kernel of the
A container is a self contained execution environment that shares the kernel of the
host system and which is (optionally) isolated from other containers in the system.
#### Using libcontainer
@@ -27,7 +27,7 @@ if err != nil {
}
```
Once you have an instance of the factory created we can create a configuration
Once you have an instance of the factory created we can create a configuration
struct describing how the container is to be created. A sample would look similar to this:
```go
@@ -120,9 +120,9 @@ Additional ways to interact with a running container are:
```go
// return all the pids for all processes running inside the container.
processes, err := container.Processes()
processes, err := container.Processes()
// get detailed cpu, memory, io, and network statistics for the container and
// get detailed cpu, memory, io, and network statistics for the container and
// it's processes.
stats, err := container.Stats()
@@ -137,18 +137,18 @@ container.Resume()
#### nsinit
`nsinit` is a cli application which demonstrates the use of libcontainer.
`nsinit` is a cli application which demonstrates the use of libcontainer.
It is able to spawn new containers or join existing containers. A root
filesystem must be provided for use along with a container configuration file.
To build `nsinit`, run `make binary`. It will save the binary into
`bundles/nsinit`.
To use `nsinit`, cd into a Linux rootfs and copy a `container.json` file into
the directory with your specified configuration. Environment, networking,
and different capabilities for the container are specified in this file.
To use `nsinit`, cd into a Linux rootfs and copy a `container.json` file into
the directory with your specified configuration. Environment, networking,
and different capabilities for the container are specified in this file.
The configuration is used for each process executed inside the container.
See the `sample_configs` folder for examples of what the container configuration should look like.
To execute `/bin/bash` in the current directory as a container just run the following **as root**:
@@ -156,18 +156,50 @@ To execute `/bin/bash` in the current directory as a container just run the foll
nsinit exec --tty /bin/bash
```
If you wish to spawn another process inside the container while your
current bash session is running, run the same command again to
get another bash shell (or change the command). If the original
process (PID 1) dies, all other processes spawned inside the container
will be killed and the namespace will be removed.
If you wish to spawn another process inside the container while your
current bash session is running, run the same command again to
get another bash shell (or change the command). If the original
process (PID 1) dies, all other processes spawned inside the container
will be killed and the namespace will be removed.
You can identify if a process is running in a container by
You can identify if a process is running in a container by
looking to see if `state.json` is in the root of the directory.
You may also specify an alternate root place where
You may also specify an alternate root place where
the `container.json` file is read and where the `state.json` file will be saved.
#### Checkpoint & Restore
libcontainer now integrates [CRIU](http://criu.org/) for checkpointing and restoring containers.
This let's you save the state of a process running inside a container to disk, and then restore
that state into a new process, on the same machine or on another machine.
`criu` version 1.5.2 or higher is required to use checkpoint and restore.
If you don't already have `criu` installed, you can build it from source, following the
[online instructions](http://criu.org/Installation). `criu` is also installed in the docker image
generated when building libcontainer with docker.
To try an example with `nsinit`, open two terminals to the same busybox directory.
In the first terminal, run a command like this one:
```bash
nsinit exec -- sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
```
You should see logs printing to the terminal every second. Now, in the second terminal, run:
```bash
nsinit checkpoint --image-path=/tmp/criu
```
The logs in your first terminal will stop and the process will exit. Finally, in the second
terminal, run the restore command:
```bash
nsinit restore --image-path=/tmp/criu
```
The process will resume counting where it left off and printing to the new terminal window.
#### Future
See the [roadmap](ROADMAP.md).

View File

@@ -15,7 +15,7 @@ with a strong security configuration.
### System Requirements and Compatibility
Minimum requirements:
* Kernel version - 3.8 recommended 2.6.2x minimum(with backported patches)
* Kernel version - 3.10 recommended 2.6.2x minimum(with backported patches)
* Mounted cgroups with each subsystem in its own hierarchy
@@ -28,11 +28,9 @@ Minimum requirements:
| CLONE_NEWIPC | 1 |
| CLONE_NEWNET | 1 |
| CLONE_NEWNS | 1 |
| CLONE_NEWUSER | 0 |
| CLONE_NEWUSER | 1 |
In v1 the user namespace is not enabled by default for support of older kernels
where the user namespace feature is not fully implemented. Namespaces are
created for the container via the `clone` syscall.
Namespaces are created for the container via the `clone` syscall.
### Filesystem
@@ -49,14 +47,14 @@ unmount all the mounts that were setup within that namespace.
For a container to execute properly there are certain filesystems that
are required to be mounted within the rootfs that the runtime will setup.
| Path | Type | Flags | Data |
| ----------- | ------ | -------------------------------------- | --------------------------------------- |
| /proc | proc | MS_NOEXEC,MS_NOSUID,MS_NODEV | |
| /dev | tmpfs | MS_NOEXEC,MS_STRICTATIME | mode=755 |
| /dev/shm | shm | MS_NOEXEC,MS_NOSUID,MS_NODEV | mode=1777,size=65536k |
| /dev/mqueue | mqueue | MS_NOEXEC,MS_NOSUID,MS_NODEV | |
| /dev/pts | devpts | MS_NOEXEC,MS_NOSUID | newinstance,ptmxmode=0666,mode=620,gid5 |
| /sys | sysfs | MS_NOEXEC,MS_NOSUID,MS_NODEV,MS_RDONLY | |
| Path | Type | Flags | Data |
| ----------- | ------ | -------------------------------------- | ---------------------------------------- |
| /proc | proc | MS_NOEXEC,MS_NOSUID,MS_NODEV | |
| /dev | tmpfs | MS_NOEXEC,MS_STRICTATIME | mode=755 |
| /dev/shm | tmpfs | MS_NOEXEC,MS_NOSUID,MS_NODEV | mode=1777,size=65536k |
| /dev/mqueue | mqueue | MS_NOEXEC,MS_NOSUID,MS_NODEV | |
| /dev/pts | devpts | MS_NOEXEC,MS_NOSUID | newinstance,ptmxmode=0666,mode=620,gid=5 |
| /sys | sysfs | MS_NOEXEC,MS_NOSUID,MS_NODEV,MS_RDONLY | |
After a container's filesystems are mounted within the newly created
@@ -143,6 +141,7 @@ system resources like cpu, memory, and device access.
| blkio | 1 |
| perf_event | 1 |
| freezer | 1 |
| hugetlb | 1 |
All cgroup subsystem are joined so that statistics can be collected from
@@ -165,6 +164,7 @@ provide a good default for security and flexibility for the applications.
| -------------------- | ------- |
| CAP_NET_RAW | 1 |
| CAP_NET_BIND_SERVICE | 1 |
| CAP_AUDIT_READ | 1 |
| CAP_AUDIT_WRITE | 1 |
| CAP_DAC_OVERRIDE | 1 |
| CAP_SETFCAP | 1 |
@@ -217,17 +217,6 @@ profile <profile_name> flags=(attach_disconnected,mediate_deleted) {
file,
umount,
mount fstype=tmpfs,
mount fstype=mqueue,
mount fstype=fuse.*,
mount fstype=binfmt_misc -> /proc/sys/fs/binfmt_misc/,
mount fstype=efivarfs -> /sys/firmware/efi/efivars/,
mount fstype=fusectl -> /sys/fs/fuse/connections/,
mount fstype=securityfs -> /sys/kernel/security/,
mount fstype=debugfs -> /sys/kernel/debug/,
mount fstype=proc -> /proc/,
mount fstype=sysfs -> /sys/,
deny @{PROC}/sys/fs/** wklx,
deny @{PROC}/sysrq-trigger rwklx,
deny @{PROC}/mem rwklx,
@@ -235,9 +224,7 @@ profile <profile_name> flags=(attach_disconnected,mediate_deleted) {
deny @{PROC}/sys/kernel/[^s][^h][^m]* wklx,
deny @{PROC}/sys/kernel/*/** wklx,
deny mount options=(ro, remount) -> /,
deny mount fstype=debugfs -> /var/lib/ureadahead/debugfs/,
deny mount fstype=devpts,
deny mount,
deny /sys/[^f]*/** wklx,
deny /sys/f[^s]*/** wklx,
@@ -317,6 +304,7 @@ a container.
| Pause | Pause all processes inside the container |
| Resume | Resume all processes inside the container if paused |
| Exec | Execute a new process inside of the container ( requires setns ) |
| Set | Setup configs of the container after it's created |
### Execute a new process inside of a running container.

View File

@@ -1,3 +1,5 @@
// +build linux
package apparmor
import (
@@ -27,17 +29,6 @@ profile {{.Name}} flags=(attach_disconnected,mediate_deleted) {
file,
umount,
mount fstype=tmpfs,
mount fstype=mqueue,
mount fstype=fuse.*,
mount fstype=binfmt_misc -> /proc/sys/fs/binfmt_misc/,
mount fstype=efivarfs -> /sys/firmware/efi/efivars/,
mount fstype=fusectl -> /sys/fs/fuse/connections/,
mount fstype=securityfs -> /sys/kernel/security/,
mount fstype=debugfs -> /sys/kernel/debug/,
mount fstype=proc -> /proc/,
mount fstype=sysfs -> /sys/,
deny @{PROC}/sys/fs/** wklx,
deny @{PROC}/sysrq-trigger rwklx,
deny @{PROC}/mem rwklx,
@@ -45,9 +36,7 @@ profile {{.Name}} flags=(attach_disconnected,mediate_deleted) {
deny @{PROC}/sys/kernel/[^s][^h][^m]* wklx,
deny @{PROC}/sys/kernel/*/** wklx,
deny mount options=(ro, remount) -> /,
deny mount fstype=debugfs -> /var/lib/ureadahead/debugfs/,
deny mount fstype=devpts,
deny mount,
deny /sys/[^f]*/** wklx,
deny /sys/f[^s]*/** wklx,

View File

@@ -1,3 +1,5 @@
// +build linux
package apparmor
import (

View File

@@ -1,3 +1,5 @@
// +build linux
package cgroups
import (

View File

@@ -0,0 +1,3 @@
// +build !linux
package cgroups

View File

@@ -1,3 +1,5 @@
// +build linux
package fs
import (
@@ -22,10 +24,13 @@ var (
"cpuacct": &CpuacctGroup{},
"blkio": &BlkioGroup{},
"hugetlb": &HugetlbGroup{},
"net_cls": &NetClsGroup{},
"net_prio": &NetPrioGroup{},
"perf_event": &PerfEventGroup{},
"freezer": &FreezerGroup{},
}
CgroupProcesses = "cgroup.procs"
HugePageSizes, _ = cgroups.GetHugePageSize()
)
type subsystem interface {
@@ -40,6 +45,7 @@ type subsystem interface {
}
type Manager struct {
mu sync.Mutex
Cgroups *configs.Cgroup
Paths map[string]string
}
@@ -78,7 +84,6 @@ type data struct {
}
func (m *Manager) Apply(pid int) error {
if m.Cgroups == nil {
return nil
}
@@ -124,14 +129,25 @@ func (m *Manager) Apply(pid int) error {
}
func (m *Manager) Destroy() error {
return cgroups.RemovePaths(m.Paths)
m.mu.Lock()
defer m.mu.Unlock()
if err := cgroups.RemovePaths(m.Paths); err != nil {
return err
}
m.Paths = make(map[string]string)
return nil
}
func (m *Manager) GetPaths() map[string]string {
return m.Paths
m.mu.Lock()
paths := m.Paths
m.mu.Unlock()
return paths
}
func (m *Manager) GetStats() (*cgroups.Stats, error) {
m.mu.Lock()
defer m.mu.Unlock()
stats := cgroups.NewStats()
for name, path := range m.Paths {
sys, ok := subsystems[name]
@@ -262,6 +278,11 @@ func (raw *data) join(subsystem string) (string, error) {
}
func writeFile(dir, file, data string) error {
// Normally dir should not be empty, one case is that cgroup subsystem
// is not mounted, we will get empty dir, and we want it fail here.
if dir == "" {
return fmt.Errorf("no such directory for %s.", file)
}
return ioutil.WriteFile(filepath.Join(dir, file), []byte(data), 0700)
}

View File

@@ -1,3 +1,5 @@
// +build linux
package fs
import (

View File

@@ -1,3 +1,5 @@
// +build linux
package fs
import (
@@ -17,7 +19,7 @@ func (s *CpuGroup) Apply(d *data) error {
// We always want to join the cpu group, to allow fair cpu scheduling
// on a container basis
dir, err := d.join("cpu")
if err != nil {
if err != nil && !cgroups.IsNotFound(err) {
return err
}
@@ -44,6 +46,16 @@ func (s *CpuGroup) Set(path string, cgroup *configs.Cgroup) error {
return err
}
}
if cgroup.CpuRtPeriod != 0 {
if err := writeFile(path, "cpu.rt_period_us", strconv.FormatInt(cgroup.CpuRtPeriod, 10)); err != nil {
return err
}
}
if cgroup.CpuRtRuntime != 0 {
if err := writeFile(path, "cpu.rt_runtime_us", strconv.FormatInt(cgroup.CpuRtRuntime, 10)); err != nil {
return err
}
}
return nil
}

View File

@@ -1,3 +1,5 @@
// +build linux
package fs
import (

View File

@@ -1,3 +1,5 @@
// +build linux
package fs
import (
@@ -16,7 +18,7 @@ type CpusetGroup struct {
func (s *CpusetGroup) Apply(d *data) error {
dir, err := d.path("cpuset")
if err != nil {
if err != nil && !cgroups.IsNotFound(err) {
return err
}
@@ -48,6 +50,11 @@ func (s *CpusetGroup) GetStats(path string, stats *cgroups.Stats) error {
}
func (s *CpusetGroup) ApplyDir(dir string, cgroup *configs.Cgroup, pid int) error {
// This might happen if we have no cpuset cgroup mounted.
// Just do nothing and don't fail.
if dir == "" {
return nil
}
if err := s.ensureParent(dir); err != nil {
return err
}

View File

@@ -1,3 +1,5 @@
// +build linux
package fs
import (
@@ -11,6 +13,8 @@ type DevicesGroup struct {
func (s *DevicesGroup) Apply(d *data) error {
dir, err := d.join("devices")
if err != nil {
// We will return error even it's `not found` error, devices
// cgroup is hard requirement for container's security.
return err
}

View File

@@ -1,3 +1,5 @@
// +build linux
package fs
import (

View File

@@ -0,0 +1,3 @@
// +build !linux
package fs

View File

@@ -1,6 +1,12 @@
// +build linux
package fs
import (
"fmt"
"strconv"
"strings"
"github.com/docker/libcontainer/cgroups"
"github.com/docker/libcontainer/configs"
)
@@ -9,14 +15,25 @@ type HugetlbGroup struct {
}
func (s *HugetlbGroup) Apply(d *data) error {
// we just want to join this group even though we don't set anything
if _, err := d.join("hugetlb"); err != nil && !cgroups.IsNotFound(err) {
dir, err := d.join("hugetlb")
if err != nil && !cgroups.IsNotFound(err) {
return err
}
if err := s.Set(dir, d.c); err != nil {
return err
}
return nil
}
func (s *HugetlbGroup) Set(path string, cgroup *configs.Cgroup) error {
for _, hugetlb := range cgroup.HugetlbLimit {
if err := writeFile(path, strings.Join([]string{"hugetlb", hugetlb.Pagesize, "limit_in_bytes"}, "."), strconv.Itoa(hugetlb.Limit)); err != nil {
return err
}
}
return nil
}
@@ -25,5 +42,31 @@ func (s *HugetlbGroup) Remove(d *data) error {
}
func (s *HugetlbGroup) GetStats(path string, stats *cgroups.Stats) error {
hugetlbStats := cgroups.HugetlbStats{}
for _, pageSize := range HugePageSizes {
usage := strings.Join([]string{"hugetlb", pageSize, "usage_in_bytes"}, ".")
value, err := getCgroupParamUint(path, usage)
if err != nil {
return fmt.Errorf("failed to parse %s - %v", usage, err)
}
hugetlbStats.Usage = value
maxUsage := strings.Join([]string{"hugetlb", pageSize, "max_usage_in_bytes"}, ".")
value, err = getCgroupParamUint(path, maxUsage)
if err != nil {
return fmt.Errorf("failed to parse %s - %v", maxUsage, err)
}
hugetlbStats.MaxUsage = value
failcnt := strings.Join([]string{"hugetlb", pageSize, "failcnt"}, ".")
value, err = getCgroupParamUint(path, failcnt)
if err != nil {
return fmt.Errorf("failed to parse %s - %v", failcnt, err)
}
hugetlbStats.Failcnt = value
stats.HugetlbStats[pageSize] = hugetlbStats
}
return nil
}

View File

@@ -1,3 +1,5 @@
// +build linux
package fs
import (
@@ -6,6 +8,7 @@ import (
"os"
"path/filepath"
"strconv"
"strings"
"github.com/docker/libcontainer/cgroups"
"github.com/docker/libcontainer/configs"
@@ -16,8 +19,7 @@ type MemoryGroup struct {
func (s *MemoryGroup) Apply(d *data) error {
dir, err := d.join("memory")
// only return an error for memory if it was specified
if err != nil && (d.c.Memory != 0 || d.c.MemoryReservation != 0 || d.c.MemorySwap != 0) {
if err != nil && !cgroups.IsNotFound(err) {
return err
}
defer func() {
@@ -44,14 +46,13 @@ func (s *MemoryGroup) Set(path string, cgroup *configs.Cgroup) error {
return err
}
}
// By default, MemorySwap is set to twice the size of Memory.
if cgroup.MemorySwap == 0 && cgroup.Memory != 0 {
if err := writeFile(path, "memory.memsw.limit_in_bytes", strconv.FormatInt(cgroup.Memory*2, 10)); err != nil {
if cgroup.MemorySwap > 0 {
if err := writeFile(path, "memory.memsw.limit_in_bytes", strconv.FormatInt(cgroup.MemorySwap, 10)); err != nil {
return err
}
}
if cgroup.MemorySwap > 0 {
if err := writeFile(path, "memory.memsw.limit_in_bytes", strconv.FormatInt(cgroup.MemorySwap, 10)); err != nil {
if cgroup.KernelMemory > 0 {
if err := writeFile(path, "memory.kmem.limit_in_bytes", strconv.FormatInt(cgroup.KernelMemory, 10)); err != nil {
return err
}
}
@@ -61,6 +62,11 @@ func (s *MemoryGroup) Set(path string, cgroup *configs.Cgroup) error {
return err
}
}
if cgroup.MemorySwappiness >= 0 && cgroup.MemorySwappiness <= 100 {
if err := writeFile(path, "memory.swappiness", strconv.FormatInt(cgroup.MemorySwappiness, 10)); err != nil {
return err
}
}
return nil
}
@@ -88,24 +94,62 @@ func (s *MemoryGroup) GetStats(path string, stats *cgroups.Stats) error {
}
stats.MemoryStats.Stats[t] = v
}
// Set memory usage and max historical usage.
value, err := getCgroupParamUint(path, "memory.usage_in_bytes")
if err != nil {
return fmt.Errorf("failed to parse memory.usage_in_bytes - %v", err)
}
stats.MemoryStats.Usage = value
stats.MemoryStats.Cache = stats.MemoryStats.Stats["cache"]
value, err = getCgroupParamUint(path, "memory.max_usage_in_bytes")
memoryUsage, err := getMemoryData(path, "")
if err != nil {
return fmt.Errorf("failed to parse memory.max_usage_in_bytes - %v", err)
return err
}
stats.MemoryStats.MaxUsage = value
value, err = getCgroupParamUint(path, "memory.failcnt")
stats.MemoryStats.Usage = memoryUsage
swapUsage, err := getMemoryData(path, "memsw")
if err != nil {
return fmt.Errorf("failed to parse memory.failcnt - %v", err)
return err
}
stats.MemoryStats.Failcnt = value
stats.MemoryStats.SwapUsage = swapUsage
kernelUsage, err := getMemoryData(path, "kmem")
if err != nil {
return err
}
stats.MemoryStats.KernelUsage = kernelUsage
return nil
}
func getMemoryData(path, name string) (cgroups.MemoryData, error) {
memoryData := cgroups.MemoryData{}
moduleName := "memory"
if name != "" {
moduleName = strings.Join([]string{"memory", name}, ".")
}
usage := strings.Join([]string{moduleName, "usage_in_bytes"}, ".")
maxUsage := strings.Join([]string{moduleName, "max_usage_in_bytes"}, ".")
failcnt := strings.Join([]string{moduleName, "failcnt"}, ".")
value, err := getCgroupParamUint(path, usage)
if err != nil {
if moduleName != "memory" && os.IsNotExist(err) {
return cgroups.MemoryData{}, nil
}
return cgroups.MemoryData{}, fmt.Errorf("failed to parse %s - %v", usage, err)
}
memoryData.Usage = value
value, err = getCgroupParamUint(path, maxUsage)
if err != nil {
if moduleName != "memory" && os.IsNotExist(err) {
return cgroups.MemoryData{}, nil
}
return cgroups.MemoryData{}, fmt.Errorf("failed to parse %s - %v", maxUsage, err)
}
memoryData.MaxUsage = value
value, err = getCgroupParamUint(path, failcnt)
if err != nil {
if moduleName != "memory" && os.IsNotExist(err) {
return cgroups.MemoryData{}, nil
}
return cgroups.MemoryData{}, fmt.Errorf("failed to parse %s - %v", failcnt, err)
}
memoryData.Failcnt = value
return memoryData, nil
}

View File

@@ -0,0 +1,40 @@
package fs
import (
"github.com/docker/libcontainer/cgroups"
"github.com/docker/libcontainer/configs"
)
type NetClsGroup struct {
}
func (s *NetClsGroup) Apply(d *data) error {
dir, err := d.join("net_cls")
if err != nil && !cgroups.IsNotFound(err) {
return err
}
if err := s.Set(dir, d.c); err != nil {
return err
}
return nil
}
func (s *NetClsGroup) Set(path string, cgroup *configs.Cgroup) error {
if cgroup.NetClsClassid != "" {
if err := writeFile(path, "net_cls.classid", cgroup.NetClsClassid); err != nil {
return err
}
}
return nil
}
func (s *NetClsGroup) Remove(d *data) error {
return removePath(d.path("net_cls"))
}
func (s *NetClsGroup) GetStats(path string, stats *cgroups.Stats) error {
return nil
}

View File

@@ -0,0 +1,40 @@
package fs
import (
"github.com/docker/libcontainer/cgroups"
"github.com/docker/libcontainer/configs"
)
type NetPrioGroup struct {
}
func (s *NetPrioGroup) Apply(d *data) error {
dir, err := d.join("net_prio")
if err != nil && !cgroups.IsNotFound(err) {
return err
}
if err := s.Set(dir, d.c); err != nil {
return err
}
return nil
}
func (s *NetPrioGroup) Set(path string, cgroup *configs.Cgroup) error {
for _, prioMap := range cgroup.NetPrioIfpriomap {
if err := writeFile(path, "net_prio.ifpriomap", prioMap.CgroupString()); err != nil {
return err
}
}
return nil
}
func (s *NetPrioGroup) Remove(d *data) error {
return removePath(d.path("net_prio"))
}
func (s *NetPrioGroup) GetStats(path string, stats *cgroups.Stats) error {
return nil
}

View File

@@ -1,3 +1,5 @@
// +build linux
package fs
import (

View File

@@ -1,3 +1,5 @@
// +build linux
package fs
import (

View File

@@ -1,3 +1,5 @@
// +build linux
package cgroups
type ThrottlingData struct {
@@ -30,18 +32,21 @@ type CpuStats struct {
ThrottlingData ThrottlingData `json:"throttling_data,omitempty"`
}
type MemoryData struct {
Usage uint64 `json:"usage,omitempty"`
MaxUsage uint64 `json:"max_usage,omitempty"`
Failcnt uint64 `json:"failcnt"`
}
type MemoryStats struct {
// current res_counter usage for memory
Usage uint64 `json:"usage,omitempty"`
// memory used for cache
Cache uint64 `json:"cache,omitempty"`
// maximum usage ever recorded.
MaxUsage uint64 `json:"max_usage,omitempty"`
// TODO(vishh): Export these as stronger types.
// all the stats exported via memory.stat.
Stats map[string]uint64 `json:"stats,omitempty"`
// number of times memory usage hits limits.
Failcnt uint64 `json:"failcnt"`
// usage of memory
Usage MemoryData `json:"usage,omitempty"`
// usage of memory + swap
SwapUsage MemoryData `json:"swap_usage,omitempty"`
// usafe of kernel memory
KernelUsage MemoryData `json:"kernel_usage,omitempty"`
Stats map[string]uint64 `json:"stats,omitempty"`
}
type BlkioStatEntry struct {
@@ -63,13 +68,25 @@ type BlkioStats struct {
SectorsRecursive []BlkioStatEntry `json:"sectors_recursive,omitempty"`
}
type HugetlbStats struct {
// current res_counter usage for hugetlb
Usage uint64 `json:"usage,omitempty"`
// maximum usage ever recorded.
MaxUsage uint64 `json:"max_usage,omitempty"`
// number of times htgetlb usage allocation failure.
Failcnt uint64 `json:"failcnt"`
}
type Stats struct {
CpuStats CpuStats `json:"cpu_stats,omitempty"`
MemoryStats MemoryStats `json:"memory_stats,omitempty"`
BlkioStats BlkioStats `json:"blkio_stats,omitempty"`
// the map is in the format "size of hugepage: stats of the hugepage"
HugetlbStats map[string]HugetlbStats `json:"hugetlb_stats,omitempty"`
}
func NewStats() *Stats {
memoryStats := MemoryStats{Stats: make(map[string]uint64)}
return &Stats{MemoryStats: memoryStats}
hugetlbStats := make(map[string]HugetlbStats)
return &Stats{MemoryStats: memoryStats, HugetlbStats: hugetlbStats}
}

View File

@@ -20,6 +20,7 @@ import (
)
type Manager struct {
mu sync.Mutex
Cgroups *configs.Cgroup
Paths map[string]string
}
@@ -41,6 +42,8 @@ var subsystems = map[string]subsystem{
"hugetlb": &fs.HugetlbGroup{},
"perf_event": &fs.PerfEventGroup{},
"freezer": &fs.FreezerGroup{},
"net_prio": &fs.NetPrioGroup{},
"net_cls": &fs.NetClsGroup{},
}
const (
@@ -199,24 +202,30 @@ func (m *Manager) Apply(pid int) error {
return err
}
// -1 disables memorySwap
if c.MemorySwap >= 0 && c.Memory != 0 {
if err := joinMemory(c, pid); err != nil {
return err
}
if err := joinMemory(c, pid); err != nil {
return err
}
// we need to manually join the freezer and cpuset cgroup in systemd
// we need to manually join the freezer, net_cls, net_prio and cpuset cgroup in systemd
// because it does not currently support it via the dbus api.
if err := joinFreezer(c, pid); err != nil {
return err
}
if err := joinNetPrio(c, pid); err != nil {
return err
}
if err := joinNetCls(c, pid); err != nil {
return err
}
if err := joinCpuset(c, pid); err != nil {
return err
}
if err := joinHugetlb(c, pid); err != nil {
return err
}
// FIXME: Systemd does have `BlockIODeviceWeight` property, but we got problem
// using that (at least on systemd 208, see https://github.com/docker/libcontainer/pull/354),
// so use fs work around for now.
@@ -248,14 +257,29 @@ func (m *Manager) Apply(pid int) error {
}
func (m *Manager) Destroy() error {
return cgroups.RemovePaths(m.Paths)
m.mu.Lock()
defer m.mu.Unlock()
theConn.StopUnit(getUnitName(m.Cgroups), "replace")
if err := cgroups.RemovePaths(m.Paths); err != nil {
return err
}
m.Paths = make(map[string]string)
return nil
}
func (m *Manager) GetPaths() map[string]string {
return m.Paths
m.mu.Lock()
paths := m.Paths
m.mu.Unlock()
return paths
}
func writeFile(dir, file, data string) error {
// Normally dir should not be empty, one case is that cgroup subsystem
// is not mounted, we will get empty dir, and we want it fail here.
if dir == "" {
return fmt.Errorf("no such directory for %s.", file)
}
return ioutil.WriteFile(filepath.Join(dir, file), []byte(data), 0700)
}
@@ -276,28 +300,61 @@ func join(c *configs.Cgroup, subsystem string, pid int) (string, error) {
func joinCpu(c *configs.Cgroup, pid int) error {
path, err := getSubsystemPath(c, "cpu")
if err != nil {
if err != nil && !cgroups.IsNotFound(err) {
return err
}
if c.CpuQuota != 0 {
if err = ioutil.WriteFile(filepath.Join(path, "cpu.cfs_quota_us"), []byte(strconv.FormatInt(c.CpuQuota, 10)), 0700); err != nil {
if err = writeFile(path, "cpu.cfs_quota_us", strconv.FormatInt(c.CpuQuota, 10)); err != nil {
return err
}
}
if c.CpuPeriod != 0 {
if err = ioutil.WriteFile(filepath.Join(path, "cpu.cfs_period_us"), []byte(strconv.FormatInt(c.CpuPeriod, 10)), 0700); err != nil {
if err = writeFile(path, "cpu.cfs_period_us", strconv.FormatInt(c.CpuPeriod, 10)); err != nil {
return err
}
}
if c.CpuRtPeriod != 0 {
if err = writeFile(path, "cpu.rt_period_us", strconv.FormatInt(c.CpuRtPeriod, 10)); err != nil {
return err
}
}
if c.CpuRtRuntime != 0 {
if err = writeFile(path, "cpu.rt_runtime_us", strconv.FormatInt(c.CpuRtRuntime, 10)); err != nil {
return err
}
}
return nil
}
func joinFreezer(c *configs.Cgroup, pid int) error {
if _, err := join(c, "freezer", pid); err != nil {
path, err := join(c, "freezer", pid)
if err != nil && !cgroups.IsNotFound(err) {
return err
}
return nil
freezer := subsystems["freezer"]
return freezer.Set(path, c)
}
func joinNetPrio(c *configs.Cgroup, pid int) error {
path, err := join(c, "net_prio", pid)
if err != nil && !cgroups.IsNotFound(err) {
return err
}
netPrio := subsystems["net_prio"]
return netPrio.Set(path, c)
}
func joinNetCls(c *configs.Cgroup, pid int) error {
path, err := join(c, "net_cls", pid)
if err != nil && !cgroups.IsNotFound(err) {
return err
}
netcls := subsystems["net_cls"]
return netcls.Set(path, c)
}
func getSubsystemPath(c *configs.Cgroup, subsystem string) (string, error) {
@@ -348,6 +405,8 @@ func (m *Manager) GetPids() ([]int, error) {
}
func (m *Manager) GetStats() (*cgroups.Stats, error) {
m.mu.Lock()
defer m.mu.Unlock()
stats := cgroups.NewStats()
for name, path := range m.Paths {
sys, ok := subsystems[name]
@@ -393,6 +452,8 @@ func getUnitName(c *configs.Cgroup) string {
// This happens at least for v208 when any sibling unit is started.
func joinDevices(c *configs.Cgroup, pid int) error {
path, err := join(c, "devices", pid)
// Even if it's `not found` error, we'll return err because devices cgroup
// is hard requirement for container security.
if err != nil {
return err
}
@@ -402,19 +463,33 @@ func joinDevices(c *configs.Cgroup, pid int) error {
}
func joinMemory(c *configs.Cgroup, pid int) error {
memorySwap := c.MemorySwap
if memorySwap == 0 {
// By default, MemorySwap is set to twice the size of RAM.
memorySwap = c.Memory * 2
}
path, err := getSubsystemPath(c, "memory")
if err != nil {
if err != nil && !cgroups.IsNotFound(err) {
return err
}
return ioutil.WriteFile(filepath.Join(path, "memory.memsw.limit_in_bytes"), []byte(strconv.FormatInt(memorySwap, 10)), 0700)
// -1 disables memoryswap
if c.MemorySwap > 0 {
err = writeFile(path, "memory.memsw.limit_in_bytes", strconv.FormatInt(c.MemorySwap, 10))
if err != nil {
return err
}
}
if c.KernelMemory > 0 {
err = writeFile(path, "memory.kmem.limit_in_bytes", strconv.FormatInt(c.KernelMemory, 10))
if err != nil {
return err
}
}
if c.MemorySwappiness >= 0 && c.MemorySwappiness <= 100 {
err = writeFile(path, "memory.swappiness", strconv.FormatInt(c.MemorySwappiness, 10))
if err != nil {
return err
}
}
return nil
}
// systemd does not atm set up the cpuset controller, so we must manually
@@ -422,7 +497,7 @@ func joinMemory(c *configs.Cgroup, pid int) error {
// level must have a full setup as the default for a new directory is "no cpus"
func joinCpuset(c *configs.Cgroup, pid int) error {
path, err := getSubsystemPath(c, "cpuset")
if err != nil {
if err != nil && !cgroups.IsNotFound(err) {
return err
}
@@ -467,3 +542,13 @@ func joinBlkio(c *configs.Cgroup, pid int) error {
return nil
}
func joinHugetlb(c *configs.Cgroup, pid int) error {
path, err := join(c, "hugetlb", pid)
if err != nil && !cgroups.IsNotFound(err) {
return err
}
hugetlb := subsystems["hugetlb"]
return hugetlb.Set(path, c)
}

View File

@@ -1,3 +1,5 @@
// +build linux
package cgroups
import (
@@ -12,24 +14,28 @@ import (
"time"
"github.com/docker/docker/pkg/mount"
"github.com/docker/docker/pkg/units"
)
// https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt
func FindCgroupMountpoint(subsystem string) (string, error) {
mounts, err := mount.GetMounts()
f, err := os.Open("/proc/self/mountinfo")
if err != nil {
return "", err
}
for _, mount := range mounts {
if mount.Fstype == "cgroup" {
for _, opt := range strings.Split(mount.VfsOpts, ",") {
if opt == subsystem {
return mount.Mountpoint, nil
}
scanner := bufio.NewScanner(f)
for scanner.Scan() {
txt := scanner.Text()
fields := strings.Split(txt, " ")
for _, opt := range strings.Split(fields[len(fields)-1], ",") {
if opt == subsystem {
return fields[4], nil
}
}
}
if err := scanner.Err(); err != nil {
return "", err
}
return "", NewNotFoundError(subsystem)
}
@@ -236,3 +242,23 @@ func RemovePaths(paths map[string]string) (err error) {
}
return fmt.Errorf("Failed to remove paths: %s", paths)
}
func GetHugePageSize() ([]string, error) {
var pageSizes []string
sizeList := []string{"B", "kB", "MB", "GB", "TB", "PB"}
files, err := ioutil.ReadDir("/sys/kernel/mm/hugepages")
if err != nil {
return pageSizes, err
}
for _, st := range files {
nameArray := strings.Split(st.Name(), "-")
pageSize, err := units.RAMInBytes(nameArray[1])
if err != nil {
return []string{}, err
}
sizeString := units.CustomSize("%g%s", float64(pageSize), 1024.0, sizeList)
pageSizes = append(pageSizes, sizeString)
}
return pageSizes, nil
}

View File

@@ -8,6 +8,9 @@ const (
Thawed FreezerState = "THAWED"
)
// TODO Windows: This can be factored out in the future as Cgroups are not
// supported on the Windows platform.
type Cgroup struct {
Name string `json:"name"`
@@ -30,6 +33,9 @@ type Cgroup struct {
// Total memory usage (memory + swap); set `-1' to disable swap
MemorySwap int64 `json:"memory_swap"`
// Kernel memory limit (in bytes)
KernelMemory int64 `json:"kernel_memory"`
// CPU shares (relative weight vs. other containers)
CpuShares int64 `json:"cpu_shares"`
@@ -39,6 +45,12 @@ type Cgroup struct {
// CPU period to be used for hardcapping (in usecs). 0 to use system default.
CpuPeriod int64 `json:"cpu_period"`
// How many time CPU will use in realtime scheduling (in usecs).
CpuRtRuntime int64 `json:"cpu_quota"`
// CPU period to be used for realtime scheduling (in usecs).
CpuRtPeriod int64 `json:"cpu_period"`
// CPU to use
CpusetCpus string `json:"cpuset_cpus"`
@@ -66,9 +78,21 @@ type Cgroup struct {
// set the freeze value for the process
Freezer FreezerState `json:"freezer"`
// Hugetlb limit (in bytes)
HugetlbLimit []*HugepageLimit `json:"hugetlb_limit"`
// Parent slice to use for systemd TODO: remove in favor or parent
Slice string `json:"slice"`
// Whether to disable OOM Killer
OomKillDisable bool `json:"oom_kill_disable"`
// Tuning swappiness behaviour per cgroup
MemorySwappiness int64 `json:"memory_swappiness"`
// Set priority of network traffic for container
NetPrioIfpriomap []*IfPrioMap `json:"net_prio_ifpriomap"`
// Set class identifier for container's network packets
NetClsClassid string `json:"net_cls_classid"`
}

View File

@@ -1,7 +1,5 @@
package configs
import "fmt"
type Rlimit struct {
Type int `json:"type"`
Hard uint64 `json:"hard"`
@@ -15,6 +13,43 @@ type IDMap struct {
Size int `json:"size"`
}
type Seccomp struct {
Syscalls []*Syscall `json:"syscalls"`
}
type Action int
const (
Kill Action = iota - 3
Trap
Allow
)
type Operator int
const (
EqualTo Operator = iota
NotEqualTo
GreatherThan
LessThan
MaskEqualTo
)
type Arg struct {
Index int `json:"index"`
Value uint32 `json:"value"`
Op Operator `json:"op"`
}
type Syscall struct {
Value int `json:"value"`
Action Action `json:"action"`
Args []*Arg `json:"args"`
}
// TODO Windows. Many of these fields should be factored out into those parts
// which are common across platforms, and those which are platform specific.
// Config defines configuration options for executing a process inside a contained environment.
type Config struct {
// NoPivotRoot will use MS_MOVE and a chroot to jail the process into the container's rootfs
@@ -84,7 +119,7 @@ type Config struct {
// AdditionalGroups specifies the gids that should be added to supplementary groups
// in addition to those that the user belongs to.
AdditionalGroups []int `json:"additional_groups"`
AdditionalGroups []string `json:"additional_groups"`
// UidMappings is an array of User ID mappings for User Namespaces
UidMappings []IDMap `json:"uid_mappings"`
@@ -103,50 +138,9 @@ type Config struct {
// SystemProperties is a map of properties and their values. It is the equivalent of using
// sysctl -w my.property.name value in Linux.
SystemProperties map[string]string `json:"system_properties"`
}
// Gets the root uid for the process on host which could be non-zero
// when user namespaces are enabled.
func (c Config) HostUID() (int, error) {
if c.Namespaces.Contains(NEWUSER) {
if c.UidMappings == nil {
return -1, fmt.Errorf("User namespaces enabled, but no user mappings found.")
}
id, found := c.hostIDFromMapping(0, c.UidMappings)
if !found {
return -1, fmt.Errorf("User namespaces enabled, but no root user mapping found.")
}
return id, nil
}
// Return default root uid 0
return 0, nil
}
// Gets the root uid for the process on host which could be non-zero
// when user namespaces are enabled.
func (c Config) HostGID() (int, error) {
if c.Namespaces.Contains(NEWUSER) {
if c.GidMappings == nil {
return -1, fmt.Errorf("User namespaces enabled, but no gid mappings found.")
}
id, found := c.hostIDFromMapping(0, c.GidMappings)
if !found {
return -1, fmt.Errorf("User namespaces enabled, but no root user mapping found.")
}
return id, nil
}
// Return default root uid 0
return 0, nil
}
// Utility function that gets a host ID for a container ID from user namespace map
// if that ID is present in the map.
func (c Config) hostIDFromMapping(containerID int, uMap []IDMap) (int, bool) {
for _, m := range uMap {
if (containerID >= m.ContainerID) && (containerID <= (m.ContainerID + m.Size - 1)) {
hostID := m.HostID + (containerID - m.ContainerID)
return hostID, true
}
}
return -1, false
// Seccomp allows actions to be taken whenever a syscall is made within the container.
// By default, all syscalls are allowed with actions to allow, trap, kill, or return an errno
// can be specified on a per syscall basis.
Seccomp *Seccomp `json:"seccomp"`
}

View File

@@ -0,0 +1,51 @@
// +build freebsd linux
package configs
import "fmt"
// Gets the root uid for the process on host which could be non-zero
// when user namespaces are enabled.
func (c Config) HostUID() (int, error) {
if c.Namespaces.Contains(NEWUSER) {
if c.UidMappings == nil {
return -1, fmt.Errorf("User namespaces enabled, but no user mappings found.")
}
id, found := c.hostIDFromMapping(0, c.UidMappings)
if !found {
return -1, fmt.Errorf("User namespaces enabled, but no root user mapping found.")
}
return id, nil
}
// Return default root uid 0
return 0, nil
}
// Gets the root uid for the process on host which could be non-zero
// when user namespaces are enabled.
func (c Config) HostGID() (int, error) {
if c.Namespaces.Contains(NEWUSER) {
if c.GidMappings == nil {
return -1, fmt.Errorf("User namespaces enabled, but no gid mappings found.")
}
id, found := c.hostIDFromMapping(0, c.GidMappings)
if !found {
return -1, fmt.Errorf("User namespaces enabled, but no root user mapping found.")
}
return id, nil
}
// Return default root uid 0
return 0, nil
}
// Utility function that gets a host ID for a container ID from user namespace map
// if that ID is present in the map.
func (c Config) hostIDFromMapping(containerID int, uMap []IDMap) (int, bool) {
for _, m := range uMap {
if (containerID >= m.ContainerID) && (containerID <= (m.ContainerID + m.Size - 1)) {
hostID := m.HostID + (containerID - m.ContainerID)
return hostID, true
}
}
return -1, false
}

View File

@@ -9,6 +9,8 @@ const (
Wildcard = -1
)
// TODO Windows: This can be factored out in the future
type Device struct {
// Device type, block, char, etc.
Type rune `json:"type"`

View File

@@ -1,3 +1,5 @@
// +build linux freebsd
package configs
var (

View File

@@ -0,0 +1,9 @@
package configs
type HugepageLimit struct {
// which type of hugepage to limit.
Pagesize string `json:"page_size"`
// usage limit for hugepage.
Limit int `json:"limit"`
}

View File

@@ -0,0 +1,14 @@
package configs
import (
"fmt"
)
type IfPrioMap struct {
Interface string `json:"interface"`
Priority int64 `json:"priority"`
}
func (i *IfPrioMap) CgroupString() string {
return fmt.Sprintf("%s %d", i.Interface, i.Priority)
}

View File

@@ -1,91 +1,5 @@
package configs
import "fmt"
type NamespaceType string
const (
NEWNET NamespaceType = "NEWNET"
NEWPID NamespaceType = "NEWPID"
NEWNS NamespaceType = "NEWNS"
NEWUTS NamespaceType = "NEWUTS"
NEWIPC NamespaceType = "NEWIPC"
NEWUSER NamespaceType = "NEWUSER"
)
func NamespaceTypes() []NamespaceType {
return []NamespaceType{
NEWNET,
NEWPID,
NEWNS,
NEWUTS,
NEWIPC,
NEWUSER,
}
}
// Namespace defines configuration for each namespace. It specifies an
// alternate path that is able to be joined via setns.
type Namespace struct {
Type NamespaceType `json:"type"`
Path string `json:"path"`
}
func (n *Namespace) GetPath(pid int) string {
if n.Path != "" {
return n.Path
}
return fmt.Sprintf("/proc/%d/ns/%s", pid, n.file())
}
func (n *Namespace) file() string {
file := ""
switch n.Type {
case NEWNET:
file = "net"
case NEWNS:
file = "mnt"
case NEWPID:
file = "pid"
case NEWIPC:
file = "ipc"
case NEWUSER:
file = "user"
case NEWUTS:
file = "uts"
}
return file
}
type Namespaces []Namespace
func (n *Namespaces) Remove(t NamespaceType) bool {
i := n.index(t)
if i == -1 {
return false
}
*n = append((*n)[:i], (*n)[i+1:]...)
return true
}
func (n *Namespaces) Add(t NamespaceType, path string) {
i := n.index(t)
if i == -1 {
*n = append(*n, Namespace{Type: t, Path: path})
return
}
(*n)[i].Path = path
}
func (n *Namespaces) index(t NamespaceType) int {
for i, ns := range *n {
if ns.Type == t {
return i
}
}
return -1
}
func (n *Namespaces) Contains(t NamespaceType) bool {
return n.index(t) != -1
}

View File

@@ -1,4 +1,4 @@
// +build !linux
// +build !linux,!windows
package configs

View File

@@ -0,0 +1,89 @@
// +build linux freebsd
package configs
import "fmt"
const (
NEWNET NamespaceType = "NEWNET"
NEWPID NamespaceType = "NEWPID"
NEWNS NamespaceType = "NEWNS"
NEWUTS NamespaceType = "NEWUTS"
NEWIPC NamespaceType = "NEWIPC"
NEWUSER NamespaceType = "NEWUSER"
)
func NamespaceTypes() []NamespaceType {
return []NamespaceType{
NEWNET,
NEWPID,
NEWNS,
NEWUTS,
NEWIPC,
NEWUSER,
}
}
// Namespace defines configuration for each namespace. It specifies an
// alternate path that is able to be joined via setns.
type Namespace struct {
Type NamespaceType `json:"type"`
Path string `json:"path"`
}
func (n *Namespace) GetPath(pid int) string {
if n.Path != "" {
return n.Path
}
return fmt.Sprintf("/proc/%d/ns/%s", pid, n.file())
}
func (n *Namespace) file() string {
file := ""
switch n.Type {
case NEWNET:
file = "net"
case NEWNS:
file = "mnt"
case NEWPID:
file = "pid"
case NEWIPC:
file = "ipc"
case NEWUSER:
file = "user"
case NEWUTS:
file = "uts"
}
return file
}
func (n *Namespaces) Remove(t NamespaceType) bool {
i := n.index(t)
if i == -1 {
return false
}
*n = append((*n)[:i], (*n)[i+1:]...)
return true
}
func (n *Namespaces) Add(t NamespaceType, path string) {
i := n.index(t)
if i == -1 {
*n = append(*n, Namespace{Type: t, Path: path})
return
}
(*n)[i].Path = path
}
func (n *Namespaces) index(t NamespaceType) int {
for i, ns := range *n {
if ns.Type == t {
return i
}
}
return -1
}
func (n *Namespaces) Contains(t NamespaceType) bool {
return n.index(t) != -1
}

View File

@@ -0,0 +1,6 @@
package configs
// Namespace defines configuration for each namespace. It specifies an
// alternate path that is able to be joined via setns.
type Namespace struct {
}

View File

@@ -0,0 +1,13 @@
// +build freebsd
package libcontainer
import (
"errors"
)
// newConsole returns an initalized console that can be used within a container by copying bytes
// from the master side to the slave that is attached as the tty for the container's init process.
func newConsole(uid, gid int) (Console, error) {
return nil, errors.New("libcontainer console is not supported on FreeBSD")
}

View File

@@ -1,5 +1,3 @@
// +build linux
package libcontainer
import (
@@ -94,7 +92,7 @@ func (c *linuxConsole) mount(rootfs, mountLabel string, uid, gid int) error {
return syscall.Mount(c.slavePath, dest, "bind", syscall.MS_BIND, "")
}
// dupStdio opens the slavePath for the console and dup2s the fds to the current
// dupStdio opens the slavePath for the console and dups the fds to the current
// processes stdio, fd 0,1,2.
func (c *linuxConsole) dupStdio() error {
slave, err := c.open(syscall.O_RDWR)
@@ -103,7 +101,7 @@ func (c *linuxConsole) dupStdio() error {
}
fd := int(slave.Fd())
for _, i := range []int{0, 1, 2} {
if err := syscall.Dup2(fd, i); err != nil {
if err := syscall.Dup3(fd, i, 0); err != nil {
return err
}
}

View File

@@ -0,0 +1,30 @@
package libcontainer
// newConsole returns an initalized console that can be used within a container
func newConsole(uid, gid int) (Console, error) {
return &windowsConsole{}, nil
}
// windowsConsole is a Windows psuedo TTY for use within a container.
type windowsConsole struct {
}
func (c *windowsConsole) Fd() uintptr {
return 0
}
func (c *windowsConsole) Path() string {
return ""
}
func (c *windowsConsole) Read(b []byte) (int, error) {
return 0, nil
}
func (c *windowsConsole) Write(b []byte) (int, error) {
return 0, nil
}
func (c *windowsConsole) Close() error {
return nil
}

View File

@@ -21,6 +21,9 @@ const (
// The container exists, but all its processes are paused.
Paused
// The container exists, but its state is saved on disk
Checkpointed
// The container does not exist.
Destroyed
)
@@ -46,6 +49,9 @@ type State struct {
// Config is the container's configuration.
Config configs.Config `json:"config"`
// Container's standard descriptors (std{in,out,err}), needed for checkpoint and restore
ExternalDescriptors []string `json:"external_descriptors,omitempty"`
}
// A libcontainer container object.
@@ -108,6 +114,18 @@ type Container interface {
// Systemerror - System error.
Start(process *Process) (err error)
// Checkpoint checkpoints the running container's state to disk using the criu(8) utility.
//
// errors:
// Systemerror - System error.
Checkpoint(criuOpts *CriuOpts) error
// Restore restores the checkpointed container to a running state using the criu(8) utiity.
//
// errors:
// Systemerror - System error.
Restore(process *Process, criuOpts *CriuOpts) error
// Destroys the container after killing all running processes.
//
// Any event registrations are removed before the container is destroyed.

View File

@@ -5,15 +5,19 @@ package libcontainer
import (
"encoding/json"
"fmt"
"io/ioutil"
"os"
"os/exec"
"path/filepath"
"strings"
"sync"
"syscall"
log "github.com/Sirupsen/logrus"
"github.com/Sirupsen/logrus"
"github.com/docker/libcontainer/cgroups"
"github.com/docker/libcontainer/configs"
"github.com/docker/libcontainer/criurpc"
"github.com/golang/protobuf/proto"
)
const stdioFdCount = 3
@@ -26,6 +30,7 @@ type linuxContainer struct {
initPath string
initArgs []string
initProcess parentProcess
criuPath string
m sync.Mutex
}
@@ -102,13 +107,12 @@ func (c *linuxContainer) Start(process *Process) error {
if err := parent.start(); err != nil {
// terminate the process to ensure that it properly is reaped.
if err := parent.terminate(); err != nil {
log.Warn(err)
logrus.Warn(err)
}
return newSystemError(err)
}
process.ops = parent
if doInit {
c.updateState(parent)
}
return nil
@@ -227,7 +231,7 @@ func (c *linuxContainer) Destroy() error {
}
if !c.config.Namespaces.Contains(configs.NEWPID) {
if err := killCgroupProcesses(c.cgroupManager); err != nil {
log.Warn(err)
logrus.Warn(err)
}
}
err = c.cgroupManager.Destroy()
@@ -254,6 +258,458 @@ func (c *linuxContainer) NotifyOOM() (<-chan struct{}, error) {
return notifyOnOOM(c.cgroupManager.GetPaths())
}
// XXX debug support, remove when debugging done.
func addArgsFromEnv(evar string, args *[]string) {
if e := os.Getenv(evar); e != "" {
for _, f := range strings.Fields(e) {
*args = append(*args, f)
}
}
fmt.Printf(">>> criu %v\n", *args)
}
func (c *linuxContainer) checkCriuVersion() error {
var x, y, z int
out, err := exec.Command(c.criuPath, "-V").Output()
if err != nil {
return err
}
n, err := fmt.Sscanf(string(out), "Version: %d.%d.%d\n", &x, &y, &z) // 1.5.2
if err != nil {
n, err = fmt.Sscanf(string(out), "Version: %d.%d\n", &x, &y) // 1.6
}
if n < 2 || err != nil {
return fmt.Errorf("Unable to parse the CRIU version: %s %d %s", out, n, err)
}
if x*10000+y*100+z < 10502 {
return fmt.Errorf("CRIU version must be 1.5.2 or higher")
}
return nil
}
const descriptors_filename = "descriptors.json"
func (c *linuxContainer) Checkpoint(criuOpts *CriuOpts) error {
c.m.Lock()
defer c.m.Unlock()
if err := c.checkCriuVersion(); err != nil {
return err
}
if criuOpts.ImagesDirectory == "" {
criuOpts.ImagesDirectory = filepath.Join(c.root, "criu.image")
}
// Since a container can be C/R'ed multiple times,
// the checkpoint directory may already exist.
if err := os.Mkdir(criuOpts.ImagesDirectory, 0755); err != nil && !os.IsExist(err) {
return err
}
if criuOpts.WorkDirectory == "" {
criuOpts.WorkDirectory = filepath.Join(c.root, "criu.work")
}
if err := os.Mkdir(criuOpts.WorkDirectory, 0755); err != nil && !os.IsExist(err) {
return err
}
workDir, err := os.Open(criuOpts.WorkDirectory)
if err != nil {
return err
}
defer workDir.Close()
imageDir, err := os.Open(criuOpts.ImagesDirectory)
if err != nil {
return err
}
defer imageDir.Close()
rpcOpts := criurpc.CriuOpts{
ImagesDirFd: proto.Int32(int32(imageDir.Fd())),
WorkDirFd: proto.Int32(int32(workDir.Fd())),
LogLevel: proto.Int32(4),
LogFile: proto.String("dump.log"),
Root: proto.String(c.config.Rootfs),
ManageCgroups: proto.Bool(true),
NotifyScripts: proto.Bool(true),
Pid: proto.Int32(int32(c.initProcess.pid())),
ShellJob: proto.Bool(criuOpts.ShellJob),
LeaveRunning: proto.Bool(criuOpts.LeaveRunning),
TcpEstablished: proto.Bool(criuOpts.TcpEstablished),
ExtUnixSk: proto.Bool(criuOpts.ExternalUnixConnections),
}
// append optional criu opts, e.g., page-server and port
if criuOpts.PageServer.Address != "" && criuOpts.PageServer.Port != 0 {
rpcOpts.Ps = &criurpc.CriuPageServerInfo{
Address: proto.String(criuOpts.PageServer.Address),
Port: proto.Int32(criuOpts.PageServer.Port),
}
}
t := criurpc.CriuReqType_DUMP
req := criurpc.CriuReq{
Type: &t,
Opts: &rpcOpts,
}
for _, m := range c.config.Mounts {
if m.Device == "bind" {
mountDest := m.Destination
if strings.HasPrefix(mountDest, c.config.Rootfs) {
mountDest = mountDest[len(c.config.Rootfs):]
}
extMnt := new(criurpc.ExtMountMap)
extMnt.Key = proto.String(mountDest)
extMnt.Val = proto.String(mountDest)
req.Opts.ExtMnt = append(req.Opts.ExtMnt, extMnt)
}
}
// Write the FD info to a file in the image directory
fdsJSON, err := json.Marshal(c.initProcess.externalDescriptors())
if err != nil {
return err
}
err = ioutil.WriteFile(filepath.Join(criuOpts.ImagesDirectory, descriptors_filename), fdsJSON, 0655)
if err != nil {
return err
}
err = c.criuSwrk(nil, &req, criuOpts)
if err != nil {
return err
}
return nil
}
func (c *linuxContainer) Restore(process *Process, criuOpts *CriuOpts) error {
c.m.Lock()
defer c.m.Unlock()
if err := c.checkCriuVersion(); err != nil {
return err
}
if criuOpts.WorkDirectory == "" {
criuOpts.WorkDirectory = filepath.Join(c.root, "criu.work")
}
// Since a container can be C/R'ed multiple times,
// the work directory may already exist.
if err := os.Mkdir(criuOpts.WorkDirectory, 0655); err != nil && !os.IsExist(err) {
return err
}
workDir, err := os.Open(criuOpts.WorkDirectory)
if err != nil {
return err
}
defer workDir.Close()
if criuOpts.ImagesDirectory == "" {
criuOpts.ImagesDirectory = filepath.Join(c.root, "criu.image")
}
imageDir, err := os.Open(criuOpts.ImagesDirectory)
if err != nil {
return err
}
defer imageDir.Close()
// CRIU has a few requirements for a root directory:
// * it must be a mount point
// * its parent must not be overmounted
// c.config.Rootfs is bind-mounted to a temporary directory
// to satisfy these requirements.
root := filepath.Join(c.root, "criu-root")
if err := os.Mkdir(root, 0755); err != nil {
return err
}
defer os.Remove(root)
root, err = filepath.EvalSymlinks(root)
if err != nil {
return err
}
err = syscall.Mount(c.config.Rootfs, root, "", syscall.MS_BIND|syscall.MS_REC, "")
if err != nil {
return err
}
defer syscall.Unmount(root, syscall.MNT_DETACH)
t := criurpc.CriuReqType_RESTORE
req := criurpc.CriuReq{
Type: &t,
Opts: &criurpc.CriuOpts{
ImagesDirFd: proto.Int32(int32(imageDir.Fd())),
WorkDirFd: proto.Int32(int32(workDir.Fd())),
EvasiveDevices: proto.Bool(true),
LogLevel: proto.Int32(4),
LogFile: proto.String("restore.log"),
RstSibling: proto.Bool(true),
Root: proto.String(root),
ManageCgroups: proto.Bool(true),
NotifyScripts: proto.Bool(true),
ShellJob: proto.Bool(criuOpts.ShellJob),
ExtUnixSk: proto.Bool(criuOpts.ExternalUnixConnections),
TcpEstablished: proto.Bool(criuOpts.TcpEstablished),
},
}
for _, m := range c.config.Mounts {
if m.Device == "bind" {
mountDest := m.Destination
if strings.HasPrefix(mountDest, c.config.Rootfs) {
mountDest = mountDest[len(c.config.Rootfs):]
}
extMnt := new(criurpc.ExtMountMap)
extMnt.Key = proto.String(mountDest)
extMnt.Val = proto.String(m.Source)
req.Opts.ExtMnt = append(req.Opts.ExtMnt, extMnt)
}
}
for _, iface := range c.config.Networks {
switch iface.Type {
case "veth":
veth := new(criurpc.CriuVethPair)
veth.IfOut = proto.String(iface.HostInterfaceName)
veth.IfIn = proto.String(iface.Name)
req.Opts.Veths = append(req.Opts.Veths, veth)
break
case "loopback":
break
}
}
var (
fds []string
fdJSON []byte
)
if fdJSON, err = ioutil.ReadFile(filepath.Join(criuOpts.ImagesDirectory, descriptors_filename)); err != nil {
return err
}
if err = json.Unmarshal(fdJSON, &fds); err != nil {
return err
}
for i := range fds {
if s := fds[i]; strings.Contains(s, "pipe:") {
inheritFd := new(criurpc.InheritFd)
inheritFd.Key = proto.String(s)
inheritFd.Fd = proto.Int32(int32(i))
req.Opts.InheritFd = append(req.Opts.InheritFd, inheritFd)
}
}
err = c.criuSwrk(process, &req, criuOpts)
if err != nil {
return err
}
return nil
}
func (c *linuxContainer) criuSwrk(process *Process, req *criurpc.CriuReq, opts *CriuOpts) error {
fds, err := syscall.Socketpair(syscall.AF_LOCAL, syscall.SOCK_SEQPACKET|syscall.SOCK_CLOEXEC, 0)
if err != nil {
return err
}
criuClient := os.NewFile(uintptr(fds[0]), "criu-transport-client")
criuServer := os.NewFile(uintptr(fds[1]), "criu-transport-server")
defer criuClient.Close()
defer criuServer.Close()
args := []string{"swrk", "3"}
cmd := exec.Command(c.criuPath, args...)
if process != nil {
cmd.Stdin = process.Stdin
cmd.Stdout = process.Stdout
cmd.Stderr = process.Stderr
}
cmd.ExtraFiles = append(cmd.ExtraFiles, criuServer)
if err := cmd.Start(); err != nil {
return err
}
criuServer.Close()
defer func() {
criuClient.Close()
_, err := cmd.Process.Wait()
if err != nil {
return
}
}()
var extFds []string
if process != nil {
extFds, err = getPipeFds(cmd.Process.Pid)
if err != nil {
return err
}
}
data, err := proto.Marshal(req)
if err != nil {
return err
}
_, err = criuClient.Write(data)
if err != nil {
return err
}
buf := make([]byte, 10*4096)
for true {
n, err := criuClient.Read(buf)
if err != nil {
return err
}
if n == 0 {
return fmt.Errorf("unexpected EOF")
}
if n == len(buf) {
return fmt.Errorf("buffer is too small")
}
resp := new(criurpc.CriuResp)
err = proto.Unmarshal(buf[:n], resp)
if err != nil {
return err
}
if !resp.GetSuccess() {
return fmt.Errorf("criu failed: type %s errno %d", req.GetType().String(), resp.GetCrErrno())
}
t := resp.GetType()
switch {
case t == criurpc.CriuReqType_NOTIFY:
if err := c.criuNotifications(resp, process, opts, extFds); err != nil {
return err
}
t = criurpc.CriuReqType_NOTIFY
req = &criurpc.CriuReq{
Type: &t,
NotifySuccess: proto.Bool(true),
}
data, err = proto.Marshal(req)
if err != nil {
return err
}
n, err = criuClient.Write(data)
if err != nil {
return err
}
continue
case t == criurpc.CriuReqType_RESTORE:
case t == criurpc.CriuReqType_DUMP:
break
default:
return fmt.Errorf("unable to parse the response %s", resp.String())
}
break
}
// cmd.Wait() waits cmd.goroutines which are used for proxying file descriptors.
// Here we want to wait only the CRIU process.
st, err := cmd.Process.Wait()
if err != nil {
return err
}
if !st.Success() {
return fmt.Errorf("criu failed: %s", st.String())
}
return nil
}
// block any external network activity
func lockNetwork(config *configs.Config) error {
for _, config := range config.Networks {
strategy, err := getStrategy(config.Type)
if err != nil {
return err
}
if err := strategy.detach(config); err != nil {
return err
}
}
return nil
}
func unlockNetwork(config *configs.Config) error {
for _, config := range config.Networks {
strategy, err := getStrategy(config.Type)
if err != nil {
return err
}
if err = strategy.attach(config); err != nil {
return err
}
}
return nil
}
func (c *linuxContainer) criuNotifications(resp *criurpc.CriuResp, process *Process, opts *CriuOpts, fds []string) error {
notify := resp.GetNotify()
if notify == nil {
return fmt.Errorf("invalid response: %s", resp.String())
}
switch {
case notify.GetScript() == "post-dump":
if !opts.LeaveRunning {
f, err := os.Create(filepath.Join(c.root, "checkpoint"))
if err != nil {
return err
}
f.Close()
}
break
case notify.GetScript() == "network-unlock":
if err := unlockNetwork(c.config); err != nil {
return err
}
break
case notify.GetScript() == "network-lock":
if err := lockNetwork(c.config); err != nil {
return err
}
break
case notify.GetScript() == "post-restore":
pid := notify.GetPid()
r, err := newRestoredProcess(int(pid), fds)
if err != nil {
return err
}
// TODO: crosbymichael restore previous process information by saving the init process information in
// the container's state file or separate process state files.
if err := c.updateState(r); err != nil {
return err
}
process.ops = r
break
}
return nil
}
func (c *linuxContainer) updateState(process parentProcess) error {
c.initProcess = process
state, err := c.currentState()
@@ -265,10 +721,14 @@ func (c *linuxContainer) updateState(process parentProcess) error {
return err
}
defer f.Close()
os.Remove(filepath.Join(c.root, "checkpoint"))
return json.NewEncoder(f).Encode(state)
}
func (c *linuxContainer) currentStatus() (Status, error) {
if _, err := os.Stat(filepath.Join(c.root, "checkpoint")); err == nil {
return Checkpointed, nil
}
if c.initProcess == nil {
return Destroyed, nil
}
@@ -304,6 +764,7 @@ func (c *linuxContainer) currentState() (*State, error) {
InitProcessStartTime: startTime,
CgroupPaths: c.cgroupManager.GetPaths(),
NamespacePaths: make(map[configs.NamespaceType]string),
ExternalDescriptors: c.initProcess.externalDescriptors(),
}
for _, ns := range c.config.Namespaces {
state.NamespacePaths[ns.Type] = ns.GetPath(c.initProcess.pid())

View File

@@ -0,0 +1,16 @@
package libcontainer
type CriuPageServerInfo struct {
Address string // IP address of CRIU page server
Port int32 // port number of CRIU page server
}
type CriuOpts struct {
ImagesDirectory string // directory for storing image files
WorkDirectory string // directory to cd and write logs/pidfiles/stats to
LeaveRunning bool // leave container in running state after checkpoint
TcpEstablished bool // checkpoint/restore established TCP connections
ExternalUnixConnections bool // allow external unix connections
ShellJob bool // allow to dump and restore shell jobs
PageServer CriuPageServerInfo // allow to dump to criu page server
}

View File

@@ -0,0 +1,2 @@
gen: criurpc.proto
protoc --go_out=. criurpc.proto

View File

@@ -0,0 +1,602 @@
// Code generated by protoc-gen-go.
// source: criurpc.proto
// DO NOT EDIT!
package criurpc
import proto "github.com/golang/protobuf/proto"
import json "encoding/json"
import math "math"
// Reference proto, json, and math imports to suppress error if they are not otherwise used.
var _ = proto.Marshal
var _ = &json.SyntaxError{}
var _ = math.Inf
type CriuReqType int32
const (
CriuReqType_EMPTY CriuReqType = 0
CriuReqType_DUMP CriuReqType = 1
CriuReqType_RESTORE CriuReqType = 2
CriuReqType_CHECK CriuReqType = 3
CriuReqType_PRE_DUMP CriuReqType = 4
CriuReqType_PAGE_SERVER CriuReqType = 5
CriuReqType_NOTIFY CriuReqType = 6
CriuReqType_CPUINFO_DUMP CriuReqType = 7
CriuReqType_CPUINFO_CHECK CriuReqType = 8
)
var CriuReqType_name = map[int32]string{
0: "EMPTY",
1: "DUMP",
2: "RESTORE",
3: "CHECK",
4: "PRE_DUMP",
5: "PAGE_SERVER",
6: "NOTIFY",
7: "CPUINFO_DUMP",
8: "CPUINFO_CHECK",
}
var CriuReqType_value = map[string]int32{
"EMPTY": 0,
"DUMP": 1,
"RESTORE": 2,
"CHECK": 3,
"PRE_DUMP": 4,
"PAGE_SERVER": 5,
"NOTIFY": 6,
"CPUINFO_DUMP": 7,
"CPUINFO_CHECK": 8,
}
func (x CriuReqType) Enum() *CriuReqType {
p := new(CriuReqType)
*p = x
return p
}
func (x CriuReqType) String() string {
return proto.EnumName(CriuReqType_name, int32(x))
}
func (x CriuReqType) MarshalJSON() ([]byte, error) {
return json.Marshal(x.String())
}
func (x *CriuReqType) UnmarshalJSON(data []byte) error {
value, err := proto.UnmarshalJSONEnum(CriuReqType_value, data, "CriuReqType")
if err != nil {
return err
}
*x = CriuReqType(value)
return nil
}
type CriuPageServerInfo struct {
Address *string `protobuf:"bytes,1,opt,name=address" json:"address,omitempty"`
Port *int32 `protobuf:"varint,2,opt,name=port" json:"port,omitempty"`
Pid *int32 `protobuf:"varint,3,opt,name=pid" json:"pid,omitempty"`
Fd *int32 `protobuf:"varint,4,opt,name=fd" json:"fd,omitempty"`
XXX_unrecognized []byte `json:"-"`
}
func (m *CriuPageServerInfo) Reset() { *m = CriuPageServerInfo{} }
func (m *CriuPageServerInfo) String() string { return proto.CompactTextString(m) }
func (*CriuPageServerInfo) ProtoMessage() {}
func (m *CriuPageServerInfo) GetAddress() string {
if m != nil && m.Address != nil {
return *m.Address
}
return ""
}
func (m *CriuPageServerInfo) GetPort() int32 {
if m != nil && m.Port != nil {
return *m.Port
}
return 0
}
func (m *CriuPageServerInfo) GetPid() int32 {
if m != nil && m.Pid != nil {
return *m.Pid
}
return 0
}
func (m *CriuPageServerInfo) GetFd() int32 {
if m != nil && m.Fd != nil {
return *m.Fd
}
return 0
}
type CriuVethPair struct {
IfIn *string `protobuf:"bytes,1,req,name=if_in" json:"if_in,omitempty"`
IfOut *string `protobuf:"bytes,2,req,name=if_out" json:"if_out,omitempty"`
XXX_unrecognized []byte `json:"-"`
}
func (m *CriuVethPair) Reset() { *m = CriuVethPair{} }
func (m *CriuVethPair) String() string { return proto.CompactTextString(m) }
func (*CriuVethPair) ProtoMessage() {}
func (m *CriuVethPair) GetIfIn() string {
if m != nil && m.IfIn != nil {
return *m.IfIn
}
return ""
}
func (m *CriuVethPair) GetIfOut() string {
if m != nil && m.IfOut != nil {
return *m.IfOut
}
return ""
}
type ExtMountMap struct {
Key *string `protobuf:"bytes,1,req,name=key" json:"key,omitempty"`
Val *string `protobuf:"bytes,2,req,name=val" json:"val,omitempty"`
XXX_unrecognized []byte `json:"-"`
}
func (m *ExtMountMap) Reset() { *m = ExtMountMap{} }
func (m *ExtMountMap) String() string { return proto.CompactTextString(m) }
func (*ExtMountMap) ProtoMessage() {}
func (m *ExtMountMap) GetKey() string {
if m != nil && m.Key != nil {
return *m.Key
}
return ""
}
func (m *ExtMountMap) GetVal() string {
if m != nil && m.Val != nil {
return *m.Val
}
return ""
}
type InheritFd struct {
Key *string `protobuf:"bytes,1,req,name=key" json:"key,omitempty"`
Fd *int32 `protobuf:"varint,2,req,name=fd" json:"fd,omitempty"`
XXX_unrecognized []byte `json:"-"`
}
func (m *InheritFd) Reset() { *m = InheritFd{} }
func (m *InheritFd) String() string { return proto.CompactTextString(m) }
func (*InheritFd) ProtoMessage() {}
func (m *InheritFd) GetKey() string {
if m != nil && m.Key != nil {
return *m.Key
}
return ""
}
func (m *InheritFd) GetFd() int32 {
if m != nil && m.Fd != nil {
return *m.Fd
}
return 0
}
type CgroupRoot struct {
Ctrl *string `protobuf:"bytes,1,opt,name=ctrl" json:"ctrl,omitempty"`
Path *string `protobuf:"bytes,2,req,name=path" json:"path,omitempty"`
XXX_unrecognized []byte `json:"-"`
}
func (m *CgroupRoot) Reset() { *m = CgroupRoot{} }
func (m *CgroupRoot) String() string { return proto.CompactTextString(m) }
func (*CgroupRoot) ProtoMessage() {}
func (m *CgroupRoot) GetCtrl() string {
if m != nil && m.Ctrl != nil {
return *m.Ctrl
}
return ""
}
func (m *CgroupRoot) GetPath() string {
if m != nil && m.Path != nil {
return *m.Path
}
return ""
}
type CriuOpts struct {
ImagesDirFd *int32 `protobuf:"varint,1,req,name=images_dir_fd" json:"images_dir_fd,omitempty"`
Pid *int32 `protobuf:"varint,2,opt,name=pid" json:"pid,omitempty"`
LeaveRunning *bool `protobuf:"varint,3,opt,name=leave_running" json:"leave_running,omitempty"`
ExtUnixSk *bool `protobuf:"varint,4,opt,name=ext_unix_sk" json:"ext_unix_sk,omitempty"`
TcpEstablished *bool `protobuf:"varint,5,opt,name=tcp_established" json:"tcp_established,omitempty"`
EvasiveDevices *bool `protobuf:"varint,6,opt,name=evasive_devices" json:"evasive_devices,omitempty"`
ShellJob *bool `protobuf:"varint,7,opt,name=shell_job" json:"shell_job,omitempty"`
FileLocks *bool `protobuf:"varint,8,opt,name=file_locks" json:"file_locks,omitempty"`
LogLevel *int32 `protobuf:"varint,9,opt,name=log_level,def=2" json:"log_level,omitempty"`
LogFile *string `protobuf:"bytes,10,opt,name=log_file" json:"log_file,omitempty"`
Ps *CriuPageServerInfo `protobuf:"bytes,11,opt,name=ps" json:"ps,omitempty"`
NotifyScripts *bool `protobuf:"varint,12,opt,name=notify_scripts" json:"notify_scripts,omitempty"`
Root *string `protobuf:"bytes,13,opt,name=root" json:"root,omitempty"`
ParentImg *string `protobuf:"bytes,14,opt,name=parent_img" json:"parent_img,omitempty"`
TrackMem *bool `protobuf:"varint,15,opt,name=track_mem" json:"track_mem,omitempty"`
AutoDedup *bool `protobuf:"varint,16,opt,name=auto_dedup" json:"auto_dedup,omitempty"`
WorkDirFd *int32 `protobuf:"varint,17,opt,name=work_dir_fd" json:"work_dir_fd,omitempty"`
LinkRemap *bool `protobuf:"varint,18,opt,name=link_remap" json:"link_remap,omitempty"`
Veths []*CriuVethPair `protobuf:"bytes,19,rep,name=veths" json:"veths,omitempty"`
CpuCap *uint32 `protobuf:"varint,20,opt,name=cpu_cap,def=4294967295" json:"cpu_cap,omitempty"`
ForceIrmap *bool `protobuf:"varint,21,opt,name=force_irmap" json:"force_irmap,omitempty"`
ExecCmd []string `protobuf:"bytes,22,rep,name=exec_cmd" json:"exec_cmd,omitempty"`
ExtMnt []*ExtMountMap `protobuf:"bytes,23,rep,name=ext_mnt" json:"ext_mnt,omitempty"`
ManageCgroups *bool `protobuf:"varint,24,opt,name=manage_cgroups" json:"manage_cgroups,omitempty"`
CgRoot []*CgroupRoot `protobuf:"bytes,25,rep,name=cg_root" json:"cg_root,omitempty"`
RstSibling *bool `protobuf:"varint,26,opt,name=rst_sibling" json:"rst_sibling,omitempty"`
InheritFd []*InheritFd `protobuf:"bytes,27,rep,name=inherit_fd" json:"inherit_fd,omitempty"`
XXX_unrecognized []byte `json:"-"`
}
func (m *CriuOpts) Reset() { *m = CriuOpts{} }
func (m *CriuOpts) String() string { return proto.CompactTextString(m) }
func (*CriuOpts) ProtoMessage() {}
const Default_CriuOpts_LogLevel int32 = 2
const Default_CriuOpts_CpuCap uint32 = 4294967295
func (m *CriuOpts) GetImagesDirFd() int32 {
if m != nil && m.ImagesDirFd != nil {
return *m.ImagesDirFd
}
return 0
}
func (m *CriuOpts) GetPid() int32 {
if m != nil && m.Pid != nil {
return *m.Pid
}
return 0
}
func (m *CriuOpts) GetLeaveRunning() bool {
if m != nil && m.LeaveRunning != nil {
return *m.LeaveRunning
}
return false
}
func (m *CriuOpts) GetExtUnixSk() bool {
if m != nil && m.ExtUnixSk != nil {
return *m.ExtUnixSk
}
return false
}
func (m *CriuOpts) GetTcpEstablished() bool {
if m != nil && m.TcpEstablished != nil {
return *m.TcpEstablished
}
return false
}
func (m *CriuOpts) GetEvasiveDevices() bool {
if m != nil && m.EvasiveDevices != nil {
return *m.EvasiveDevices
}
return false
}
func (m *CriuOpts) GetShellJob() bool {
if m != nil && m.ShellJob != nil {
return *m.ShellJob
}
return false
}
func (m *CriuOpts) GetFileLocks() bool {
if m != nil && m.FileLocks != nil {
return *m.FileLocks
}
return false
}
func (m *CriuOpts) GetLogLevel() int32 {
if m != nil && m.LogLevel != nil {
return *m.LogLevel
}
return Default_CriuOpts_LogLevel
}
func (m *CriuOpts) GetLogFile() string {
if m != nil && m.LogFile != nil {
return *m.LogFile
}
return ""
}
func (m *CriuOpts) GetPs() *CriuPageServerInfo {
if m != nil {
return m.Ps
}
return nil
}
func (m *CriuOpts) GetNotifyScripts() bool {
if m != nil && m.NotifyScripts != nil {
return *m.NotifyScripts
}
return false
}
func (m *CriuOpts) GetRoot() string {
if m != nil && m.Root != nil {
return *m.Root
}
return ""
}
func (m *CriuOpts) GetParentImg() string {
if m != nil && m.ParentImg != nil {
return *m.ParentImg
}
return ""
}
func (m *CriuOpts) GetTrackMem() bool {
if m != nil && m.TrackMem != nil {
return *m.TrackMem
}
return false
}
func (m *CriuOpts) GetAutoDedup() bool {
if m != nil && m.AutoDedup != nil {
return *m.AutoDedup
}
return false
}
func (m *CriuOpts) GetWorkDirFd() int32 {
if m != nil && m.WorkDirFd != nil {
return *m.WorkDirFd
}
return 0
}
func (m *CriuOpts) GetLinkRemap() bool {
if m != nil && m.LinkRemap != nil {
return *m.LinkRemap
}
return false
}
func (m *CriuOpts) GetVeths() []*CriuVethPair {
if m != nil {
return m.Veths
}
return nil
}
func (m *CriuOpts) GetCpuCap() uint32 {
if m != nil && m.CpuCap != nil {
return *m.CpuCap
}
return Default_CriuOpts_CpuCap
}
func (m *CriuOpts) GetForceIrmap() bool {
if m != nil && m.ForceIrmap != nil {
return *m.ForceIrmap
}
return false
}
func (m *CriuOpts) GetExecCmd() []string {
if m != nil {
return m.ExecCmd
}
return nil
}
func (m *CriuOpts) GetExtMnt() []*ExtMountMap {
if m != nil {
return m.ExtMnt
}
return nil
}
func (m *CriuOpts) GetManageCgroups() bool {
if m != nil && m.ManageCgroups != nil {
return *m.ManageCgroups
}
return false
}
func (m *CriuOpts) GetCgRoot() []*CgroupRoot {
if m != nil {
return m.CgRoot
}
return nil
}
func (m *CriuOpts) GetRstSibling() bool {
if m != nil && m.RstSibling != nil {
return *m.RstSibling
}
return false
}
func (m *CriuOpts) GetInheritFd() []*InheritFd {
if m != nil {
return m.InheritFd
}
return nil
}
type CriuDumpResp struct {
Restored *bool `protobuf:"varint,1,opt,name=restored" json:"restored,omitempty"`
XXX_unrecognized []byte `json:"-"`
}
func (m *CriuDumpResp) Reset() { *m = CriuDumpResp{} }
func (m *CriuDumpResp) String() string { return proto.CompactTextString(m) }
func (*CriuDumpResp) ProtoMessage() {}
func (m *CriuDumpResp) GetRestored() bool {
if m != nil && m.Restored != nil {
return *m.Restored
}
return false
}
type CriuRestoreResp struct {
Pid *int32 `protobuf:"varint,1,req,name=pid" json:"pid,omitempty"`
XXX_unrecognized []byte `json:"-"`
}
func (m *CriuRestoreResp) Reset() { *m = CriuRestoreResp{} }
func (m *CriuRestoreResp) String() string { return proto.CompactTextString(m) }
func (*CriuRestoreResp) ProtoMessage() {}
func (m *CriuRestoreResp) GetPid() int32 {
if m != nil && m.Pid != nil {
return *m.Pid
}
return 0
}
type CriuNotify struct {
Script *string `protobuf:"bytes,1,opt,name=script" json:"script,omitempty"`
Pid *int32 `protobuf:"varint,2,opt,name=pid" json:"pid,omitempty"`
XXX_unrecognized []byte `json:"-"`
}
func (m *CriuNotify) Reset() { *m = CriuNotify{} }
func (m *CriuNotify) String() string { return proto.CompactTextString(m) }
func (*CriuNotify) ProtoMessage() {}
func (m *CriuNotify) GetScript() string {
if m != nil && m.Script != nil {
return *m.Script
}
return ""
}
func (m *CriuNotify) GetPid() int32 {
if m != nil && m.Pid != nil {
return *m.Pid
}
return 0
}
type CriuReq struct {
Type *CriuReqType `protobuf:"varint,1,req,name=type,enum=CriuReqType" json:"type,omitempty"`
Opts *CriuOpts `protobuf:"bytes,2,opt,name=opts" json:"opts,omitempty"`
NotifySuccess *bool `protobuf:"varint,3,opt,name=notify_success" json:"notify_success,omitempty"`
//
// When set service won't close the connection but
// will wait for more req-s to appear. Works not
// for all request types.
KeepOpen *bool `protobuf:"varint,4,opt,name=keep_open" json:"keep_open,omitempty"`
XXX_unrecognized []byte `json:"-"`
}
func (m *CriuReq) Reset() { *m = CriuReq{} }
func (m *CriuReq) String() string { return proto.CompactTextString(m) }
func (*CriuReq) ProtoMessage() {}
func (m *CriuReq) GetType() CriuReqType {
if m != nil && m.Type != nil {
return *m.Type
}
return 0
}
func (m *CriuReq) GetOpts() *CriuOpts {
if m != nil {
return m.Opts
}
return nil
}
func (m *CriuReq) GetNotifySuccess() bool {
if m != nil && m.NotifySuccess != nil {
return *m.NotifySuccess
}
return false
}
func (m *CriuReq) GetKeepOpen() bool {
if m != nil && m.KeepOpen != nil {
return *m.KeepOpen
}
return false
}
type CriuResp struct {
Type *CriuReqType `protobuf:"varint,1,req,name=type,enum=CriuReqType" json:"type,omitempty"`
Success *bool `protobuf:"varint,2,req,name=success" json:"success,omitempty"`
Dump *CriuDumpResp `protobuf:"bytes,3,opt,name=dump" json:"dump,omitempty"`
Restore *CriuRestoreResp `protobuf:"bytes,4,opt,name=restore" json:"restore,omitempty"`
Notify *CriuNotify `protobuf:"bytes,5,opt,name=notify" json:"notify,omitempty"`
Ps *CriuPageServerInfo `protobuf:"bytes,6,opt,name=ps" json:"ps,omitempty"`
CrErrno *int32 `protobuf:"varint,7,opt,name=cr_errno" json:"cr_errno,omitempty"`
XXX_unrecognized []byte `json:"-"`
}
func (m *CriuResp) Reset() { *m = CriuResp{} }
func (m *CriuResp) String() string { return proto.CompactTextString(m) }
func (*CriuResp) ProtoMessage() {}
func (m *CriuResp) GetType() CriuReqType {
if m != nil && m.Type != nil {
return *m.Type
}
return 0
}
func (m *CriuResp) GetSuccess() bool {
if m != nil && m.Success != nil {
return *m.Success
}
return false
}
func (m *CriuResp) GetDump() *CriuDumpResp {
if m != nil {
return m.Dump
}
return nil
}
func (m *CriuResp) GetRestore() *CriuRestoreResp {
if m != nil {
return m.Restore
}
return nil
}
func (m *CriuResp) GetNotify() *CriuNotify {
if m != nil {
return m.Notify
}
return nil
}
func (m *CriuResp) GetPs() *CriuPageServerInfo {
if m != nil {
return m.Ps
}
return nil
}
func (m *CriuResp) GetCrErrno() int32 {
if m != nil && m.CrErrno != nil {
return *m.CrErrno
}
return 0
}
func init() {
proto.RegisterEnum("CriuReqType", CriuReqType_name, CriuReqType_value)
}

View File

@@ -0,0 +1,127 @@
message criu_page_server_info {
optional string address = 1;
optional int32 port = 2;
optional int32 pid = 3;
optional int32 fd = 4;
}
message criu_veth_pair {
required string if_in = 1;
required string if_out = 2;
};
message ext_mount_map {
required string key = 1;
required string val = 2;
};
message inherit_fd {
required string key = 1;
required int32 fd = 2;
};
message cgroup_root {
optional string ctrl = 1;
required string path = 2;
};
message criu_opts {
required int32 images_dir_fd = 1;
optional int32 pid = 2; /* if not set on dump, will dump requesting process */
optional bool leave_running = 3;
optional bool ext_unix_sk = 4;
optional bool tcp_established = 5;
optional bool evasive_devices = 6;
optional bool shell_job = 7;
optional bool file_locks = 8;
optional int32 log_level = 9 [default = 2];
optional string log_file = 10; /* No subdirs are allowed. Consider using work-dir */
optional criu_page_server_info ps = 11;
optional bool notify_scripts = 12;
optional string root = 13;
optional string parent_img = 14;
optional bool track_mem = 15;
optional bool auto_dedup = 16;
optional int32 work_dir_fd = 17;
optional bool link_remap = 18;
repeated criu_veth_pair veths = 19;
optional uint32 cpu_cap = 20 [default = 0xffffffff];
optional bool force_irmap = 21;
repeated string exec_cmd = 22;
repeated ext_mount_map ext_mnt = 23;
optional bool manage_cgroups = 24;
repeated cgroup_root cg_root = 25;
optional bool rst_sibling = 26; /* swrk only */
repeated inherit_fd inherit_fd = 27;
}
message criu_dump_resp {
optional bool restored = 1;
}
message criu_restore_resp {
required int32 pid = 1;
}
message criu_notify {
optional string script = 1;
optional int32 pid = 2;
}
enum criu_req_type {
EMPTY = 0;
DUMP = 1;
RESTORE = 2;
CHECK = 3;
PRE_DUMP = 4;
PAGE_SERVER = 5;
NOTIFY = 6;
CPUINFO_DUMP = 7;
CPUINFO_CHECK = 8;
}
/*
* Request -- each type corresponds to must-be-there
* request arguments of respective type
*/
message criu_req {
required criu_req_type type = 1;
optional criu_opts opts = 2;
optional bool notify_success = 3;
/*
* When set service won't close the connection but
* will wait for more req-s to appear. Works not
* for all request types.
*/
optional bool keep_open = 4;
}
/*
* Responce -- it states whether the request was served
* and additional request-specific informarion
*/
message criu_resp {
required criu_req_type type = 1;
required bool success = 2;
optional criu_dump_resp dump = 3;
optional criu_restore_resp restore = 4;
optional criu_notify notify = 5;
optional criu_page_server_info ps = 6;
optional int32 cr_errno = 7;
}

View File

@@ -0,0 +1,16 @@
package devices
import (
"github.com/docker/libcontainer/configs"
)
// TODO Windows. This can be factored out further - Devices are not supported
// by Windows Containers.
func DeviceFromPath(path, permissions string) (*configs.Device, error) {
return nil, nil
}
func HostDevices() ([]*configs.Device, error) {
return nil, nil
}

View File

@@ -1,3 +1,5 @@
// +build linux freebsd
package devices
/*

View File

@@ -106,6 +106,7 @@ func New(root string, options ...func(*LinuxFactory) error) (Factory, error) {
l := &LinuxFactory{
Root: root,
Validator: validate.New(),
CriuPath: "criu",
}
InitArgs(os.Args[0], "init")(l)
Cgroupfs(l)
@@ -129,6 +130,10 @@ type LinuxFactory struct {
// a container.
InitArgs []string
// CriuPath is the path to the criu binary used for checkpoint and restore of
// containers.
CriuPath string
// Validator provides validation to container configurations.
Validator validate.Validator
@@ -161,6 +166,7 @@ func (l *LinuxFactory) Create(id string, config *configs.Config) (Container, err
config: config,
initPath: l.InitPath,
initArgs: l.InitArgs,
criuPath: l.CriuPath,
cgroupManager: l.NewCgroupsManager(config.Cgroups, nil),
}, nil
}
@@ -174,9 +180,10 @@ func (l *LinuxFactory) Load(id string) (Container, error) {
if err != nil {
return nil, err
}
r := &restoredProcess{
r := &nonChildProcess{
processPid: state.InitProcessPid,
processStartTime: state.InitProcessStartTime,
fds: state.ExternalDescriptors,
}
return &linuxContainer{
initProcess: r,
@@ -184,6 +191,7 @@ func (l *LinuxFactory) Load(id string) (Container, error) {
config: &state.Config,
initPath: l.InitPath,
initArgs: l.InitArgs,
criuPath: l.CriuPath,
cgroupManager: l.NewCgroupsManager(state.Config.Cgroups, state.CgroupPaths),
root: containerRoot,
}, nil
@@ -253,35 +261,3 @@ func (l *LinuxFactory) validateID(id string) error {
}
return nil
}
// restoredProcess represents a process where the calling process may or may not be
// the parent process. This process is created when a factory loads a container from
// a persisted state.
type restoredProcess struct {
processPid int
processStartTime string
}
func (p *restoredProcess) start() error {
return newGenericError(fmt.Errorf("restored process cannot be started"), SystemError)
}
func (p *restoredProcess) pid() int {
return p.processPid
}
func (p *restoredProcess) terminate() error {
return newGenericError(fmt.Errorf("restored process cannot be terminated"), SystemError)
}
func (p *restoredProcess) wait() (*os.ProcessState, error) {
return nil, newGenericError(fmt.Errorf("restored process cannot be waited on"), SystemError)
}
func (p *restoredProcess) startTime() (string, error) {
return p.processStartTime, nil
}
func (p *restoredProcess) signal(s os.Signal) error {
return newGenericError(fmt.Errorf("restored process cannot be signaled"), SystemError)
}

View File

@@ -4,6 +4,7 @@ set -e
# This script runs all validations
validate() {
export MAKEDIR=/go/src/github.com/docker/docker/hack/make
sed -i 's!docker/docker!docker/libcontainer!' /go/src/github.com/docker/docker/hack/make/.validate
bash /go/src/github.com/docker/docker/hack/make/validate-dco
bash /go/src/github.com/docker/docker/hack/make/validate-gofmt

View File

@@ -9,10 +9,11 @@ import (
"strings"
"syscall"
log "github.com/Sirupsen/logrus"
"github.com/Sirupsen/logrus"
"github.com/docker/libcontainer/cgroups"
"github.com/docker/libcontainer/configs"
"github.com/docker/libcontainer/netlink"
"github.com/docker/libcontainer/seccomp"
"github.com/docker/libcontainer/system"
"github.com/docker/libcontainer/user"
"github.com/docker/libcontainer/utils"
@@ -176,10 +177,20 @@ func setupUser(config *initConfig) error {
if err != nil {
return err
}
suppGroups := append(execUser.Sgids, config.Config.AdditionalGroups...)
var addGroups []int
if len(config.Config.AdditionalGroups) > 0 {
addGroups, err = user.GetAdditionalGroupsPath(config.Config.AdditionalGroups, groupPath)
if err != nil {
return err
}
}
suppGroups := append(execUser.Sgids, addGroups...)
if err := syscall.Setgroups(suppGroups); err != nil {
return err
}
if err := system.Setgid(execUser.Gid); err != nil {
return err
}
@@ -234,7 +245,7 @@ func setupRlimits(config *configs.Config) error {
func killCgroupProcesses(m cgroups.Manager) error {
var procs []*os.Process
if err := m.Freeze(configs.Frozen); err != nil {
log.Warn(err)
logrus.Warn(err)
}
pids, err := m.GetPids()
if err != nil {
@@ -245,17 +256,75 @@ func killCgroupProcesses(m cgroups.Manager) error {
if p, err := os.FindProcess(pid); err == nil {
procs = append(procs, p)
if err := p.Kill(); err != nil {
log.Warn(err)
logrus.Warn(err)
}
}
}
if err := m.Freeze(configs.Thawed); err != nil {
log.Warn(err)
logrus.Warn(err)
}
for _, p := range procs {
if _, err := p.Wait(); err != nil {
log.Warn(err)
logrus.Warn(err)
}
}
return nil
}
func finalizeSeccomp(config *initConfig) error {
if config.Config.Seccomp == nil {
return nil
}
context := seccomp.New()
for _, s := range config.Config.Seccomp.Syscalls {
ss := &seccomp.Syscall{
Value: uint32(s.Value),
Action: seccompAction(s.Action),
}
if len(s.Args) > 0 {
ss.Args = seccompArgs(s.Args)
}
context.Add(ss)
}
return context.Load()
}
func seccompAction(a configs.Action) seccomp.Action {
switch a {
case configs.Kill:
return seccomp.Kill
case configs.Trap:
return seccomp.Trap
case configs.Allow:
return seccomp.Allow
}
return seccomp.Error(syscall.Errno(int(a)))
}
func seccompArgs(args []*configs.Arg) seccomp.Args {
var sa []seccomp.Arg
for _, a := range args {
sa = append(sa, seccomp.Arg{
Index: uint32(a.Index),
Op: seccompOperator(a.Op),
Value: uint(a.Value),
})
}
return seccomp.Args{sa}
}
func seccompOperator(o configs.Operator) seccomp.Operator {
switch o {
case configs.EqualTo:
return seccomp.EqualTo
case configs.NotEqualTo:
return seccomp.NotEqualTo
case configs.GreatherThan:
return seccomp.GreatherThan
case configs.LessThan:
return seccomp.LessThan
case configs.MaskEqualTo:
return seccomp.MaskEqualTo
}
return 0
}

View File

@@ -105,14 +105,14 @@ func Relabel(path string, fileLabel string, relabel string) error {
if fileLabel == "" {
return nil
}
if !strings.ContainsAny(relabel, "zZ") {
return nil
}
for _, p := range exclude_path {
if path == p {
return fmt.Errorf("Relabeling of %s is not allowed", path)
}
}
if !strings.ContainsAny(relabel, "zZ") {
return nil
}
if strings.Contains(relabel, "z") && strings.Contains(relabel, "Z") {
return fmt.Errorf("Bad SELinux option z and Z can not be used together")
}

View File

@@ -9,6 +9,7 @@ import (
"os"
"sync/atomic"
"syscall"
"time"
"unsafe"
)
@@ -26,6 +27,7 @@ const (
SIOC_BRADDBR = 0x89a0
SIOC_BRDELBR = 0x89a1
SIOC_BRADDIF = 0x89a2
SIOC_BRDELIF = 0x89a3
)
const (
@@ -54,6 +56,8 @@ type ifreqFlags struct {
var native binary.ByteOrder
var rnd = rand.New(rand.NewSource(time.Now().UnixNano()))
func init() {
var x uint32 = 0x01020304
if *(*byte)(unsafe.Pointer(&x)) == 0x01 {
@@ -1188,9 +1192,7 @@ func DeleteBridge(name string) error {
return nil
}
// Add a slave to abridge device. This is more backward-compatible than
// netlink.NetworkSetMaster and works on RHEL 6.
func AddToBridge(iface, master *net.Interface) error {
func ifIoctBridge(iface, master *net.Interface, op uintptr) error {
if len(master.Name) >= IFNAMSIZ {
return fmt.Errorf("Interface name %s too long", master.Name)
}
@@ -1205,17 +1207,29 @@ func AddToBridge(iface, master *net.Interface) error {
copy(ifr.IfrnName[:len(ifr.IfrnName)-1], master.Name)
ifr.IfruIndex = int32(iface.Index)
if _, _, err := syscall.Syscall(syscall.SYS_IOCTL, uintptr(s), SIOC_BRADDIF, uintptr(unsafe.Pointer(&ifr))); err != 0 {
if _, _, err := syscall.Syscall(syscall.SYS_IOCTL, uintptr(s), op, uintptr(unsafe.Pointer(&ifr))); err != 0 {
return err
}
return nil
}
// Add a slave to a bridge device. This is more backward-compatible than
// netlink.NetworkSetMaster and works on RHEL 6.
func AddToBridge(iface, master *net.Interface) error {
return ifIoctBridge(iface, master, SIOC_BRADDIF)
}
// Detach a slave from a bridge device. This is more backward-compatible than
// netlink.NetworkSetMaster and works on RHEL 6.
func DelFromBridge(iface, master *net.Interface) error {
return ifIoctBridge(iface, master, SIOC_BRDELIF)
}
func randMacAddr() string {
hw := make(net.HardwareAddr, 6)
for i := 0; i < 6; i++ {
hw[i] = byte(rand.Intn(255))
hw[i] = byte(rnd.Intn(255))
}
hw[0] &^= 0x1 // clear multicast bit
hw[0] |= 0x2 // set local assignment bit (IEEE802)

View File

@@ -1,3 +1,5 @@
// +build arm ppc64 ppc64le
package netlink
func ifrDataByte(b byte) uint8 {

View File

@@ -1,4 +1,4 @@
// +build !arm
// +build !arm,!ppc64,!ppc64le
package netlink

View File

@@ -10,6 +10,7 @@ import (
"strconv"
"strings"
"github.com/docker/libcontainer/configs"
"github.com/docker/libcontainer/netlink"
"github.com/docker/libcontainer/utils"
)
@@ -24,6 +25,8 @@ var strategies = map[string]networkStrategy{
type networkStrategy interface {
create(*network, int) error
initialize(*network) error
detach(*configs.Network) error
attach(*configs.Network) error
}
// getStrategy returns the specific network strategy for the
@@ -97,32 +100,39 @@ func (l *loopback) initialize(config *network) error {
return netlink.NetworkLinkUp(iface)
}
func (l *loopback) attach(n *configs.Network) (err error) {
return nil
}
func (l *loopback) detach(n *configs.Network) (err error) {
return nil
}
// veth is a network strategy that uses a bridge and creates
// a veth pair, one that is attached to the bridge on the host and the other
// is placed inside the container's namespace
type veth struct {
}
func (v *veth) create(n *network, nspid int) (err error) {
tmpName, err := v.generateTempPeerName()
if err != nil {
return err
}
n.TempVethPeerName = tmpName
defer func() {
if err != nil {
netlink.NetworkLinkDel(n.HostInterfaceName)
netlink.NetworkLinkDel(n.TempVethPeerName)
}
}()
if n.Bridge == "" {
return fmt.Errorf("bridge is not specified")
}
func (v *veth) detach(n *configs.Network) (err error) {
bridge, err := net.InterfaceByName(n.Bridge)
if err != nil {
return err
}
if err := netlink.NetworkCreateVethPair(n.HostInterfaceName, n.TempVethPeerName, n.TxQueueLen); err != nil {
host, err := net.InterfaceByName(n.HostInterfaceName)
if err != nil {
return err
}
if err := netlink.DelFromBridge(host, bridge); err != nil {
return err
}
return nil
}
// attach a container network interface to an external network
func (v *veth) attach(n *configs.Network) (err error) {
bridge, err := net.InterfaceByName(n.Bridge)
if err != nil {
return err
}
host, err := net.InterfaceByName(n.HostInterfaceName)
@@ -143,6 +153,31 @@ func (v *veth) create(n *network, nspid int) (err error) {
if err := netlink.NetworkLinkUp(host); err != nil {
return err
}
return nil
}
func (v *veth) create(n *network, nspid int) (err error) {
tmpName, err := v.generateTempPeerName()
if err != nil {
return err
}
n.TempVethPeerName = tmpName
defer func() {
if err != nil {
netlink.NetworkLinkDel(n.HostInterfaceName)
netlink.NetworkLinkDel(n.TempVethPeerName)
}
}()
if n.Bridge == "" {
return fmt.Errorf("bridge is not specified")
}
if err := netlink.NetworkCreateVethPair(n.HostInterfaceName, n.TempVethPeerName, n.TxQueueLen); err != nil {
return err
}
if err := v.attach(&n.Network); err != nil {
return err
}
child, err := net.InterfaceByName(n.TempVethPeerName)
if err != nil {
return err

View File

@@ -1,3 +1,5 @@
// +build !linux !cgo
package nsenter
import "C"

View File

@@ -148,15 +148,15 @@ void nsexec()
pr_perror("ioctl TIOCSCTTY failed");
exit(1);
}
if (dup2(consolefd, STDIN_FILENO) != STDIN_FILENO) {
if (dup3(consolefd, STDIN_FILENO, 0) != STDIN_FILENO) {
pr_perror("Failed to dup 0");
exit(1);
}
if (dup2(consolefd, STDOUT_FILENO) != STDOUT_FILENO) {
if (dup3(consolefd, STDOUT_FILENO, 0) != STDOUT_FILENO) {
pr_perror("Failed to dup 1");
exit(1);
}
if (dup2(consolefd, STDERR_FILENO) != STDERR_FILENO) {
if (dup3(consolefd, STDERR_FILENO, 0) != STDERR_FILENO) {
pr_perror("Failed to dup 2");
exit(1);
}

View File

@@ -1,7 +1,7 @@
## nsinit
`nsinit` is a cli application which demonstrates the use of libcontainer.
It is able to spawn new containers or join existing containers.
`nsinit` is a cli application which demonstrates the use of libcontainer.
It is able to spawn new containers or join existing containers.
### How to build?
@@ -58,7 +58,7 @@ If you wish to spawn another process inside the container while your current
bash session is running, run the same command again to get another bash shell
(or change the command). If the original process (PID 1) dies, all other
processes spawned inside the container will be killed and the namespace will
be removed.
be removed.
You can identify if a process is running in a container by looking to see if
`state.json` is in the root of the directory.
@@ -68,45 +68,61 @@ file is read and where the `state.json` file will be saved.
### How to use?
Currently nsinit has 9 commands. Type `nsinit -h` to list all of them.
And for every alternative command, you can also use `--help` to get more
Currently nsinit has 9 commands. Type `nsinit -h` to list all of them.
And for every alternative command, you can also use `--help` to get more
detailed help documents. For example, `nsinit config --help`.
`nsinit` cli application is implemented using [cli.go](https://github.com/codegangsta/cli).
Lots of details are handled in cli.go, so the implementation of `nsinit` itself
`nsinit` cli application is implemented using [cli.go](https://github.com/codegangsta/cli).
Lots of details are handled in cli.go, so the implementation of `nsinit` itself
is very clean and clear.
* **config**
It will generate a standard configuration file for a container. By default, it
will generate as the template file in [config.go](https://github.com/docker/libcontainer/blob/master/nsinit/config.go#L192).
It will generate a standard configuration file for a container. By default, it
will generate as the template file in [config.go](https://github.com/docker/libcontainer/blob/f28dff5539855bac2adbc5699f57f84349605b5f/nsinit/config.go#L234).
It will modify the template if you have specified some configuration by options.
* **exec**
Starts a container and execute a new command inside it. Besides common options, it
has some special options as below.
- `--tty,-t`: allocate a TTY to the container.
- `--config`: you can specify a configuration file. By default, it will use
- `--config`: you can specify a configuration file. By default, it will use
template configuration.
- `--id`: specify the ID for a container. By default, the id is "nsinit".
- `--user,-u`: set the user, uid, and/or gid for the process. By default the
- `--user,-u`: set the user, uid, and/or gid for the process. By default the
value is "root".
- `--cwd`: set the current working dir.
- `--env`: set environment variables for the process.
* **init**
It's an internal command that is called inside the container's namespaces to
initialize the namespace and exec the user's process. It should not be called
It's an internal command that is called inside the container's namespaces to
initialize the namespace and exec the user's process. It should not be called
externally.
* **oom**
Display oom notifications for a container, you should specify container id.
* **pause**
Pause the container's processes, you should specify container id. It will use
Pause the container's processes, you should specify container id. It will use
cgroup freeze subsystem to help.
* **unpause**
Unpause the container's processes. Same with `pause`.
* **stats**
Display statistics for the container, it will mainly show cgroup and network
Display statistics for the container, it will mainly show cgroup and network
statistics.
* **state**
Get the container's current state. You can also read the state from `state.json`
in your container_id folder.
in your container_id folder.
* **checkpoint**
Checkpoint a running container. You can read [this](http://criu.org/Advanced_usage)
for more detailed information about options.
- `--id` specify the ID for a container. By default, the id is "nsinit".
- `--image-path` path for saving criu image files. You must specify this option.
- `--work-path`: path for saving work files and logs. By default it will
generate a folder named "criu.work" in root directory.
- `--leave-running`: leave the process running after checkpointing.
- `--tcp-established`: allow open tcp connections.
- `--ext-unix-sk`: allow external unix sockets.
- `--shell-job`: allow shell jobs.
- `--page-server`: ADDRESS:PORT of the page server. The dump image can be
sent to a criu page server if we have a page server.
* **restore**
Restore a container from a previous checkpoint. Options are almost the same
with checkpoint and `--image-path` must be specified.
* **help, h**
Shows a list of commands or help for one command.

View File

@@ -0,0 +1,67 @@
package main
import (
"fmt"
"github.com/codegangsta/cli"
"github.com/docker/libcontainer"
"strconv"
"strings"
)
var checkpointCommand = cli.Command{
Name: "checkpoint",
Usage: "checkpoint a running container",
Flags: []cli.Flag{
cli.StringFlag{Name: "id", Value: "nsinit", Usage: "specify the ID for a container"},
cli.StringFlag{Name: "image-path", Value: "", Usage: "path for saving criu image files"},
cli.StringFlag{Name: "work-path", Value: "", Usage: "path for saving work files and logs"},
cli.BoolFlag{Name: "leave-running", Usage: "leave the process running after checkpointing"},
cli.BoolFlag{Name: "tcp-established", Usage: "allow open tcp connections"},
cli.BoolFlag{Name: "ext-unix-sk", Usage: "allow external unix sockets"},
cli.BoolFlag{Name: "shell-job", Usage: "allow shell jobs"},
cli.StringFlag{Name: "page-server", Value: "", Usage: "ADDRESS:PORT of the page server"},
},
Action: func(context *cli.Context) {
imagePath := context.String("image-path")
if imagePath == "" {
fatal(fmt.Errorf("The --image-path option isn't specified"))
}
container, err := getContainer(context)
if err != nil {
fatal(err)
}
// these are the mandatory criu options for a container
criuOpts := &libcontainer.CriuOpts{
ImagesDirectory: imagePath,
WorkDirectory: context.String("work-path"),
LeaveRunning: context.Bool("leave-running"),
TcpEstablished: context.Bool("tcp-established"),
ExternalUnixConnections: context.Bool("ext-unix-sk"),
ShellJob: context.Bool("shell-job"),
}
// xxx following criu opts are optional
// The dump image can be sent to a criu page server
if psOpt := context.String("page-server"); psOpt != "" {
addressPort := strings.Split(psOpt, ":")
if len(addressPort) != 2 {
fatal(fmt.Errorf("Use --page-server ADDRESS:PORT to specify page server"))
}
port_int, err := strconv.Atoi(addressPort[1])
if err != nil {
fatal(fmt.Errorf("Invalid port number"))
}
criuOpts.PageServer = libcontainer.CriuPageServerInfo{
Address: addressPort[0],
Port: int32(port_int),
}
}
if err := container.Checkpoint(criuOpts); err != nil {
fatal(err)
}
},
}

View File

@@ -19,31 +19,34 @@ import (
const defaultMountFlags = syscall.MS_NOEXEC | syscall.MS_NOSUID | syscall.MS_NODEV
var createFlags = []cli.Flag{
cli.IntFlag{Name: "parent-death-signal", Usage: "set the signal that will be delivered to the process in case the parent dies"},
cli.BoolFlag{Name: "cgroup", Usage: "mount the cgroup data for the container"},
cli.BoolFlag{Name: "read-only", Usage: "set the container's rootfs as read-only"},
cli.StringSliceFlag{Name: "bind", Value: &cli.StringSlice{}, Usage: "add bind mounts to the container"},
cli.StringSliceFlag{Name: "tmpfs", Value: &cli.StringSlice{}, Usage: "add tmpfs mounts to the container"},
cli.IntFlag{Name: "cpushares", Usage: "set the cpushares for the container"},
cli.IntFlag{Name: "memory-limit", Usage: "set the memory limit for the container"},
cli.IntFlag{Name: "memory-swap", Usage: "set the memory swap limit for the container"},
cli.IntFlag{Name: "parent-death-signal", Usage: "set the signal that will be delivered to the process in case the parent dies"},
cli.IntFlag{Name: "userns-root-uid", Usage: "set the user namespace root uid"},
cli.IntFlag{Name: "veth-mtu", Usage: "veth mtu"},
cli.StringFlag{Name: "apparmor-profile", Usage: "set the apparmor profile"},
cli.StringFlag{Name: "cpuset-cpus", Usage: "set the cpuset cpus"},
cli.StringFlag{Name: "cpuset-mems", Usage: "set the cpuset mems"},
cli.StringFlag{Name: "apparmor-profile", Usage: "set the apparmor profile"},
cli.StringFlag{Name: "process-label", Usage: "set the process label"},
cli.StringFlag{Name: "mount-label", Usage: "set the mount label"},
cli.StringFlag{Name: "rootfs", Usage: "set the rootfs"},
cli.IntFlag{Name: "userns-root-uid", Usage: "set the user namespace root uid"},
cli.StringFlag{Name: "hostname", Value: "nsinit", Usage: "hostname value for the container"},
cli.StringFlag{Name: "net", Value: "", Usage: "network namespace"},
cli.StringFlag{Name: "ipc", Value: "", Usage: "ipc namespace"},
cli.StringFlag{Name: "pid", Value: "", Usage: "pid namespace"},
cli.StringFlag{Name: "uts", Value: "", Usage: "uts namespace"},
cli.StringFlag{Name: "mnt", Value: "", Usage: "mount namespace"},
cli.StringFlag{Name: "veth-bridge", Usage: "veth bridge"},
cli.StringFlag{Name: "mount-label", Usage: "set the mount label"},
cli.StringFlag{Name: "net", Value: "", Usage: "network namespace"},
cli.StringFlag{Name: "pid", Value: "", Usage: "pid namespace"},
cli.StringFlag{Name: "process-label", Usage: "set the process label"},
cli.StringFlag{Name: "rootfs", Usage: "set the rootfs"},
cli.StringFlag{Name: "security", Value: "", Usage: "set the security profile (high, medium, low)"},
cli.StringFlag{Name: "uts", Value: "", Usage: "uts namespace"},
cli.StringFlag{Name: "veth-address", Usage: "veth ip address"},
cli.StringFlag{Name: "veth-bridge", Usage: "veth bridge"},
cli.StringFlag{Name: "veth-gateway", Usage: "veth gateway address"},
cli.IntFlag{Name: "veth-mtu", Usage: "veth mtu"},
cli.BoolFlag{Name: "cgroup", Usage: "mount the cgroup data for the container"},
cli.StringSliceFlag{Name: "bind", Value: &cli.StringSlice{}, Usage: "add bind mounts to the container"},
cli.StringSliceFlag{Name: "sysctl", Value: &cli.StringSlice{}, Usage: "set system properties in the container"},
cli.StringSliceFlag{Name: "tmpfs", Value: &cli.StringSlice{}, Usage: "add tmpfs mounts to the container"},
cli.StringSliceFlag{Name: "groups", Value: &cli.StringSlice{}, Usage: "add additional groups"},
}
var configCommand = cli.Command{
@@ -111,6 +114,20 @@ func modify(config *configs.Config, context *cli.Context) {
node.Gid = uint32(userns_uid)
}
}
config.SystemProperties = make(map[string]string)
for _, sysProp := range context.StringSlice("sysctl") {
parts := strings.SplitN(sysProp, "=", 2)
if len(parts) != 2 {
logrus.Fatalf("invalid system property %s", sysProp)
}
config.SystemProperties[parts[0]] = parts[1]
}
for _, group := range context.StringSlice("groups") {
config.AdditionalGroups = append(config.AdditionalGroups, group)
}
for _, rawBind := range context.StringSlice("bind") {
mount := &configs.Mount{
Device: "bind",
@@ -194,6 +211,24 @@ func modify(config *configs.Config, context *cli.Context) {
Device: "cgroup",
})
}
modifySecurityProfile(context, config)
}
func modifySecurityProfile(context *cli.Context, config *configs.Config) {
profileName := context.String("security")
if profileName == "" {
return
}
profile := profiles[profileName]
if profile == nil {
logrus.Fatalf("invalid profile name %q", profileName)
}
config.Rlimits = profile.Rlimits
config.Capabilities = profile.Capabilities
config.Seccomp = profile.Seccomp
config.AppArmorProfile = profile.ApparmorProfile
config.MountLabel = profile.MountLabel
config.ProcessLabel = profile.ProcessLabel
}
func getTemplate() *configs.Config {
@@ -281,13 +316,5 @@ func getTemplate() *configs.Config {
Flags: defaultMountFlags | syscall.MS_RDONLY,
},
},
Rlimits: []configs.Rlimit{
{
Type: syscall.RLIMIT_NOFILE,
Hard: 1024,
Soft: 1024,
},
},
}
}

View File

@@ -93,10 +93,17 @@ func execAction(context *cli.Context) {
}
}
if created {
if err := container.Destroy(); err != nil {
status, err := container.Status()
if err != nil {
tty.Close()
fatal(err)
}
if status != libcontainer.Checkpointed {
if err := container.Destroy(); err != nil {
tty.Close()
fatal(err)
}
}
}
tty.Close()
os.Exit(utils.ExitStatus(status.Sys().(syscall.WaitStatus)))

View File

@@ -3,7 +3,7 @@ package main
import (
"runtime"
log "github.com/Sirupsen/logrus"
"github.com/Sirupsen/logrus"
"github.com/codegangsta/cli"
"github.com/docker/libcontainer"
_ "github.com/docker/libcontainer/nsenter"
@@ -13,7 +13,7 @@ var initCommand = cli.Command{
Name: "init",
Usage: "runs the init process inside the namespace",
Action: func(context *cli.Context) {
log.SetLevel(log.DebugLevel)
logrus.SetLevel(logrus.DebugLevel)
runtime.GOMAXPROCS(1)
runtime.LockOSThread()
factory, err := libcontainer.New("")

View File

@@ -3,7 +3,7 @@ package main
import (
"os"
log "github.com/Sirupsen/logrus"
"github.com/Sirupsen/logrus"
"github.com/codegangsta/cli"
)
@@ -13,34 +13,37 @@ func main() {
app.Version = "2"
app.Author = "libcontainer maintainers"
app.Flags = []cli.Flag{
cli.StringFlag{Name: "root", Value: "/var/run/nsinit", Usage: "root directory for containers"},
cli.StringFlag{Name: "log-file", Value: "", Usage: "set the log file to output logs to"},
cli.BoolFlag{Name: "debug", Usage: "enable debug output in the logs"},
cli.StringFlag{Name: "root", Value: "/var/run/nsinit", Usage: "root directory for containers"},
cli.StringFlag{Name: "log-file", Usage: "set the log file to output logs to"},
cli.StringFlag{Name: "criu", Value: "criu", Usage: "path to the criu binary for checkpoint and restore"},
}
app.Commands = []cli.Command{
checkpointCommand,
configCommand,
execCommand,
initCommand,
oomCommand,
pauseCommand,
stateCommand,
statsCommand,
unpauseCommand,
stateCommand,
restoreCommand,
}
app.Before = func(context *cli.Context) error {
if context.GlobalBool("debug") {
log.SetLevel(log.DebugLevel)
logrus.SetLevel(logrus.DebugLevel)
}
if path := context.GlobalString("log-file"); path != "" {
f, err := os.Create(path)
if err != nil {
return err
}
log.SetOutput(f)
logrus.SetOutput(f)
}
return nil
}
if err := app.Run(os.Args); err != nil {
log.Fatal(err)
logrus.Fatal(err)
}
}

View File

@@ -1,7 +1,7 @@
package main
import (
log "github.com/Sirupsen/logrus"
"github.com/Sirupsen/logrus"
"github.com/codegangsta/cli"
)
@@ -14,16 +14,16 @@ var oomCommand = cli.Command{
Action: func(context *cli.Context) {
container, err := getContainer(context)
if err != nil {
log.Fatal(err)
logrus.Fatal(err)
}
n, err := container.NotifyOOM()
if err != nil {
log.Fatal(err)
logrus.Fatal(err)
}
for x := range n {
// hack for calm down go1.4 gofmt
_ = x
log.Printf("OOM notification received")
logrus.Printf("OOM notification received")
}
},
}

View File

@@ -1,7 +1,7 @@
package main
import (
log "github.com/Sirupsen/logrus"
"github.com/Sirupsen/logrus"
"github.com/codegangsta/cli"
)
@@ -14,10 +14,10 @@ var pauseCommand = cli.Command{
Action: func(context *cli.Context) {
container, err := getContainer(context)
if err != nil {
log.Fatal(err)
logrus.Fatal(err)
}
if err = container.Pause(); err != nil {
log.Fatal(err)
logrus.Fatal(err)
}
},
}
@@ -31,10 +31,10 @@ var unpauseCommand = cli.Command{
Action: func(context *cli.Context) {
container, err := getContainer(context)
if err != nil {
log.Fatal(err)
logrus.Fatal(err)
}
if err = container.Resume(); err != nil {
log.Fatal(err)
logrus.Fatal(err)
}
},
}

View File

@@ -0,0 +1,120 @@
package main
import (
"fmt"
"os"
"os/exec"
"syscall"
"github.com/codegangsta/cli"
"github.com/docker/libcontainer"
"github.com/docker/libcontainer/utils"
)
var restoreCommand = cli.Command{
Name: "restore",
Usage: "restore a container from a previous checkpoint",
Flags: []cli.Flag{
cli.StringFlag{Name: "id", Value: "nsinit", Usage: "specify the ID for a container"},
cli.StringFlag{Name: "image-path", Value: "", Usage: "path to criu image files for restoring"},
cli.StringFlag{Name: "work-path", Value: "", Usage: "path for saving work files and logs"},
cli.BoolFlag{Name: "tcp-established", Usage: "allow open tcp connections"},
cli.BoolFlag{Name: "ext-unix-sk", Usage: "allow external unix sockets"},
cli.BoolFlag{Name: "shell-job", Usage: "allow shell jobs"},
},
Action: func(context *cli.Context) {
imagePath := context.String("image-path")
if imagePath == "" {
fatal(fmt.Errorf("The --image-path option isn't specified"))
}
var (
container libcontainer.Container
err error
)
factory, err := loadFactory(context)
if err != nil {
fatal(err)
}
config, err := loadConfig(context)
if err != nil {
fatal(err)
}
created := false
container, err = factory.Load(context.String("id"))
if err != nil {
created = true
if container, err = factory.Create(context.String("id"), config); err != nil {
fatal(err)
}
}
process := &libcontainer.Process{
Stdin: os.Stdin,
Stdout: os.Stdout,
Stderr: os.Stderr,
}
//rootuid, err := config.HostUID()
//if err != nil {
//fatal(err)
//}
rootuid := 0 // XXX
tty, err := newTty(context, process, rootuid)
if err != nil {
fatal(err)
}
if err := tty.attach(process); err != nil {
fatal(err)
}
go handleSignals(process, tty)
err = container.Restore(process, &libcontainer.CriuOpts{
ImagesDirectory: imagePath,
WorkDirectory: context.String("work-path"),
TcpEstablished: context.Bool("tcp-established"),
ExternalUnixConnections: context.Bool("ext-unix-sk"),
ShellJob: context.Bool("shell-job"),
})
if err != nil {
tty.Close()
if created {
container.Destroy()
}
fatal(err)
}
status, err := process.Wait()
if err != nil {
exitError, ok := err.(*exec.ExitError)
if ok {
status = exitError.ProcessState
} else {
tty.Close()
if created {
container.Destroy()
}
fatal(err)
}
}
if created {
status, err := container.Status()
if err != nil {
tty.Close()
fatal(err)
}
if status != libcontainer.Checkpointed {
if err := container.Destroy(); err != nil {
tty.Close()
fatal(err)
}
}
}
tty.Close()
os.Exit(utils.ExitStatus(status.Sys().(syscall.WaitStatus)))
},
}

View File

@@ -0,0 +1,272 @@
package main
import (
"syscall"
"github.com/docker/libcontainer/configs"
"github.com/docker/libcontainer/system"
)
var profiles = map[string]*securityProfile{
"high": highProfile,
"medium": mediumProfile,
"low": lowProfile,
}
type securityProfile struct {
Capabilities []string `json:"capabilities"`
ApparmorProfile string `json:"apparmor_profile"`
MountLabel string `json:"mount_label"`
ProcessLabel string `json:"process_label"`
Rlimits []configs.Rlimit `json:"rlimits"`
Seccomp *configs.Seccomp `json:"seccomp"`
}
// this should be a runtime config that is not able to do things like apt-get or yum install.
var highProfile = &securityProfile{
Capabilities: []string{
"NET_BIND_SERVICE",
"KILL",
"AUDIT_WRITE",
},
Rlimits: []configs.Rlimit{
{
Type: syscall.RLIMIT_NOFILE,
Hard: 1024,
Soft: 1024,
},
},
// http://man7.org/linux/man-pages/man2/syscalls.2.html
Seccomp: &configs.Seccomp{
Syscalls: []*configs.Syscall{
{
Value: syscall.SYS_CAPSET, // http://man7.org/linux/man-pages/man2/capset.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_UNSHARE, // http://man7.org/linux/man-pages/man2/unshare.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: int(system.SysSetns()),
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_MOUNT, // http://man7.org/linux/man-pages/man2/mount.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_UMOUNT2, // http://man7.org/linux/man-pages/man2/umount.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_CREATE_MODULE, // http://man7.org/linux/man-pages/man2/create_module.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_DELETE_MODULE, // http://man7.org/linux/man-pages/man2/delete_module.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_CHMOD, // http://man7.org/linux/man-pages/man2/chmod.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_CHOWN, // http://man7.org/linux/man-pages/man2/chown.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_LINK, // http://man7.org/linux/man-pages/man2/link.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_LINKAT, // http://man7.org/linux/man-pages/man2/linkat.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_UNLINK, // http://man7.org/linux/man-pages/man2/unlink.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_UNLINKAT, // http://man7.org/linux/man-pages/man2/unlinkat.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_CHROOT, // http://man7.org/linux/man-pages/man2/chroot.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_KEXEC_LOAD, // http://man7.org/linux/man-pages/man2/kexec_load.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_SETDOMAINNAME, // http://man7.org/linux/man-pages/man2/setdomainname.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_SETHOSTNAME, // http://man7.org/linux/man-pages/man2/sethostname.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_CLONE, // http://man7.org/linux/man-pages/man2/clone.2.html
Action: configs.Action(syscall.EPERM),
Args: []*configs.Arg{
{
Index: 0, // the glibc wrapper has the flags at arg2 but the raw syscall has flags at arg0
Value: syscall.CLONE_NEWUSER,
Op: configs.MaskEqualTo,
},
},
},
},
},
}
// This is a medium level profile that should be able to do things like installing from
// apt-get or yum.
var mediumProfile = &securityProfile{
Capabilities: []string{
"CHOWN",
"DAC_OVERRIDE",
"FSETID",
"FOWNER",
"SETGID",
"SETUID",
"SETFCAP",
"SETPCAP",
"NET_BIND_SERVICE",
"KILL",
"AUDIT_WRITE",
},
Rlimits: []configs.Rlimit{
{
Type: syscall.RLIMIT_NOFILE,
Hard: 1024,
Soft: 1024,
},
},
// http://man7.org/linux/man-pages/man2/syscalls.2.html
Seccomp: &configs.Seccomp{
Syscalls: []*configs.Syscall{
{
Value: syscall.SYS_UNSHARE, // http://man7.org/linux/man-pages/man2/unshare.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: int(system.SysSetns()),
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_MOUNT, // http://man7.org/linux/man-pages/man2/mount.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_UMOUNT2, // http://man7.org/linux/man-pages/man2/umount.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_CHROOT, // http://man7.org/linux/man-pages/man2/chroot.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_CREATE_MODULE, // http://man7.org/linux/man-pages/man2/create_module.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_DELETE_MODULE, // http://man7.org/linux/man-pages/man2/delete_module.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_KEXEC_LOAD, // http://man7.org/linux/man-pages/man2/kexec_load.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_SETDOMAINNAME, // http://man7.org/linux/man-pages/man2/setdomainname.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_SETHOSTNAME, // http://man7.org/linux/man-pages/man2/sethostname.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_CLONE, // http://man7.org/linux/man-pages/man2/clone.2.html
Action: configs.Action(syscall.EPERM),
Args: []*configs.Arg{
{
Index: 0, // the glibc wrapper has the flags at arg2 but the raw syscall has flags at arg0
Value: syscall.CLONE_NEWUSER,
Op: configs.MaskEqualTo,
},
},
},
},
},
}
var lowProfile = &securityProfile{
Capabilities: []string{
"CHOWN",
"DAC_OVERRIDE",
"FSETID",
"FOWNER",
"SETGID",
"SETUID",
"SYS_CHROOT",
"SETFCAP",
"SETPCAP",
"NET_BIND_SERVICE",
"KILL",
"AUDIT_WRITE",
},
Rlimits: []configs.Rlimit{
{
Type: syscall.RLIMIT_NOFILE,
Hard: 1024,
Soft: 1024,
},
},
// http://man7.org/linux/man-pages/man2/syscalls.2.html
Seccomp: &configs.Seccomp{
Syscalls: []*configs.Syscall{
{
Value: syscall.SYS_UNSHARE, // http://man7.org/linux/man-pages/man2/unshare.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: int(system.SysSetns()),
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_MOUNT, // http://man7.org/linux/man-pages/man2/mount.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_UMOUNT2, // http://man7.org/linux/man-pages/man2/umount.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_CREATE_MODULE, // http://man7.org/linux/man-pages/man2/create_module.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_DELETE_MODULE, // http://man7.org/linux/man-pages/man2/delete_module.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_KEXEC_LOAD, // http://man7.org/linux/man-pages/man2/kexec_load.2.html
Action: configs.Action(syscall.EPERM),
},
{
Value: syscall.SYS_CLONE, // http://man7.org/linux/man-pages/man2/clone.2.html
Action: configs.Action(syscall.EPERM),
Args: []*configs.Arg{
{
Index: 0, // the glibc wrapper has the flags at arg2 but the raw syscall has flags at arg0
Value: syscall.CLONE_NEWUSER,
Op: configs.MaskEqualTo,
},
},
},
},
},
}

View File

@@ -17,6 +17,9 @@ func newTty(context *cli.Context, p *libcontainer.Process, rootuid int) (*tty, e
}
return &tty{
console: console,
closers: []io.Closer{
console,
},
}, nil
}
return &tty{}, nil
@@ -25,11 +28,12 @@ func newTty(context *cli.Context, p *libcontainer.Process, rootuid int) (*tty, e
type tty struct {
console libcontainer.Console
state *term.State
closers []io.Closer
}
func (t *tty) Close() error {
if t.console != nil {
t.console.Close()
for _, c := range t.closers {
c.Close()
}
if t.state != nil {
term.RestoreTerminal(os.Stdin.Fd(), t.state)
@@ -49,10 +53,35 @@ func (t *tty) attach(process *libcontainer.Process) error {
process.Stderr = nil
process.Stdout = nil
process.Stdin = nil
} else {
// setup standard pipes so that the TTY of the calling nsinit process
// is not inherited by the container.
r, w, err := os.Pipe()
if err != nil {
return err
}
go io.Copy(w, os.Stdin)
t.closers = append(t.closers, w)
process.Stdin = r
if r, w, err = os.Pipe(); err != nil {
return err
}
go io.Copy(os.Stdout, r)
process.Stdout = w
t.closers = append(t.closers, r)
if r, w, err = os.Pipe(); err != nil {
return err
}
go io.Copy(os.Stderr, r)
process.Stderr = w
t.closers = append(t.closers, r)
}
return nil
}
func (t *tty) setupPipe() {
}
func (t *tty) resize() error {
if t.console == nil {
return nil

View File

@@ -3,8 +3,10 @@ package main
import (
"encoding/json"
"fmt"
log "github.com/Sirupsen/logrus"
"os"
"path/filepath"
"github.com/Sirupsen/logrus"
"github.com/codegangsta/cli"
"github.com/docker/libcontainer"
@@ -36,10 +38,18 @@ func loadFactory(context *cli.Context) (libcontainer.Factory, error) {
if systemd.UseSystemd() {
cgm = libcontainer.SystemdCgroups
} else {
log.Warn("systemd cgroup flag passed, but systemd support for managing cgroups is not available.")
logrus.Warn("systemd cgroup flag passed, but systemd support for managing cgroups is not available.")
}
}
return libcontainer.New(context.GlobalString("root"), cgm)
root := context.GlobalString("root")
abs, err := filepath.Abs(root)
if err != nil {
return nil, err
}
return libcontainer.New(abs, cgm, func(l *libcontainer.LinuxFactory) error {
l.CriuPath = context.GlobalString("criu")
return nil
})
}
func getContainer(context *cli.Context) (libcontainer.Container, error) {

View File

@@ -8,6 +8,8 @@ import (
"io"
"os"
"os/exec"
"path/filepath"
"strconv"
"syscall"
"github.com/docker/libcontainer/cgroups"
@@ -31,6 +33,10 @@ type parentProcess interface {
startTime() (string, error)
signal(os.Signal) error
externalDescriptors() []string
setExternalDescriptors(fds []string)
}
type setnsProcess struct {
@@ -39,6 +45,7 @@ type setnsProcess struct {
childPipe *os.File
cgroupPaths map[string]string
config *initConfig
fds []string
}
func (p *setnsProcess) startTime() (string, error) {
@@ -142,18 +149,32 @@ func (p *setnsProcess) pid() int {
return p.cmd.Process.Pid
}
func (p *setnsProcess) externalDescriptors() []string {
return p.fds
}
func (p *setnsProcess) setExternalDescriptors(newFds []string) {
p.fds = newFds
}
type initProcess struct {
cmd *exec.Cmd
parentPipe *os.File
childPipe *os.File
config *initConfig
manager cgroups.Manager
container *linuxContainer
fds []string
}
func (p *initProcess) pid() int {
return p.cmd.Process.Pid
}
func (p *initProcess) externalDescriptors() []string {
return p.fds
}
func (p *initProcess) start() error {
defer p.parentPipe.Close()
err := p.cmd.Start()
@@ -161,6 +182,15 @@ func (p *initProcess) start() error {
if err != nil {
return newSystemError(err)
}
// Save the standard descriptor names before the container process
// can potentially move them (e.g., via dup2()). If we don't do this now,
// we won't know at checkpoint time which file descriptor to look up.
fds, err := getPipeFds(p.pid())
if err != nil {
return newSystemError(err)
}
p.setExternalDescriptors(fds)
// Do this before syncing with child so that no children
// can escape the cgroup
if err := p.manager.Apply(p.pid()); err != nil {
@@ -250,3 +280,24 @@ func (p *initProcess) signal(sig os.Signal) error {
}
return syscall.Kill(p.cmd.Process.Pid, s)
}
func (p *initProcess) setExternalDescriptors(newFds []string) {
p.fds = newFds
}
func getPipeFds(pid int) ([]string, error) {
var fds []string
fds = make([]string, 3)
dirPath := filepath.Join("/proc", strconv.Itoa(pid), "/fd")
for i := 0; i < 3; i++ {
f := filepath.Join(dirPath, strconv.Itoa(i))
target, err := os.Readlink(f)
if err != nil {
return fds, err
}
fds[i] = target
}
return fds, nil
}

View File

@@ -0,0 +1,118 @@
// +build linux
package libcontainer
import (
"fmt"
"os"
"github.com/docker/libcontainer/system"
)
func newRestoredProcess(pid int, fds []string) (*restoredProcess, error) {
var (
err error
)
proc, err := os.FindProcess(pid)
if err != nil {
return nil, err
}
started, err := system.GetProcessStartTime(pid)
if err != nil {
return nil, err
}
return &restoredProcess{
proc: proc,
processStartTime: started,
fds: fds,
}, nil
}
type restoredProcess struct {
proc *os.Process
processStartTime string
fds []string
}
func (p *restoredProcess) start() error {
return newGenericError(fmt.Errorf("restored process cannot be started"), SystemError)
}
func (p *restoredProcess) pid() int {
return p.proc.Pid
}
func (p *restoredProcess) terminate() error {
err := p.proc.Kill()
if _, werr := p.wait(); err == nil {
err = werr
}
return err
}
func (p *restoredProcess) wait() (*os.ProcessState, error) {
// TODO: how do we wait on the actual process?
// maybe use --exec-cmd in criu
st, err := p.proc.Wait()
if err != nil {
return nil, err
}
return st, nil
}
func (p *restoredProcess) startTime() (string, error) {
return p.processStartTime, nil
}
func (p *restoredProcess) signal(s os.Signal) error {
return p.proc.Signal(s)
}
func (p *restoredProcess) externalDescriptors() []string {
return p.fds
}
func (p *restoredProcess) setExternalDescriptors(newFds []string) {
p.fds = newFds
}
// nonChildProcess represents a process where the calling process is not
// the parent process. This process is created when a factory loads a container from
// a persisted state.
type nonChildProcess struct {
processPid int
processStartTime string
fds []string
}
func (p *nonChildProcess) start() error {
return newGenericError(fmt.Errorf("restored process cannot be started"), SystemError)
}
func (p *nonChildProcess) pid() int {
return p.processPid
}
func (p *nonChildProcess) terminate() error {
return newGenericError(fmt.Errorf("restored process cannot be terminated"), SystemError)
}
func (p *nonChildProcess) wait() (*os.ProcessState, error) {
return nil, newGenericError(fmt.Errorf("restored process cannot be waited on"), SystemError)
}
func (p *nonChildProcess) startTime() (string, error) {
return p.processStartTime, nil
}
func (p *nonChildProcess) signal(s os.Signal) error {
return newGenericError(fmt.Errorf("restored process cannot be signaled"), SystemError)
}
func (p *nonChildProcess) externalDescriptors() []string {
return p.fds
}
func (p *nonChildProcess) setExternalDescriptors(newFds []string) {
p.fds = newFds
}

View File

@@ -13,6 +13,7 @@ import (
"syscall"
"time"
"github.com/docker/docker/pkg/symlink"
"github.com/docker/libcontainer/cgroups"
"github.com/docker/libcontainer/configs"
"github.com/docker/libcontainer/label"
@@ -48,11 +49,6 @@ func setupRootfs(config *configs.Config, console *linuxConsole) (err error) {
if err := setupPtmx(config, console); err != nil {
return newSystemError(err)
}
// stdin, stdout and stderr could be pointing to /dev/null from parent namespace.
// re-open them inside this namespace.
if err := reOpenDevNull(config.Rootfs); err != nil {
return newSystemError(err)
}
if err := setupDevSymlinks(config.Rootfs); err != nil {
return newSystemError(err)
}
@@ -67,6 +63,9 @@ func setupRootfs(config *configs.Config, console *linuxConsole) (err error) {
if err != nil {
return newSystemError(err)
}
if err := reOpenDevNull(config.Rootfs); err != nil {
return newSystemError(err)
}
if config.Readonlyfs {
if err := setReadonly(); err != nil {
return newSystemError(err)
@@ -139,6 +138,16 @@ func mountToRootfs(m *configs.Mount, rootfs, mountLabel string) error {
// unable to bind anything to it.
return err
}
// ensure that the destination of the bind mount is resolved of symlinks at mount time because
// any previous mounts can invalidate the next mount's destination.
// this can happen when a user specifies mounts within other mounts to cause breakouts or other
// evil stuff to try to escape the container's rootfs.
if dest, err = symlink.FollowSymlinkInScope(filepath.Join(rootfs, m.Destination), rootfs); err != nil {
return err
}
if err := checkMountDestination(rootfs, dest); err != nil {
return err
}
if err := createIfNotExists(dest, stat.IsDir()); err != nil {
return err
}
@@ -197,6 +206,28 @@ func mountToRootfs(m *configs.Mount, rootfs, mountLabel string) error {
return nil
}
// checkMountDestination checks to ensure that the mount destination is not over the
// top of /proc or /sys.
// dest is required to be an abs path and have any symlinks resolved before calling this function.
func checkMountDestination(rootfs, dest string) error {
if filepath.Clean(rootfs) == filepath.Clean(dest) {
return fmt.Errorf("mounting into / is prohibited")
}
invalidDestinations := []string{
"/proc",
}
for _, invalid := range invalidDestinations {
path, err := filepath.Rel(filepath.Join(rootfs, invalid), dest)
if err != nil {
return err
}
if path == "." || !strings.HasPrefix(path, "..") {
return fmt.Errorf("%q cannot be mounted because it is located inside %q", dest, invalid)
}
}
return nil
}
func setupDevSymlinks(rootfs string) error {
var links = [][2]string{
{"/proc/self/fd", "/dev/fd"},
@@ -221,11 +252,13 @@ func setupDevSymlinks(rootfs string) error {
return nil
}
// If stdin, stdout or stderr are pointing to '/dev/null' in the global mount namespace,
// this method will make them point to '/dev/null' in this namespace.
// If stdin, stdout, and/or stderr are pointing to `/dev/null` in the parent's rootfs
// this method will make them point to `/dev/null` in this container's rootfs. This
// needs to be called after we chroot/pivot into the container's rootfs so that any
// symlinks are resolved locally.
func reOpenDevNull(rootfs string) error {
var stat, devNullStat syscall.Stat_t
file, err := os.Open(filepath.Join(rootfs, "/dev/null"))
file, err := os.Open("/dev/null")
if err != nil {
return fmt.Errorf("Failed to open /dev/null - %s", err)
}
@@ -239,7 +272,7 @@ func reOpenDevNull(rootfs string) error {
}
if stat.Rdev == devNullStat.Rdev {
// Close and re-open the fd.
if err := syscall.Dup2(int(file.Fd()), fd); err != nil {
if err := syscall.Dup3(int(file.Fd()), fd, 0); err != nil {
return err
}
}

View File

@@ -0,0 +1,32 @@
package seccomp
import "strings"
type bpfLabel struct {
label string
location uint32
}
type bpfLabels []bpfLabel
// labelIndex returns the index for the label if it exists in the slice.
// if it does not exist in the slice it appends the label lb to the end
// of the slice and returns the index.
func labelIndex(labels *bpfLabels, lb string) uint32 {
var id uint32
for id = 0; id < uint32(len(*labels)); id++ {
if strings.EqualFold(lb, (*labels)[id].label) {
return id
}
}
*labels = append(*labels, bpfLabel{lb, 0xffffffff})
return id
}
func scmpBpfStmt(code uint16, k uint32) sockFilter {
return sockFilter{code, 0, 0, k}
}
func scmpBpfJump(code uint16, k uint32, jt, jf uint8) sockFilter {
return sockFilter{code, jt, jf, k}
}

View File

@@ -0,0 +1,144 @@
package seccomp
import (
"errors"
"syscall"
)
const labelTemplate = "lb-%d-%d"
// Action is the type of action that will be taken when a
// syscall is performed.
type Action int
const (
Kill Action = iota - 3 // Kill the calling process of the syscall.
Trap // Trap and coredump the calling process of the syscall.
Allow // Allow the syscall to be completed.
)
// Syscall is the specified syscall, action, and any type of arguments
// to filter on.
type Syscall struct {
// Value is the syscall number.
Value uint32
// Action is the action to perform when the specified syscall is made.
Action Action
// Args are filters that can be specified on the arguments to the syscall.
Args Args
}
func (s *Syscall) scmpAction() uint32 {
switch s.Action {
case Allow:
return retAllow
case Trap:
return retTrap
case Kill:
return retKill
}
return actionErrno(uint32(s.Action))
}
// Arg represents an argument to the syscall with the argument's index,
// the operator to apply when matching, and the argument's value at that time.
type Arg struct {
Index uint32 // index of args which start from zero
Op Operator // operation, such as EQ/NE/GE/LE
Value uint // the value of arg
}
type Args [][]Arg
var (
ErrUnresolvedLabel = errors.New("seccomp: unresolved label")
ErrDuplicateLabel = errors.New("seccomp: duplicate label use")
ErrUnsupportedOperation = errors.New("seccomp: unsupported operation for argument")
)
// Error returns an Action that will be used to send the calling
// process the specified errno when the syscall is made.
func Error(code syscall.Errno) Action {
return Action(code)
}
// New returns a new syscall context for use.
func New() *Context {
return &Context{
syscalls: make(map[uint32]*Syscall),
}
}
// Context holds syscalls for the current process to limit the type of
// actions the calling process can make.
type Context struct {
syscalls map[uint32]*Syscall
}
// Add will add the specified syscall, action, and arguments to the seccomp
// Context.
func (c *Context) Add(s *Syscall) {
c.syscalls[s.Value] = s
}
// Remove removes the specified syscall configuration from the Context.
func (c *Context) Remove(call uint32) {
delete(c.syscalls, call)
}
// Load will apply the Context to the calling process makeing any secccomp process changes
// apply after the context is loaded.
func (c *Context) Load() error {
filter, err := c.newFilter()
if err != nil {
return err
}
if err := prctl(prSetNoNewPrivileges, 1, 0, 0, 0); err != nil {
return err
}
prog := newSockFprog(filter)
return prog.set()
}
func (c *Context) newFilter() ([]sockFilter, error) {
var (
labels bpfLabels
f = newFilter()
)
for _, s := range c.syscalls {
f.addSyscall(s, &labels)
}
f.allow()
// process args for the syscalls
for _, s := range c.syscalls {
if err := f.addArguments(s, &labels); err != nil {
return nil, err
}
}
// apply labels for arguments
idx := int32(len(*f) - 1)
for ; idx >= 0; idx-- {
lf := &(*f)[idx]
if lf.code != (syscall.BPF_JMP + syscall.BPF_JA) {
continue
}
rel := int32(lf.jt)<<8 | int32(lf.jf)
if ((jumpJT << 8) | jumpJF) == rel {
if labels[lf.k].location == 0xffffffff {
return nil, ErrUnresolvedLabel
}
lf.k = labels[lf.k].location - uint32(idx+1)
lf.jt = 0
lf.jf = 0
} else if ((labelJT << 8) | labelJF) == rel {
if labels[lf.k].location != 0xffffffff {
return nil, ErrDuplicateLabel
}
labels[lf.k].location = uint32(idx)
lf.k = 0
lf.jt = 0
lf.jf = 0
}
}
return *f, nil
}

View File

@@ -0,0 +1,116 @@
package seccomp
import (
"fmt"
"syscall"
"unsafe"
)
type sockFilter struct {
code uint16
jt uint8
jf uint8
k uint32
}
func newFilter() *filter {
var f filter
f = append(f, sockFilter{
pfLD + syscall.BPF_W + syscall.BPF_ABS,
0,
0,
uint32(unsafe.Offsetof(secData.nr)),
})
return &f
}
type filter []sockFilter
func (f *filter) addSyscall(s *Syscall, labels *bpfLabels) {
if len(s.Args) == 0 {
f.call(s.Value, scmpBpfStmt(syscall.BPF_RET+syscall.BPF_K, s.scmpAction()))
} else {
if len(s.Args[0]) > 0 {
lb := fmt.Sprintf(labelTemplate, s.Value, s.Args[0][0].Index)
f.call(s.Value,
scmpBpfJump(syscall.BPF_JMP+syscall.BPF_JA, labelIndex(labels, lb),
jumpJT, jumpJF))
}
}
}
func (f *filter) addArguments(s *Syscall, labels *bpfLabels) error {
for i := 0; len(s.Args) > i; i++ {
if len(s.Args[i]) > 0 {
lb := fmt.Sprintf(labelTemplate, s.Value, s.Args[i][0].Index)
f.label(labels, lb)
f.arg(s.Args[i][0].Index)
}
for j := 0; j < len(s.Args[i]); j++ {
var jf sockFilter
if len(s.Args)-1 > i && len(s.Args[i+1]) > 0 {
lbj := fmt.Sprintf(labelTemplate, s.Value, s.Args[i+1][0].Index)
jf = scmpBpfJump(syscall.BPF_JMP+syscall.BPF_JA,
labelIndex(labels, lbj), jumpJT, jumpJF)
} else {
jf = scmpBpfStmt(syscall.BPF_RET+syscall.BPF_K, s.scmpAction())
}
if err := f.op(s.Args[i][j].Op, s.Args[i][j].Value, jf); err != nil {
return err
}
}
f.allow()
}
return nil
}
func (f *filter) label(labels *bpfLabels, lb string) {
*f = append(*f, scmpBpfJump(syscall.BPF_JMP+syscall.BPF_JA, labelIndex(labels, lb), labelJT, labelJF))
}
func (f *filter) call(nr uint32, jt sockFilter) {
*f = append(*f, scmpBpfJump(syscall.BPF_JMP+syscall.BPF_JEQ+syscall.BPF_K, nr, 0, 1))
*f = append(*f, jt)
}
func (f *filter) allow() {
*f = append(*f, scmpBpfStmt(syscall.BPF_RET+syscall.BPF_K, retAllow))
}
func (f *filter) deny() {
*f = append(*f, scmpBpfStmt(syscall.BPF_RET+syscall.BPF_K, retTrap))
}
func (f *filter) arg(index uint32) {
arg(f, index)
}
func (f *filter) op(operation Operator, v uint, jf sockFilter) error {
switch operation {
case EqualTo:
jumpEqualTo(f, v, jf)
case NotEqualTo:
jumpNotEqualTo(f, v, jf)
case GreatherThan:
jumpGreaterThan(f, v, jf)
case LessThan:
jumpLessThan(f, v, jf)
case MaskEqualTo:
jumpMaskEqualTo(f, v, jf)
default:
return ErrUnsupportedOperation
}
return nil
}
func arg(f *filter, idx uint32) {
*f = append(*f, scmpBpfStmt(syscall.BPF_LD+syscall.BPF_W+syscall.BPF_ABS, endian.low(idx)))
*f = append(*f, scmpBpfStmt(syscall.BPF_ST, 0))
*f = append(*f, scmpBpfStmt(syscall.BPF_LD+syscall.BPF_W+syscall.BPF_ABS, endian.hi(idx)))
*f = append(*f, scmpBpfStmt(syscall.BPF_ST, 1))
}
func jump(f *filter, labels *bpfLabels, lb string) {
*f = append(*f, scmpBpfJump(syscall.BPF_JMP+syscall.BPF_JA, labelIndex(labels, lb),
jumpJT, jumpJF))
}

View File

@@ -0,0 +1,68 @@
// +build linux,amd64
package seccomp
// Using BPF filters
//
// ref: http://www.gsp.com/cgi-bin/man.cgi?topic=bpf
import "syscall"
func jumpGreaterThan(f *filter, v uint, jt sockFilter) {
lo := uint32(uint64(v) % 0x100000000)
hi := uint32(uint64(v) / 0x100000000)
*f = append(*f, scmpBpfJump(syscall.BPF_JMP+syscall.BPF_JGT+syscall.BPF_K, (hi), 4, 0))
*f = append(*f, scmpBpfJump(syscall.BPF_JMP+syscall.BPF_JEQ+syscall.BPF_K, (hi), 0, 5))
*f = append(*f, scmpBpfStmt(syscall.BPF_LD+syscall.BPF_MEM, 0))
*f = append(*f, scmpBpfJump(syscall.BPF_JMP+syscall.BPF_JGE+syscall.BPF_K, (lo), 0, 2))
*f = append(*f, scmpBpfStmt(syscall.BPF_LD+syscall.BPF_MEM, 1))
*f = append(*f, jt)
*f = append(*f, scmpBpfStmt(syscall.BPF_LD+syscall.BPF_MEM, 1))
}
func jumpEqualTo(f *filter, v uint, jt sockFilter) {
lo := uint32(uint64(v) % 0x100000000)
hi := uint32(uint64(v) / 0x100000000)
*f = append(*f, scmpBpfJump(syscall.BPF_JMP+syscall.BPF_JEQ+syscall.BPF_K, (hi), 0, 5))
*f = append(*f, scmpBpfStmt(syscall.BPF_LD+syscall.BPF_MEM, 0))
*f = append(*f, scmpBpfJump(syscall.BPF_JMP+syscall.BPF_JEQ+syscall.BPF_K, (lo), 0, 2))
*f = append(*f, scmpBpfStmt(syscall.BPF_LD+syscall.BPF_MEM, 1))
*f = append(*f, jt)
*f = append(*f, scmpBpfStmt(syscall.BPF_LD+syscall.BPF_MEM, 1))
}
func jumpLessThan(f *filter, v uint, jt sockFilter) {
lo := uint32(uint64(v) % 0x100000000)
hi := uint32(uint64(v) / 0x100000000)
*f = append(*f, scmpBpfJump(syscall.BPF_JMP+syscall.BPF_JGT+syscall.BPF_K, (hi), 6, 0))
*f = append(*f, scmpBpfJump(syscall.BPF_JMP+syscall.BPF_JEQ+syscall.BPF_K, (hi), 0, 3))
*f = append(*f, scmpBpfStmt(syscall.BPF_LD+syscall.BPF_MEM, 0))
*f = append(*f, scmpBpfJump(syscall.BPF_JMP+syscall.BPF_JGT+syscall.BPF_K, (lo), 2, 0))
*f = append(*f, scmpBpfStmt(syscall.BPF_LD+syscall.BPF_MEM, 1))
*f = append(*f, jt)
*f = append(*f, scmpBpfStmt(syscall.BPF_LD+syscall.BPF_MEM, 1))
}
func jumpNotEqualTo(f *filter, v uint, jt sockFilter) {
lo := uint32(uint64(v) % 0x100000000)
hi := uint32(uint64(v) / 0x100000000)
*f = append(*f, scmpBpfJump(syscall.BPF_JMP+syscall.BPF_JEQ+syscall.BPF_K, hi, 5, 0))
*f = append(*f, scmpBpfStmt(syscall.BPF_LD+syscall.BPF_MEM, 0))
*f = append(*f, scmpBpfJump(syscall.BPF_JMP+syscall.BPF_JEQ+syscall.BPF_K, lo, 2, 0))
*f = append(*f, scmpBpfStmt(syscall.BPF_LD+syscall.BPF_MEM, 1))
*f = append(*f, jt)
*f = append(*f, scmpBpfStmt(syscall.BPF_LD+syscall.BPF_MEM, 1))
}
// this checks for a value inside a mask. The evalusation is equal to doing
// CLONE_NEWUSER & syscallMask == CLONE_NEWUSER
func jumpMaskEqualTo(f *filter, v uint, jt sockFilter) {
lo := uint32(uint64(v) % 0x100000000)
hi := uint32(uint64(v) / 0x100000000)
*f = append(*f, scmpBpfJump(syscall.BPF_JMP+syscall.BPF_JEQ+syscall.BPF_K, hi, 0, 6))
*f = append(*f, scmpBpfStmt(syscall.BPF_LD+syscall.BPF_MEM, 0))
*f = append(*f, scmpBpfStmt(syscall.BPF_ALU+syscall.BPF_AND, uint32(v)))
*f = append(*f, scmpBpfJump(syscall.BPF_JMP+syscall.BPF_JEQ+syscall.BPF_K, lo, 0, 2))
*f = append(*f, scmpBpfStmt(syscall.BPF_LD+syscall.BPF_MEM, 1))
*f = append(*f, jt)
*f = append(*f, scmpBpfStmt(syscall.BPF_LD+syscall.BPF_MEM, 1))
}

View File

@@ -0,0 +1,122 @@
// Package seccomp provides native seccomp ( https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt ) support for go.
package seccomp
import (
"syscall"
"unsafe"
)
// Operator that is used for argument comparison.
type Operator int
const (
EqualTo Operator = iota
NotEqualTo
GreatherThan
LessThan
MaskEqualTo
)
const (
jumpJT = 0xff
jumpJF = 0xff
labelJT = 0xfe
labelJF = 0xfe
)
const (
pfLD = 0x0
retKill = 0x00000000
retTrap = 0x00030000
retAllow = 0x7fff0000
modeFilter = 0x2
prSetNoNewPrivileges = 0x26
)
func actionErrno(errno uint32) uint32 {
return 0x00050000 | (errno & 0x0000ffff)
}
var (
secData = struct {
nr int32
arch uint32
insPointer uint64
args [6]uint64
}{0, 0, 0, [6]uint64{0, 0, 0, 0, 0, 0}}
)
var isLittle = func() bool {
var (
x = 0x1234
p = unsafe.Pointer(&x)
p2 = (*[unsafe.Sizeof(0)]byte)(p)
)
if p2[0] == 0 {
return false
}
return true
}()
var endian endianSupport
type endianSupport struct {
}
func (e endianSupport) hi(i uint32) uint32 {
if isLittle {
return e.little(i)
}
return e.big(i)
}
func (e endianSupport) low(i uint32) uint32 {
if isLittle {
return e.big(i)
}
return e.little(i)
}
func (endianSupport) big(idx uint32) uint32 {
if idx >= 6 {
return 0
}
return uint32(unsafe.Offsetof(secData.args)) + 8*idx
}
func (endianSupport) little(idx uint32) uint32 {
if idx < 0 || idx >= 6 {
return 0
}
return uint32(unsafe.Offsetof(secData.args)) +
uint32(unsafe.Alignof(secData.args[0]))*idx + uint32(unsafe.Sizeof(secData.arch))
}
func prctl(option int, arg2, arg3, arg4, arg5 uintptr) error {
_, _, err := syscall.Syscall6(syscall.SYS_PRCTL, uintptr(option), arg2, arg3, arg4, arg5, 0)
if err != 0 {
return err
}
return nil
}
func newSockFprog(filter []sockFilter) *sockFprog {
return &sockFprog{
len: uint16(len(filter)),
filt: filter,
}
}
type sockFprog struct {
len uint16
filt []sockFilter
}
func (s *sockFprog) set() error {
_, _, err := syscall.Syscall(syscall.SYS_PRCTL, uintptr(syscall.PR_SET_SECCOMP),
uintptr(modeFilter), uintptr(unsafe.Pointer(s)))
if err != 0 {
return err
}
return nil
}

View File

@@ -9,6 +9,9 @@ import (
// NewFrame returns a new stack frame for the provided information
func NewFrame(pc uintptr, file string, line int) Frame {
fn := runtime.FuncForPC(pc)
if fn == nil {
return Frame{}
}
pack, name := parseFunctionName(fn.Name())
return Frame{
Line: line,

View File

@@ -99,5 +99,8 @@ func (l *linuxStandardInit) Init() error {
if syscall.Getppid() != l.parentPid {
return syscall.Kill(syscall.Getpid(), syscall.SIGKILL)
}
if err := finalizeSeccomp(l.config); err != nil {
return err
}
return system.Execv(l.config.Args[0], l.config.Args[0:], os.Environ())
}

View File

@@ -1,12 +1,5 @@
package libcontainer
import "github.com/docker/libcontainer/cgroups"
type Stats struct {
Interfaces []*NetworkInterface
CgroupStats *cgroups.Stats
}
type NetworkInterface struct {
// Name is the name of the network interface.
Name string

View File

@@ -0,0 +1,5 @@
package libcontainer
type Stats struct {
Interfaces []*NetworkInterface
}

View File

@@ -0,0 +1,8 @@
package libcontainer
import "github.com/docker/libcontainer/cgroups"
type Stats struct {
Interfaces []*NetworkInterface
CgroupStats *cgroups.Stats
}

View File

@@ -0,0 +1,5 @@
package libcontainer
type Stats struct {
Interfaces []*NetworkInterface
}

View File

@@ -21,16 +21,20 @@ var setNsMap = map[string]uintptr{
"linux/s390x": 339,
}
var sysSetns = setNsMap[fmt.Sprintf("%s/%s", runtime.GOOS, runtime.GOARCH)]
func SysSetns() uint32 {
return uint32(sysSetns)
}
func Setns(fd uintptr, flags uintptr) error {
ns, exists := setNsMap[fmt.Sprintf("%s/%s", runtime.GOOS, runtime.GOARCH)]
if !exists {
return fmt.Errorf("unsupported platform %s/%s", runtime.GOOS, runtime.GOARCH)
}
_, _, err := syscall.RawSyscall(ns, fd, flags, 0)
if err != 0 {
return err
}
return nil
}

View File

@@ -1,4 +1,4 @@
// +build cgo
// +build cgo,linux cgo,freebsd
package system

View File

@@ -1,8 +1,15 @@
// +build !cgo
// +build !cgo windows
package system
func GetClockTicks() int {
// TODO figure out a better alternative for platforms where we're missing cgo
//
// TODO Windows. This could be implemented using Win32 QueryPerformanceFrequency().
// https://msdn.microsoft.com/en-us/library/windows/desktop/ms644905(v=vs.85).aspx
//
// An example of its usage can be found here.
// https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408(v=vs.85).aspx
return 100
}

View File

@@ -45,5 +45,6 @@ clone git github.com/coreos/go-systemd v2
clone git github.com/godbus/dbus v2
clone git github.com/Sirupsen/logrus v0.7.3
clone git github.com/syndtr/gocapability 8e4cdcb
clone git github.com/golang/protobuf 655cdfa588ea
# intentionally not vendoring Docker itself... that'd be a circle :)

View File

@@ -348,3 +348,60 @@ func GetExecUser(userSpec string, defaults *ExecUser, passwd, group io.Reader) (
return user, nil
}
// GetAdditionalGroupsPath looks up a list of groups by name or group id
// against the group file. If a group name cannot be found, an error will be
// returned. If a group id cannot be found, it will be returned as-is.
func GetAdditionalGroupsPath(additionalGroups []string, groupPath string) ([]int, error) {
groupReader, err := os.Open(groupPath)
if err != nil {
return nil, fmt.Errorf("Failed to open group file: %v", err)
}
defer groupReader.Close()
groups, err := ParseGroupFilter(groupReader, func(g Group) bool {
for _, ag := range additionalGroups {
if g.Name == ag || strconv.Itoa(g.Gid) == ag {
return true
}
}
return false
})
if err != nil {
return nil, fmt.Errorf("Unable to find additional groups %v: %v", additionalGroups, err)
}
gidMap := make(map[int]struct{})
for _, ag := range additionalGroups {
var found bool
for _, g := range groups {
// if we found a matched group either by name or gid, take the
// first matched as correct
if g.Name == ag || strconv.Itoa(g.Gid) == ag {
if _, ok := gidMap[g.Gid]; !ok {
gidMap[g.Gid] = struct{}{}
found = true
break
}
}
}
// we asked for a group but didn't find it. let's check to see
// if we wanted a numeric group
if !found {
gid, err := strconv.Atoi(ag)
if err != nil {
return nil, fmt.Errorf("Unable to find group %s", ag)
}
// Ensure gid is inside gid range.
if gid < minId || gid > maxId {
return nil, ErrRange
}
gidMap[gid] = struct{}{}
}
}
gids := []int{}
for gid := range gidMap {
gids = append(gids, gid)
}
return gids, nil
}

View File

@@ -21,6 +21,9 @@ func GenerateRandomName(prefix string, size int) (string, error) {
if _, err := io.ReadFull(rand.Reader, id); err != nil {
return "", err
}
if size > 64 {
size = 64
}
return prefix + hex.EncodeToString(id)[:size], nil
}

View File

@@ -0,0 +1,17 @@
.DS_Store
*.[568ao]
*.pb.go
*.ao
*.so
*.pyc
._*
.nfs.*
[568a].out
*~
*.orig
core
_obj
_test
_testmain.go
compiler/protoc-gen-go
compiler/testdata/extension_test

View File

@@ -0,0 +1,3 @@
# This source code refers to The Go Authors for copyright purposes.
# The master list of authors is in the main Go distribution,
# visible at http://tip.golang.org/AUTHORS.

View File

@@ -0,0 +1,3 @@
# This source code was written by the Go contributors.
# The master list of contributors is in the main Go distribution,
# visible at http://tip.golang.org/CONTRIBUTORS.

View File

@@ -0,0 +1,31 @@
Go support for Protocol Buffers - Google's data interchange format
Copyright 2010 The Go Authors. All rights reserved.
https://github.com/golang/protobuf
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
* Neither the name of Google Inc. nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

View File

@@ -0,0 +1,40 @@
# Go support for Protocol Buffers - Google's data interchange format
#
# Copyright 2010 The Go Authors. All rights reserved.
# https://github.com/golang/protobuf
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above
# copyright notice, this list of conditions and the following disclaimer
# in the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Google Inc. nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Includable Makefile to add a rule for generating .pb.go files from .proto files
# (Google protocol buffer descriptions).
# Typical use if myproto.proto is a file in package mypackage in this directory:
#
# include $(GOROOT)/src/pkg/github.com/golang/protobuf/Make.protobuf
%.pb.go: %.proto
protoc --go_out=. $<

Some files were not shown because too many files have changed in this diff Show More