kubernetes/vendor/github.com/opencontainers/runc/libcontainer
Filipe Brandenburger 54d93820ee Update libcontainer to include PRs with fixes to systemd cgroup driver
PR opencontainers/runc#1754 works around an issue in manager.Apply(-1) that
makes Kubelet startup hang when using systemd cgroup driver (by adding a
timeout) and further PR opencontainers/runc#1772 fixes that bug by
checking the proper error status before waiting on the channel.

PR opencontainers/runc#1776 checks whether Delegate works in slices,
which keeps libcontainer systemd cgroup driver working on systemd v237+.

PR opencontainers/runc#1781 makes the channel buffered, so if we time
out waiting on the channel, the updater will not block trying to it
since there are no longer any consumers.
2018-04-24 13:29:13 -07:00
..
apparmor Bump runc to d5b4a3e 2018-01-12 14:22:50 -05:00
cgroups Update libcontainer to include PRs with fixes to systemd cgroup driver 2018-04-24 13:29:13 -07:00
configs Update libcontainer to include PRs with fixes to systemd cgroup driver 2018-04-24 13:29:13 -07:00
criurpc Update openapi to use kube-openapi code 2017-10-20 09:21:06 -07:00
intelrdt Bump runc to d5b4a3e 2018-01-12 14:22:50 -05:00
keys Bump runc to d5b4a3e 2018-01-12 14:22:50 -05:00
mount Bump runc to d5b4a3e 2018-01-12 14:22:50 -05:00
seccomp Bump runc to d5b4a3e 2018-01-12 14:22:50 -05:00
stacktrace update BUILD files 2017-10-15 18:18:13 -07:00
system Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
user Update libcontainer to include PRs with fixes to systemd cgroup driver 2018-04-24 13:29:13 -07:00
utils Bump runc to d5b4a3e 2018-01-12 14:22:50 -05:00
BUILD Bump runc to d5b4a3e 2018-01-12 14:22:50 -05:00
capabilities_linux.go bump(github.com/opencontainers/runc): 595bea022f077a9e17d7473b34fbaf1adaed9e43 2018-02-20 14:07:00 -05:00
console_linux.go Bump runc to d5b4a3e 2018-01-12 14:22:50 -05:00
container_linux.go Update libcontainer to include PRs with fixes to systemd cgroup driver 2018-04-24 13:29:13 -07:00
container.go update cadvisor, docker, and runc godeps 2017-09-05 12:38:57 -07:00
criu_opts_linux.go Bump runc to d5b4a3e 2018-01-12 14:22:50 -05:00
error.go update cadvisor, docker, and runc godeps 2017-09-05 12:38:57 -07:00
factory_linux.go Bump runc to d5b4a3e 2018-01-12 14:22:50 -05:00
factory.go update cadvisor, docker, and runc godeps 2017-09-05 12:38:57 -07:00
generic_error.go update cadvisor, docker, and runc godeps 2017-09-05 12:38:57 -07:00
init_linux.go Bump runc to d5b4a3e 2018-01-12 14:22:50 -05:00
message_linux.go Update libcontainer to include PRs with fixes to systemd cgroup driver 2018-04-24 13:29:13 -07:00
network_linux.go
notify_linux.go update cadvisor, docker, and runc godeps 2017-09-05 12:38:57 -07:00
process_linux.go bump(github.com/opencontainers/runc): 595bea022f077a9e17d7473b34fbaf1adaed9e43 2018-02-20 14:07:00 -05:00
process.go Bump runc to d5b4a3e 2018-01-12 14:22:50 -05:00
README.md update cadvisor, docker, and runc godeps 2017-09-05 12:38:57 -07:00
restored_process.go update cadvisor, docker, and runc godeps 2017-09-05 12:38:57 -07:00
rootfs_linux.go Update libcontainer to include PRs with fixes to systemd cgroup driver 2018-04-24 13:29:13 -07:00
setns_init_linux.go Bump runc to d5b4a3e 2018-01-12 14:22:50 -05:00
SPEC.md Bump runc to d5b4a3e 2018-01-12 14:22:50 -05:00
standard_init_linux.go Update libcontainer to include PRs with fixes to systemd cgroup driver 2018-04-24 13:29:13 -07:00
state_linux.go bump(github.com/opencontainers/runc): 595bea022f077a9e17d7473b34fbaf1adaed9e43 2018-02-20 14:07:00 -05:00
stats_linux.go Bump runc to d5b4a3e 2018-01-12 14:22:50 -05:00
stats.go
sync.go update cadvisor, docker, and runc godeps 2017-09-05 12:38:57 -07:00

libcontainer

GoDoc

Libcontainer provides a native Go implementation for creating containers with namespaces, cgroups, capabilities, and filesystem access controls. It allows you to manage the lifecycle of the container performing additional operations after the container is created.

Container

A container is a self contained execution environment that shares the kernel of the host system and which is (optionally) isolated from other containers in the system.

Using libcontainer

Because containers are spawned in a two step process you will need a binary that will be executed as the init process for the container. In libcontainer, we use the current binary (/proc/self/exe) to be executed as the init process, and use arg "init", we call the first step process "bootstrap", so you always need a "init" function as the entry of "bootstrap".

In addition to the go init function the early stage bootstrap is handled by importing nsenter.

import (
	_ "github.com/opencontainers/runc/libcontainer/nsenter"
)

func init() {
	if len(os.Args) > 1 && os.Args[1] == "init" {
		runtime.GOMAXPROCS(1)
		runtime.LockOSThread()
		factory, _ := libcontainer.New("")
		if err := factory.StartInitialization(); err != nil {
			logrus.Fatal(err)
		}
		panic("--this line should have never been executed, congratulations--")
	}
}

Then to create a container you first have to initialize an instance of a factory that will handle the creation and initialization for a container.

factory, err := libcontainer.New("/var/lib/container", libcontainer.Cgroupfs, libcontainer.InitArgs(os.Args[0], "init"))
if err != nil {
	logrus.Fatal(err)
	return
}

Once you have an instance of the factory created we can create a configuration struct describing how the container is to be created. A sample would look similar to this:

defaultMountFlags := unix.MS_NOEXEC | unix.MS_NOSUID | unix.MS_NODEV
config := &configs.Config{
	Rootfs: "/your/path/to/rootfs",
	Capabilities: &configs.Capabilities{
                Bounding: []string{
                        "CAP_CHOWN",
                        "CAP_DAC_OVERRIDE",
                        "CAP_FSETID",
                        "CAP_FOWNER",
                        "CAP_MKNOD",
                        "CAP_NET_RAW",
                        "CAP_SETGID",
                        "CAP_SETUID",
                        "CAP_SETFCAP",
                        "CAP_SETPCAP",
                        "CAP_NET_BIND_SERVICE",
                        "CAP_SYS_CHROOT",
                        "CAP_KILL",
                        "CAP_AUDIT_WRITE",
                },
                Effective: []string{
                        "CAP_CHOWN",
                        "CAP_DAC_OVERRIDE",
                        "CAP_FSETID",
                        "CAP_FOWNER",
                        "CAP_MKNOD",
                        "CAP_NET_RAW",
                        "CAP_SETGID",
                        "CAP_SETUID",
                        "CAP_SETFCAP",
                        "CAP_SETPCAP",
                        "CAP_NET_BIND_SERVICE",
                        "CAP_SYS_CHROOT",
                        "CAP_KILL",
                        "CAP_AUDIT_WRITE",
                },
                Inheritable: []string{
                        "CAP_CHOWN",
                        "CAP_DAC_OVERRIDE",
                        "CAP_FSETID",
                        "CAP_FOWNER",
                        "CAP_MKNOD",
                        "CAP_NET_RAW",
                        "CAP_SETGID",
                        "CAP_SETUID",
                        "CAP_SETFCAP",
                        "CAP_SETPCAP",
                        "CAP_NET_BIND_SERVICE",
                        "CAP_SYS_CHROOT",
                        "CAP_KILL",
                        "CAP_AUDIT_WRITE",
                },
                Permitted: []string{
                        "CAP_CHOWN",
                        "CAP_DAC_OVERRIDE",
                        "CAP_FSETID",
                        "CAP_FOWNER",
                        "CAP_MKNOD",
                        "CAP_NET_RAW",
                        "CAP_SETGID",
                        "CAP_SETUID",
                        "CAP_SETFCAP",
                        "CAP_SETPCAP",
                        "CAP_NET_BIND_SERVICE",
                        "CAP_SYS_CHROOT",
                        "CAP_KILL",
                        "CAP_AUDIT_WRITE",
                },
                Ambient: []string{
                        "CAP_CHOWN",
                        "CAP_DAC_OVERRIDE",
                        "CAP_FSETID",
                        "CAP_FOWNER",
                        "CAP_MKNOD",
                        "CAP_NET_RAW",
                        "CAP_SETGID",
                        "CAP_SETUID",
                        "CAP_SETFCAP",
                        "CAP_SETPCAP",
                        "CAP_NET_BIND_SERVICE",
                        "CAP_SYS_CHROOT",
                        "CAP_KILL",
                        "CAP_AUDIT_WRITE",
                },
        },
	Namespaces: configs.Namespaces([]configs.Namespace{
		{Type: configs.NEWNS},
		{Type: configs.NEWUTS},
		{Type: configs.NEWIPC},
		{Type: configs.NEWPID},
		{Type: configs.NEWUSER},
		{Type: configs.NEWNET},
	}),
	Cgroups: &configs.Cgroup{
		Name:   "test-container",
		Parent: "system",
		Resources: &configs.Resources{
			MemorySwappiness: nil,
			AllowAllDevices:  nil,
			AllowedDevices:   configs.DefaultAllowedDevices,
		},
	},
	MaskPaths: []string{
		"/proc/kcore",
		"/sys/firmware",
	},
	ReadonlyPaths: []string{
		"/proc/sys", "/proc/sysrq-trigger", "/proc/irq", "/proc/bus",
	},
	Devices:  configs.DefaultAutoCreatedDevices,
	Hostname: "testing",
	Mounts: []*configs.Mount{
		{
			Source:      "proc",
			Destination: "/proc",
			Device:      "proc",
			Flags:       defaultMountFlags,
		},
		{
			Source:      "tmpfs",
			Destination: "/dev",
			Device:      "tmpfs",
			Flags:       unix.MS_NOSUID | unix.MS_STRICTATIME,
			Data:        "mode=755",
		},
		{
			Source:      "devpts",
			Destination: "/dev/pts",
			Device:      "devpts",
			Flags:       unix.MS_NOSUID | unix.MS_NOEXEC,
			Data:        "newinstance,ptmxmode=0666,mode=0620,gid=5",
		},
		{
			Device:      "tmpfs",
			Source:      "shm",
			Destination: "/dev/shm",
			Data:        "mode=1777,size=65536k",
			Flags:       defaultMountFlags,
		},
		{
			Source:      "mqueue",
			Destination: "/dev/mqueue",
			Device:      "mqueue",
			Flags:       defaultMountFlags,
		},
		{
			Source:      "sysfs",
			Destination: "/sys",
			Device:      "sysfs",
			Flags:       defaultMountFlags | unix.MS_RDONLY,
		},
	},
	UidMappings: []configs.IDMap{
		{
			ContainerID: 0,
			HostID: 1000,
			Size: 65536,
		},
	},
	GidMappings: []configs.IDMap{
		{
			ContainerID: 0,
			HostID: 1000,
			Size: 65536,
		},
	},
	Networks: []*configs.Network{
		{
			Type:    "loopback",
			Address: "127.0.0.1/0",
			Gateway: "localhost",
		},
	},
	Rlimits: []configs.Rlimit{
		{
			Type: unix.RLIMIT_NOFILE,
			Hard: uint64(1025),
			Soft: uint64(1025),
		},
	},
}

Once you have the configuration populated you can create a container:

container, err := factory.Create("container-id", config)
if err != nil {
	logrus.Fatal(err)
	return
}

To spawn bash as the initial process inside the container and have the processes pid returned in order to wait, signal, or kill the process:

process := &libcontainer.Process{
	Args:   []string{"/bin/bash"},
	Env:    []string{"PATH=/bin"},
	User:   "daemon",
	Stdin:  os.Stdin,
	Stdout: os.Stdout,
	Stderr: os.Stderr,
}

err := container.Run(process)
if err != nil {
	container.Destroy()
	logrus.Fatal(err)
	return
}

// wait for the process to finish.
_, err := process.Wait()
if err != nil {
	logrus.Fatal(err)
}

// destroy the container.
container.Destroy()

Additional ways to interact with a running container are:

// return all the pids for all processes running inside the container.
processes, err := container.Processes()

// get detailed cpu, memory, io, and network statistics for the container and
// it's processes.
stats, err := container.Stats()

// pause all processes inside the container.
container.Pause()

// resume all paused processes.
container.Resume()

// send signal to container's init process.
container.Signal(signal)

// update container resource constraints.
container.Set(config)

// get current status of the container.
status, err := container.Status()

// get current container's state information.
state, err := container.State()

Checkpoint & Restore

libcontainer now integrates CRIU for checkpointing and restoring containers. This let's you save the state of a process running inside a container to disk, and then restore that state into a new process, on the same machine or on another machine.

criu version 1.5.2 or higher is required to use checkpoint and restore. If you don't already have criu installed, you can build it from source, following the online instructions. criu is also installed in the docker image generated when building libcontainer with docker.

Code and documentation copyright 2014 Docker, inc. Code released under the Apache 2.0 license. Docs released under Creative commons.