138 Commits

Author SHA1 Message Date
Michael Crosby
f9d231f660 Merge pull request #4493 from thaJeztah/seccomp_uring
seccomp: allow io-uring related system calls
2020-08-25 11:39:45 -04:00
Michael Crosby
396b863138 Merge pull request #4491 from thaJeztah/seccomp_syslog
seccomp: move the syslog syscall to be gated by CAP_SYS_ADMIN or CAP_SYSLOG
2020-08-25 11:35:28 -04:00
Sebastiaan van Stijn
325bac7c71 seccomp: allow io-uring related system calls
Adds the io-uring related system calls introduced in kernel 5.1 to the
seccomp whitelist. With older kernels or older versions of libseccomp,
this configuration will be omitted.

Note that io_uring will grow support for more syscalls in the future,
so we should keep an eye on this.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-08-24 12:59:53 +02:00
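A minimal sketch of the "omitted with older libseccomp" behaviour described above. It assumes a hypothetical model where the profile is a plain list of syscall names and the installed libseccomp is represented by a set of names it can resolve; names it does not know are skipped instead of failing the whole profile. The real profile is JSON and name resolution happens inside libseccomp.

```python
# Hypothetical model: real name resolution happens inside libseccomp.
IO_URING_SYSCALLS = ["io_uring_setup", "io_uring_enter", "io_uring_register"]


def resolvable_syscalls(profile_names, known_names):
    """Return the profile's syscall names that the (hypothetical) libseccomp
    build can resolve, preserving order; unknown names are skipped."""
    known = set(known_names)
    return [name for name in profile_names if name in known]


profile = ["read", "write"] + IO_URING_SYSCALLS

# An older libseccomp that predates io_uring only knows the classic calls.
old_lib = {"read", "write"}
new_lib = old_lib | set(IO_URING_SYSCALLS)

print(resolvable_syscalls(profile, old_lib))  # ['read', 'write']
print(resolvable_syscalls(profile, new_lib))
```

Skipping unknown names keeps one profile usable across kernel and libseccomp versions, at the cost of silently not allowing calls the host cannot name.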
Sebastiaan van Stijn
0a5ee7e6f3 seccomp: allow clock_settime when CAP_SYS_TIME is added
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-08-24 12:43:21 +02:00
Sebastiaan van Stijn
5cdb6e81d2 seccomp: allow quotactl with CAP_SYS_ADMIN
This allows the quotactl syscall in the default seccomp profile, gated by
CAP_SYS_ADMIN.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-08-24 12:40:43 +02:00
Sebastiaan van Stijn
5862285fac seccomp: allow sync_file_range2 on supported architectures.
On a ppc64le host, running postgres (tried with 9.4 to 9.6) gives the following
warning when trying to flush data to disks (which happens very frequently):

     WARNING: could not flush dirty data: Operation not permitted.

A quick dig into the postgres source code indicates that it uses sync_file_range(2)
to flush data, which on ppc64le and arm64 is translated to sync_file_range2(2)
for alignment reasons.

The profile did not allow sync_file_range2(2), making postgres sad because
it cannot flush its buffers. arm_sync_file_range(2) is an ancient alias of
sync_file_range2(2); the syscall was renamed in Linux 2.6.22 when the same
syscall was added for PowerPC.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-08-24 12:36:53 +02:00
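A minimal sketch of an architecture-gated allow rule, under stated assumptions: the `Rule` type and its `arches` field are hypothetical stand-ins for however the profile expresses per-architecture entries, and the architecture list is taken from the commit message above. It shows sync_file_range2 being allowed only on the listed architectures.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical allow rule; an empty `arches` list means all architectures."""
    names: list
    arches: list = field(default_factory=list)


RULES = [
    Rule(names=["sync_file_range"]),
    # Only emitted on the architectures named in the commit message.
    Rule(names=["sync_file_range2"], arches=["ppc64le", "arm64"]),
]


def allowed_syscalls(rules, arch):
    """Collect syscall names from every rule that applies to `arch`."""
    allowed = set()
    for rule in rules:
        if not rule.arches or arch in rule.arches:
            allowed.update(rule.names)
    return allowed


print(sorted(allowed_syscalls(RULES, "ppc64le")))  # ['sync_file_range', 'sync_file_range2']
print(sorted(allowed_syscalls(RULES, "amd64")))    # ['sync_file_range']
```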
Sebastiaan van Stijn
117d678749 seccomp: allow personality with UNAME26 bit set
From personality(2):

    Have uname(2) report a 2.6.40+ version number rather than a 3.x version
    number. Added as a stopgap measure to support broken applications that
    could not handle the kernel version-numbering switch from 2.6.x to 3.x.

This allows both "UNAME26|PER_LINUX" and "UNAME26|PER_LINUX32".

Fixes: "setarch broken in docker packages from Debian stretch"

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-08-24 12:27:14 +02:00
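The two persona values allowed above are plain bitwise ORs of constants from `<linux/personality.h>`. A minimal sketch computing them, assuming (hypothetically) that the rule matches the personality(2) argument exactly against a set of allowed values:

```python
# Values from <linux/personality.h>.
PER_LINUX = 0x0000
PER_LINUX32 = 0x0008
UNAME26 = 0x0020000

# The two persona arguments named in this commit.
ALLOWED_UNAME26_PERSONAS = {UNAME26 | PER_LINUX, UNAME26 | PER_LINUX32}


def persona_allowed(persona):
    """Hypothetical exact-match check against the UNAME26 variants only."""
    return persona in ALLOWED_UNAME26_PERSONAS


print(hex(UNAME26 | PER_LINUX))    # 0x20000
print(hex(UNAME26 | PER_LINUX32))  # 0x20008
```

Matching exact values rather than "any value with UNAME26 set" keeps other persona flags blocked.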
Sebastiaan van Stijn
fc9e5d161a seccomp: allow syscall membarrier
Add the membarrier syscall to the default seccomp profile.
It is used, for example, in the implementation of dlopen() in
the musl libc shipped in Alpine images.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-08-24 12:16:26 +02:00
Sebastiaan van Stijn
1746a195e9 seccomp: allow adjtimex get time operation
Enable adjtimex in the default profile without requiring the CAP_SYS_TIME privilege.
The kernel still checks CAP_SYS_TIME itself and won't allow setting the time.

Fixes: Getting the system time with ntptime returns an error in an unprivileged
container

To verify, inside a CentOS 7 container:

    yum install -y ntp
    ntptime
    # ntp_gettime() returns code 0 (OK)

    ntpdate -v time.nist.gov
    # ntpdate[84]: Can't adjust the time of day: Operation not permitted

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-08-24 12:16:23 +02:00
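A simplified sketch of why allowing adjtimex in the profile is safe: the kernel only demands CAP_SYS_TIME when the `modes` field asks to change something, so a read-only query (modes == 0) succeeds unprivileged. This model is an assumption-laden simplification: it ignores the kernel's special handling of `ADJ_ADJTIME` / `ADJ_OFFSET_READONLY`, and `adjtimex_permission` is a hypothetical helper, not a kernel function.

```python
import errno

# Mode bits from <sys/timex.h>; any non-zero modes value requests a change.
ADJ_OFFSET = 0x0001
ADJ_FREQUENCY = 0x0002


def adjtimex_permission(modes, has_cap_sys_time):
    """Simplified model: modes == 0 only reads the clock state, so no
    capability is needed; modifying state requires CAP_SYS_TIME."""
    if modes != 0 and not has_cap_sys_time:
        return -errno.EPERM
    return 0


print(adjtimex_permission(0, has_cap_sys_time=False))           # 0 (read-only query)
print(adjtimex_permission(ADJ_OFFSET, has_cap_sys_time=False))  # -1 (EPERM)
```

This mirrors the verification steps above: the read-only query succeeds, while trying to adjust the clock still fails with "Operation not permitted".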
Sebastiaan van Stijn
7e7545e556 seccomp: allow preadv2 and pwritev2 syscalls
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-08-24 12:16:21 +02:00
Sebastiaan van Stijn
267a0cf68e seccomp: move the syslog syscall to be gated by CAP_SYS_ADMIN or CAP_SYSLOG
This syscall is used to implement `dmesg`, which reads kernel messages
about the host. These messages can leak substantial information about the host.
The syscall is normally available to unprivileged users on the host unless
the sysctl `kernel.dmesg_restrict = 1` is set, and that is not set
by default on the majority of distributions. Blocking it to restrict
leaks about the host configuration seems correct.

Relates to docker/docker#37897 "docker exposes dmesg to containers by default"

See also https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-08-24 11:57:48 +02:00
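A minimal sketch of "gated by CAP_SYS_ADMIN or CAP_SYSLOG" as any-of semantics: the syscall joins the effective allow list only when the container holds at least one of the listed capabilities. `GATED_RULES`, `allow_list`, and the example capability sets are hypothetical and do not reflect the actual profile format or Docker's default capability list.

```python
# Hypothetical capability-gated rules: a syscall listed here is only added
# when the container holds at least one of the capabilities.
GATED_RULES = {
    "syslog": {"CAP_SYS_ADMIN", "CAP_SYSLOG"},
}


def allow_list(base_syscalls, granted_caps, gated_rules=GATED_RULES):
    """Return the effective allow list for a container with `granted_caps`."""
    allowed = set(base_syscalls)
    for syscall, caps in gated_rules.items():
        # Any-of semantics: a single matching capability is enough.
        if caps & set(granted_caps):
            allowed.add(syscall)
    return allowed


example_caps = {"CAP_CHOWN", "CAP_NET_BIND_SERVICE"}  # illustrative subset only
print("syslog" in allow_list({"read", "write"}, example_caps))                   # False
print("syslog" in allow_list({"read", "write"}, example_caps | {"CAP_SYSLOG"}))  # True
```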
Jintao Zhang
6a915a1453 seccomp: add faccessat2 syscall.
related to https://patchwork.kernel.org/patch/11545287/

Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>
2020-08-17 21:48:21 +08:00
Jintao Zhang
e28e55f455 seccomp: add openat2 syscall.
related to https://patchwork.kernel.org/patch/11167585/

Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>
2020-08-16 16:28:21 +08:00
Lantao Liu
ccda537604 Create etcd user in cloud init.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:11 -07:00
Mike Brown
e973719ccf use containerd/project header test
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2020-08-11 09:15:11 -07:00
Lantao Liu
cb7ffd4b0b Fix indent in cni.template.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:11 -07:00
Lantao Liu
3e03ba7aa2 Update deployment and integration test
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:11 -07:00
Lantao Liu
9c54517920 Add TasksMax=infinity
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:11 -07:00
Lantao Liu
523b0b3c61 Remove noSnat
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:11 -07:00
Lantao Liu
231d291b2d Use v2 config.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:11 -07:00
Lantao Liu
1e1688d211 Use per-pod shim.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:11 -07:00
Lantao Liu
87bd84a7bb Add DefaultRuntimeName option.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:10 -07:00
Lantao Liu
cfab98a5fd Use ctr images import.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:10 -07:00
Lantao Liu
5e3ac16cc6 Add cri as required plugin.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:10 -07:00
Lantao Liu
5b9d8476ea Use runc.v1 for now for debugging.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:10 -07:00
Lantao Liu
e6e272e740 Enable runc.v2 as the default runtime in test.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:10 -07:00
Lantao Liu
d19aa0fd2e Use local env to avoid writing to passed-in readonly env.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:10 -07:00
Lantao Liu
ee6d69bbc1 Default the extra runtime handler to "".
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:10 -07:00
Tim Allclair
474c79bd52 Expose vars to configure an additional runtime handler
Expose environment variables in the GCE containerd configuration
script for configuring an additional runtime handler. This unblocks
E2E testing of custom runtime handlers.

Signed-off-by: Tim Allclair <tallclair@google.com>
2020-08-11 09:15:10 -07:00
Lantao Liu
ce12477f47 Support docker 18.09 in the test script.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:10 -07:00
Lantao Liu
1467121010 Remove the unused health-monitor.sh.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:09 -07:00
Lantao Liu
201ad4d3c4 Support netd in GCE bootstrap.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:09 -07:00
Lantao Liu
5ce7057502 Serve streaming on localhost by default to match k8s 1.11 default.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:09 -07:00
Lantao Liu
b553fdaf31 Remove crictl on GCE for all cases.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:09 -07:00
Lantao Liu
d8ce08fd92 Set stream server to serve on localhost on GCE.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:09 -07:00
Lantao Liu
1629bab7f9 Make max container log line size configurable through cloud init.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:09 -07:00
Lantao Liu
042378dcf1 Disable TLS streaming to work with new kubelet streaming proxy.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:09 -07:00
Bingshen Wang
37f2ecad97 Update cni.template
Format cni.template to use spaces instead of tabs, avoiding indentation issues in text editors.

Signed-off-by: bingshen.wbs <bingshen.wbs@alibaba-inc.com>
2020-08-11 09:15:09 -07:00
Lantao Liu
b58b6fef86 Disable restart plugin on GCE.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:08 -07:00
Lantao Liu
f938a166cd Fix kube-container-runtime-monitor.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:08 -07:00
Lantao Liu
91f8e61bd3 Use crictl installed in kube-up.sh
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:08 -07:00
Lantao Liu
5161f663e4 Add unix:// prefix for socket addresses used by CRI remote client.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:08 -07:00
Lantao Liu
1b995fcaf2 Add KUBE_CONTAINER_RUNTIME_NAME to fix fluentd support.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:08 -07:00
Lantao Liu
48457a254e Try using preloaded containerd if no version is specified.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:08 -07:00
Lantao Liu
c67a38b0b5 Add log level support.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:08 -07:00
Lantao Liu
4453aac005 Improve gce bootstrapping in various ways.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:08 -07:00
Lantao Liu
1bd3cdc572 Add cni config template support.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:07 -07:00
Lantao Liu
d520fac508 Enable TLS streaming in all the setup.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:07 -07:00
Lantao Liu
cdb4aec93a Use systemd service cgroup and oom score adj.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:07 -07:00
Lantao Liu
af8bd80689 Fix kube-up.sh and update several documents.
Signed-off-by: Lantao Liu <lantaol@google.com>
2020-08-11 09:15:07 -07:00