containerd/oci
Wei Fu 813a061fe1 oci: use readonly mount to read user/group info
In the Linux kernel, unmounting a writable mountpoint makes the kernel
run sync-fs to make sure that dirty pages are flushed to the underlying
filesystem. Many umount actions at the same time may therefore introduce
performance issues on an IOPS-limited disk.

When the CRI plugin creates a container, it temporarily mounts the rootfs
to read the UID/GID info for the entrypoint. The rootfs is a writable
snapshot, so after the read, the umount invokes a sync-fs action.

For example, with overlayfs on ext4, use bcc-tools to monitor
ext4_sync_fs calls:

```
// uname -a
Linux chaofan 5.13.0-27-generic #29~20.04.1-Ubuntu SMP Fri Jan 14 00:32:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

// open terminal 1
kubectl run --image=nginx --image-pull-policy=IfNotPresent nginx-pod

// open terminal 2
/usr/share/bcc/tools/stackcount ext4_sync_fs -i 1 -v -P

  ext4_sync_fs
  sync_filesystem
  ovl_sync_fs
  __sync_filesystem
  sync_filesystem
  generic_shutdown_super
  kill_anon_super
  deactivate_locked_super
  deactivate_super
  cleanup_mnt
  __cleanup_mnt
  task_work_run
  exit_to_user_mode_prepare
  syscall_exit_to_user_mode
  do_syscall_64
  entry_SYSCALL_64_after_hwframe
  syscall.Syscall.abi0
  github.com/containerd/containerd/mount.unmount
  github.com/containerd/containerd/mount.UnmountAll
  github.com/containerd/containerd/mount.WithTempMount.func2
  github.com/containerd/containerd/mount.WithTempMount
  github.com/containerd/containerd/oci.WithUserID.func1
  github.com/containerd/containerd/oci.WithUser.func1
  github.com/containerd/containerd/oci.ApplyOpts
  github.com/containerd/containerd.WithSpec.func1
  github.com/containerd/containerd.(*Client).NewContainer
  github.com/containerd/containerd/pkg/cri/server.(*criService).CreateContainer
  github.com/containerd/containerd/pkg/cri/server.(*instrumentedService).CreateContainer
  k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_CreateContainer_Handler.func1
  github.com/containerd/containerd/services/server.unaryNamespaceInterceptor
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
  github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).UnaryServerInterceptor.func1
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
  go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1
  k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_CreateContainer_Handler
  google.golang.org/grpc.(*Server).processUnaryRPC
  google.golang.org/grpc.(*Server).handleStream
  google.golang.org/grpc.(*Server).serveStreams.func1.2
  runtime.goexit.abi0
    containerd [34771]
    1
```

If several create requests arrive at the same time, the umount actions
might put high IO pressure on the disk underlying /var/lib/containerd.

Checking the kernel code [1] shows that the kernel does not call
__sync_filesystem if the mount is read-only. Based on this, containerd
should use a read-only mount to get the UID/GID information.
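The change described above amounts to adding a read-only option to the
mounts that are used only for the UID/GID lookup. A minimal sketch, assuming
a simplified hypothetical `Mount` type loosely shaped like containerd's
`mount.Mount` and a hypothetical `withReadonly` helper; it is not the actual
patch, and filesystem-specific details (such as how overlay treats
`upperdir` and `workdir` on a read-only mount) are out of its scope.

```go
package main

import "fmt"

// Mount is a hypothetical, simplified mount description
// (type, source, options), loosely modeled on containerd's mount.Mount.
type Mount struct {
	Type    string
	Source  string
	Options []string
}

// withReadonly (hypothetical) returns a copy of mounts in which every mount
// carries the "ro" option. The caller's option slices are not modified.
func withReadonly(mounts []Mount) []Mount {
	out := make([]Mount, len(mounts))
	for i, m := range mounts {
		opts := append([]string(nil), m.Options...) // copy, don't alias
		if !hasOption(opts, "ro") {
			opts = append(opts, "ro")
		}
		m.Options = opts
		out[i] = m
	}
	return out
}

// hasOption reports whether want is present in opts.
func hasOption(opts []string, want string) bool {
	for _, o := range opts {
		if o == want {
			return true
		}
	}
	return false
}

func main() {
	rootfs := []Mount{{
		Type:    "overlay",
		Source:  "overlay",
		Options: []string{"lowerdir=/l1", "upperdir=/u", "workdir=/w"},
	}}
	ro := withReadonly(rootfs)
	fmt.Println(ro[0].Options)     // prints "[lowerdir=/l1 upperdir=/u workdir=/w ro]"
	fmt.Println(rootfs[0].Options) // prints "[lowerdir=/l1 upperdir=/u workdir=/w]"
}
```

Copying the option slice keeps the original mount list untouched, so the
same list can still be used later to mount the writable container rootfs.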

Reference:

* [1] https://elixir.bootlin.com/linux/v5.13/source/fs/sync.c#L61

Closes: #4604

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2022-01-28 23:36:04 +08:00
client.go Licence header added 2018-02-19 10:32:26 +09:00
mounts_freebsd.go Add ruleset=4 option 2021-05-25 09:17:16 +02:00
mounts.go Run go fmt with Go 1.17 2021-08-22 09:31:50 +09:00
spec_opts_linux_test.go refactor: move from io/ioutil to io and os package 2021-09-21 09:50:38 +08:00
spec_opts_linux.go oci: implement WithRdt 2022-01-04 09:27:54 +02:00
spec_opts_nonlinux.go feat: replace github.com/pkg/errors to errors 2022-01-07 10:27:03 +08:00
spec_opts_test.go refactor: move from io/ioutil to io and os package 2021-09-21 09:50:38 +08:00
spec_opts_unix_test.go Run go fmt with Go 1.17 2021-08-22 09:31:50 +09:00
spec_opts_unix.go cri: Devices ownership from SecurityContext 2021-08-30 09:30:00 +03:00
spec_opts_windows_test.go Remove redundant build tags 2021-08-05 22:27:46 -07:00
spec_opts_windows.go feat: replace github.com/pkg/errors to errors 2022-01-07 10:27:03 +08:00
spec_opts.go oci: use readonly mount to read user/group info 2022-01-28 23:36:04 +08:00
spec_test.go oci.WithPrivileged: set the current caps, not the known caps 2021-02-10 17:14:17 +09:00
spec.go Fix mounts for FreeBSD 2021-05-10 21:49:46 +02:00
utils_unix_go116_test.go OCI: Mount (accessible) host devices in privileged rootless containers 2021-12-10 12:16:59 +01:00
utils_unix_go117_test.go OCI: Mount (accessible) host devices in privileged rootless containers 2021-12-10 12:16:59 +01:00
utils_unix_test.go OCI: Mount (accessible) host devices in privileged rootless containers 2021-12-10 12:16:59 +01:00
utils_unix.go feat: replace github.com/pkg/errors to errors 2022-01-07 10:27:03 +08:00