Commit Graph

128 Commits

Author SHA1 Message Date
Wei Fu
cb5a48e645 *: enable ARM64 runner
There are many Kubernetes clusters running on ARM64. Enable ARM64 runner
is to commit to support ARM64 platform officially.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-12-07 23:55:36 +08:00
Sebastiaan van Stijn
2af6db672e
switch back from golang.org/x/sys/execabs to os/exec (go1.19)
This is effectively a revert of 2ac9968401, which
switched from os/exec to the golang.org/x/sys/execabs package to mitigate
security issues (mainly on Windows) with lookups resolving to binaries in the
current directory.

from the go1.19 release notes https://go.dev/doc/go1.19#os-exec-path

> ## PATH lookups
>
> Command and LookPath no longer allow results from a PATH search to be found
> relative to the current directory. This removes a common source of security
> problems but may also break existing programs that depend on using, say,
> exec.Command("prog") to run a binary named prog (or, on Windows, prog.exe) in
> the current directory. See the os/exec package documentation for information
> about how best to update such programs.
>
> On Windows, Command and LookPath now respect the NoDefaultCurrentDirectoryInExePath
> environment variable, making it possible to disable the default implicit search
> of “.” in PATH lookups on Windows systems.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-11-02 21:15:40 +01:00
Phil Estes
3d6c5ea487
Merge pull request #9308 from ZhangShuaiyi/fix/TestRwLoop
test: remove /dev/loopX in TestRwLoop
2023-11-02 14:44:59 +00:00
Shuaiyi Zhang
b6adf43d4a test: use 'Autoclear: ture' in TestRwLoop and add Autoclear test
Signed-off-by: Shuaiyi Zhang <zhang_syi@qq.com>
2023-11-01 11:49:12 +08:00
Derek McGowan
5fdf55e493
Update go module to github.com/containerd/containerd/v2
Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-10-29 20:52:21 -07:00
Derek McGowan
788f7f248a
Merge pull request #9218 from fuweid/followup-idmapped
idmapped: use pidfd to avoid pid reuse issue
2023-10-20 17:34:02 +00:00
Samuel Karp
423c7ad4fe
Merge pull request #9211 from UiPath/use-loop-configure 2023-10-16 23:40:58 -07:00
Alexandru Matei
a782fd6da2 Use LOOP_CONFIGURE when creating loop devices
LOOP_CONFIGURE is a new ioctl that is a lot faster than
the LOOP_SET_FD+LOOP_SET_STATUS64 calls

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2023-10-16 13:02:12 +03:00
Wei Fu
3742f7f0db idmapped: use pidfd to avoid pid reuse issue
It's followup for #5890.

The containerd-shim process depends on the mount package to init rootfs
for container. For the container enable user namespace, the mount
package needs to fork child process to get the brand-new user namespace.
However, there are two reapers in one process (described by the
following list) and there are race-condition cases.

1. mount package
2. sys.Reaper as global one which watch all the SIGCHLD.

=== [kill(2)][kill] the wrong process ===

Currently, we use pipe to ensure that child process is alive. However,
the pide file descriptor can be hold by other process, which the child
process cannot exit by self. We should use [kill(2)][kill] to ensure the
child process. But we might kill the wrong process if the child process
might be reaped by containerd-shim and the PID might be reused by other
process.

=== [waitid(2)][waitid] on the wrong child process ===

```
containerd-shim process:

Goroutine 1(GetUsernsFD):                   Goroutine 2(Reaper)

1. Ready to wait for child process X

                                            2. Received SIGCHLD from X

                                            3. Reaped the zombie child process X

                                            (X has been reused by other child process)

4. Wait on process X

The goroutine 1 will be stuck until the process X has been terminated.
```

=== open `/proc/X/ns/user` on the wrong child process ===

There is also pid-reused risk between opening `/proc/$pid/ns/user` and
writing `/proc/$pid/u[g]id_map`.

```
containerd-shim process:

Goroutine 1(GetUsernsFD):                   Goroutine 2(Reaper)

1. Fork child process X

2. Write /proc/X/uid_map,gid_map

                                            3. Received SIGCHLD from X

                                            4. Reaped the zombie child process X

                                            (X has been reused by other process)

5. Open /proc/X/ns/user file as usernsFD

The usernsFD links to the wrong X!!!
```

In order to fix the race-condition, we should use [CLONE_PIDFD][clone2] (Since
Linux v5.2).

When we fork child process `X`, the kernel will return a process file
descriptor `X_PIDFD` referencing to child process `X`. With the pidfd, we can
use [pidfd_send_signal(2)][pidfd_send_signal] (Since Linux v5.1)
to send signal(0) to ensure the child process `X` is alive. If the `X` has
terminated and its PID has been recycled for another process. The
pidfd_send_signal fails with the error ESRCH.

Therefore, we can open `/proc/X/{ns/user,uid_map,gid_map}` file
descriptors as first and then use pidfd_send_signal to check the process
is still alive. If so, we can ensure the file descriptors are valid and
reference to the child process `X`. Even if the `X` PID has been reused
after pidfd_send_signal call, the file descriptors are still valid.

```code
X, pidfd = clone2(CLONE_PIDFD)

usernsFD = open /proc/X/ns/user
uidmapFD = open /proc/X/uid_map
gidmapFD = open /proc/X/gid_map

pidfd_send_signal pidfd, signal(0)
  return err if no such process

== When we arrive here, we can ensure usernsFD/uidmapFD/gidmapFD are correct
== even if X has been reused after pidfd_send_signal call.

update uid/gid mapping by uidmapFD/gidmapFD
return usernsFD
```

And the [waitid(2)][waitid] also supports pidfd type (Since Linux 5.4).
We can use pidfd type waitid to ensure we are waiting for the correct
process. All the PID related race-condition issues can be resolved by
pidfd.

```bash
➜  mount git:(followup-idmapped) pwd
/home/fuwei/go/src/github.com/containerd/containerd/mount

➜  mount git:(followup-idmapped) sudo go test -test.root -run TestGetUsernsFD -count=1000 -failfast -p 100  ./...
PASS
ok      github.com/containerd/containerd/mount  3.446s
```

[kill]: <https://man7.org/linux/man-pages/man2/kill.2.html>
[clone2]: <https://man7.org/linux/man-pages/man2/clone.2.html>
[pidfd_send_signal]: <https://man7.org/linux/man-pages/man2/pidfd_send_signal.2.html>
[waitid]: <https://man7.org/linux/man-pages/man2/waitid.2.html>

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-10-13 00:56:55 +08:00
Akihiro Suda
9ffb34ac49
Merge pull request #9054 from macOScontainers/canonicalize-filter-mount-path
Fix usages of `mountinfo.PrefixFilter`
2023-09-27 05:10:27 +09:00
Derek McGowan
508aa3a1ef
Move to use github.com/containerd/log
Add github.com/containerd/log to go.mod

Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-09-22 07:53:23 -07:00
Marat Radchenko
d94a789d15 Fix usages of mountinfo.PrefixFilter
It says: The prefix path **must be absolute, have all symlinks resolved, and cleaned**. But those requirements are violated in lots of places.

What happens when it is given a non-canonicalized path is that `mountinfo.GetMounts` will not find mounts.

The trivial case is:
```
$ mkdir a && ln -s a b && mkdir b/c b/d && mount --bind b/c b/d && cat /proc/mounts | grep -- '[ab]/d'
/dev/sdd3 /home/user/a/d ext4 rw,noatime,discard 0 0
```
We asked to bind-mount b/c to b/d, but ended up with mount in a/d.
So, mount table always contains canonicalized mount points, and it is an error to look for non-canonicalized paths in it.

Signed-off-by: Marat Radchenko <marat@slonopotamus.org>
2023-09-10 15:14:26 +03:00
Ilya Hanov
1555a31bf6 mount: support idmapped mount points
This patch introduces idmapped mounts support for
container rootfs.

The idmapped mounts support was merged in Linux kernel 5.12
torvalds/linux@7d6beb7.
This functionality allows to address chown overhead for containers that
use user namespace.

The changes are based on experimental patchset published by
Mauricio Vásquez #4734.
Current version reiplements support of idmapped mounts using Golang.

Performance measurement results:
Image           idmapped mount  recursive chown
BusyBox         00.135          04.964
Ubuntu          00.171          15.713
Fedora          00.143          38.799

Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
Signed-off-by: Artem Kuzin <artem.kuzin@huawei.com>
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
Signed-off-by: Ilya Hanov <ilya.hanov@huawei-partners.com>
2023-09-05 01:23:30 +03:00
Akihiro Suda
98f27e1d9c
Revert "Add support for mounts on Darwin"
This reverts commit 2799b28e61.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-07-19 00:22:20 +09:00
Marat Radchenko
2799b28e61 Add support for mounts on Darwin
Signed-off-by: Marat Radchenko <marat@slonopotamus.org>
2023-07-17 23:27:04 +03:00
Danny Canter
7ef133ad47 Fix mount pkg typo
retired -> retried

Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-07-10 01:45:17 -07:00
Danny Canter
55a8102ec1 mount: Add From/ToProto helpers
Helpers to convert from containerd's [Mount] to its protobuf structure for
[Mount] and vice-versa appear three times. It seems sane to just expose
this facility in /mount.

Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-06-28 04:03:18 -07:00
Wei Fu
72b7d16505 mount: support direct-io for loopback device
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-06-15 23:51:46 +08:00
Craig Ingram
d2605de734 add handling of a '.' commondir and bounds checking to mount_linux
Signed-off-by: Craig Ingram <Cjingram@google.com>
2023-05-30 21:13:16 +00:00
Cardy.Tang
a6cd5e3f4f bugfix: resolve symlink when looking up mountpoint
Signed-off-by: Cardy.Tang <zuniorone@gmail.com>
2023-05-22 11:03:51 +08:00
Gabriel Adrian Samfira
c9e5c33a18 UnmountAll is a no-op for missing mount points
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-04-04 12:59:52 -07:00
Gabriel Adrian Samfira
8538e7a2ac Improve error messages and remove check
* Improve error messages
  * remove a check for the existance of unmount target. We probably
    should not mask that the target was missing.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-04-04 12:07:34 -07:00
Gabriel Adrian Samfira
ba74cdf150 Make ReadOnly() available on all platforms
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-04-04 02:04:56 -07:00
Gabriel Adrian Samfira
1279ad880c Remove bind code path in mount()
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-04-03 23:18:44 -07:00
Gabriel Adrian Samfira
6a5b4c9c24 Remove "bind" code path from diff
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-04-03 08:11:35 -07:00
Gabriel Adrian Samfira
d373ebc4de Properly mount base layers
As opposed to a writable layer derived from a base layer, the volume
path of a base layer, once activated and prepared will not be a WCIFS
volume, but the actual path on disk to the snapshot. We cannot directly
mount this folder, as that would mean a client may gain access and
potentially damage important metadata files that would render the layer
unusabble.

For base layers we need to mount the Files folder which must exist in
any valid base windows-layer.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-04-02 08:35:34 -07:00
Gabriel Adrian Samfira
7f82dd91f4 Add ReadOnly() function
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-04-01 08:43:14 -07:00
Gabriel Adrian Samfira
4012c1b853 Remove escalated privileges
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-31 06:17:35 -07:00
Gabriel Adrian Samfira
95687a9324 Fix go.mod, simplify boolean logic, add logging
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-31 06:16:56 -07:00
Gabriel Adrian Samfira
7a36efd75e Ignore ERROR_NOT_FOUND error when removing mount
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-31 06:16:55 -07:00
Gabriel Adrian Samfira
db32798592 Update continuity, go-winio and hcsshim
Update dependencies and remove the local bindfilter files. Those have
been moved to go-winio.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-31 06:16:52 -07:00
Gabriel Adrian Samfira
00efd3e6d8 Remove unused function
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-31 06:15:19 -07:00
Gabriel Adrian Samfira
36dc2782c4 Use bind filer for mounts
The bind filter supports bind-like mounts and volume mounts. It also
allows us to have read-only mounts.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-31 06:15:18 -07:00
Paul "TBBle" Hampson
474a257b16 Implement Windows mounting for bind and windows-layer mounts
Using symlinks for bind mounts means we are not protecting an RO-mounted
layer against modification. Windows doesn't currently appear to offer a
better approach though, as we cannot create arbitrary empty WCOW scratch
layers at this time.

For windows-layer mounts, Unmount does not have access to the mounts
used to create it. So we store the relevant data in an Alternate Data
Stream on the mountpoint in order to be able to Unmount later.

Based on approach in https://github.com/containerd/containerd/pull/2366,
with sign-offs recorded as 'Based-on-work-by' trailers below.

This also partially-reverts some changes made in #6034 as they are not
needed with this mounting implmentation, which no longer needs to be
handled specially by the caller compared to non-Windows mounts.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
Based-on-work-by: Michael Crosby <crosbymichael@gmail.com>
Based-on-work-by: Darren Stahl <darst@microsoft.com>
2023-03-31 06:15:17 -07:00
Laura Brehm
daa3a7665e
Add WithReadonlyTempMount to create readonly temporary mounts
This is necessary so we can mount snapshots more than once with overlayfs,
otherwise mounts enter an unknown state.

related: https://github.com/moby/buildkit/pull/1100

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Co-authored-by: Zou Nengren <zouyee1989@gmail.com>
2023-03-17 15:51:18 +00:00
Akihiro Suda
d8b68e3ccc
Stop using math/rand.Read and rand.Seed (deprecated in Go 1.20)
From golangci-lint:

> SA1019: rand.Read has been deprecated since Go 1.20 because it
>shouldn't be used: For almost all use cases, crypto/rand.Read is more
>appropriate. (staticcheck)

> SA1019: rand.Seed has been deprecated since Go 1.20 and an alternative
>has been available since Go 1.0: Programs that call Seed and then expect
>a specific sequence of results from the global random source (using
>functions such as Int) can be broken when a dependency changes how
>much it consumes from the global random source. To avoid such breakages,
>programs that need a specific result sequence should use
>NewRand(NewSource(seed)) to obtain a random generator that other
>packages cannot access. (staticcheck)

See also:

- https://pkg.go.dev/math/rand@go1.20#Read
- https://pkg.go.dev/math/rand@go1.20#Seed

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-02-16 03:50:23 +09:00
Kohei Tokunaga
eeab052425 Make mount.UnmountRecursive compatible to mount.UnmountAll
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
2023-01-31 22:07:44 +09:00
Edgar Lee
34d5878185 Use mount.Target to specify subdirectory of rootfs mount
- Add Target to mount.Mount.
- Add UnmountMounts to unmount a list of mounts in reverse order.
- Add UnmountRecursive to unmount deepest mount first for a given target, using
moby/sys/mountinfo.

Signed-off-by: Edgar Lee <edgarhinshunlee@gmail.com>
2023-01-27 09:51:58 +08:00
Maksym Pavlenko
3bc8fc4d30 Cleanup build constraints
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-12-08 09:36:20 -08:00
Brian Goff
a24ef09937 Replace mount fork hack with CLONE_FS
This change spins up a new goroutine, locks it to a thread, then
unshares CLONE_FS which allows us to `Chdir` from inside the thread
without affecting the rest of the program.

The thread is no longer usable after unshare so it leaves the thread
locked to prevent go from returning the thread to the thread pool.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2022-11-03 22:30:35 +00:00
Phil Estes
455127859b
Merge pull request #7342 from tklauser/losetup-unix
Use ioctl helpers from x/sys/unix
2022-08-30 12:32:20 -04:00
Tobias Klauser
3cc3d8a560
mount: use ioctl helpers from x/sys/unix
Use the IoctlRetInt, IoctlSetInt and IoctlLoopSetStatus64 helper
functions defined in the golang.org/x/sys/unix package instead of
manually wrapping these using a locally defined ioctl function.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
2022-08-30 10:38:29 +02:00
Sebastiaan van Stijn
9ae2cc3a8a
mount: remove unused ErrNotImplementOnWindows
This error was added in c5843b7615, but no longer
used since a5a9f91832, which implemented Windows
support.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-08-29 10:55:04 +02:00
Maksym Pavlenko
871b6b6a9f Use testify
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-04-01 18:17:58 -07:00
Eng Zer Jun
18ec2761c0
test: use T.TempDir to create temporary test directory
The directory created by `T.TempDir` is automatically removed when the
test and all its subtests complete.

Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2022-03-15 14:03:50 +08:00
Shengjing Zhu
d28981d48e Fix build with gccgo
gccgo changes the mangling scheme
b483d0e0a2

The change is available in gcc-11, which is the least version that
implements go1.16.

Signed-off-by: Shengjing Zhu <zhsj@debian.org>
2022-02-22 02:31:58 +08:00
Derek McGowan
8816006d1e
Fix followup items from errors replacement
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-01-07 12:16:00 -08:00
haoyun
bbe46b8c43 feat: replace github.com/pkg/errors to errors
Signed-off-by: haoyun <yun.hao@daocloud.io>
Co-authored-by: zounengren <zouyee1989@gmail.com>
2022-01-07 10:27:03 +08:00
haoyun
c0d07094be feat: Errorf usage
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-12-13 14:31:53 +08:00
Michael Crosby
7b8a697f28
Merge pull request #6034 from claudiubelu/windows/fixes-image-volume
Fixes Windows containers with image volumes
2021-10-07 11:50:01 -04:00