Most snapshotters end up manually handling the rollback logic, either
by calling `t.Rollback()` in every failure path, setting up a custom
defer func to log on certain errors, or just deferring `t.Rollback()`
even for `snapshotter.Commit()` which *will* cause `t.Rollback()` to return
an error afaict, but it's just never checked and luckily bolt handles this
alright...
The devmapper snapshotter solves this with a method that starts either a
read-only or a writable transaction and takes a callback that does the
actual work. Any failure in the callback is rolled back, and a writable
transaction is committed for you on success. This seems like the right
model to me: it removes the burden on the snapshotter author to remember
to defer or call rollback in every method for every failure case.
This change exposes the convenience method from devmapper to the
snapshots/storage package as a method off of `storage.MetaStore` and moves
over the devmapper snapshotter to use this.
Signed-off-by: Danny Canter <danny@dcantah.dev>
Two xfs file systems with the same UUID cannot be mounted on the same system.
However, a devmapper snapshot has the same UUID as its original file system.
This patch fixes the bug by mounting xfs file systems with the "nouuid" option.
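
A minimal sketch of the idea. The function name and the base option list are
hypothetical; only the "nouuid" option for xfs comes from this message:

```go
package main

import "fmt"

// mountOptions returns the mount options for a snapshot device.
// Hypothetical helper: it only adds "nouuid" when the file system is xfs.
func mountOptions(fsType string, base []string) []string {
	opts := append([]string{}, base...) // copy so the caller's slice is untouched
	if fsType == "xfs" {
		// A thin snapshot carries the same UUID as its origin, and xfs
		// refuses to mount two file systems with the same UUID.
		opts = append(opts, "nouuid")
	}
	return opts
}

func main() {
	fmt.Println(mountOptions("xfs", []string{"ro"}))
	fmt.Println(mountOptions("ext4", []string{"ro"}))
}
```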
Signed-off-by: Henry Wang <henwang@amazon.com>
Add a file system options setting to the config file so that users can
pass non-default file system parameters for the fs type of their choice.
File system options set in the config file replace the default
options that would otherwise be used.
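
A rough sketch of the override behaviour, assuming the configured options
replace the defaults wholesale instead of being merged with them. The helper
name and the placeholder default strings are hypothetical; the real defaults
are not listed here:

```go
package main

import "fmt"

// defaultFsOptions holds placeholder defaults per file system type.
var defaultFsOptions = map[string]string{
	"ext4": "<default ext4 options>",
	"xfs":  "<default xfs options>",
}

// resolveFsOptions returns the configured options when set, otherwise the
// defaults for fsType. The two are never merged.
func resolveFsOptions(fsType, configured string) string {
	if configured != "" {
		return configured
	}
	return defaultFsOptions[fsType]
}

func main() {
	fmt.Println(resolveFsOptions("ext4", ""))
	fmt.Println(resolveFsOptions("ext4", "<options from config>"))
}
```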
Signed-off-by: Alakesh Haloi <alakeshh@amazon.com>
The ext4 file system was already supported. This adds support for xfs as
well. The containerd config file can set fs_type as an additional option,
with "xfs" and "ext4" as the possible values for now. Support for other
file system types can be added in the future. A snapshot created from a
committed snapshot inherits the file system type of its parent. A new
snapshot that has no parent is created with the file system type given in
the config. If the config does not specify a file system type, ext4 is
assumed. This lets users opt in to xfs.
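
The selection rule above can be sketched as follows. The function and
parameter names are hypothetical, and rejecting unknown values is an
assumption, not something this message states:

```go
package main

import "fmt"

// resolveFsType picks the file system type for a new snapshot.
// parentFsType is empty when the snapshot has no parent; configFsType is
// empty when fs_type is not set in the config.
func resolveFsType(parentFsType, configFsType string) (string, error) {
	if parentFsType != "" {
		return parentFsType, nil // inherit from the committed parent
	}
	if configFsType == "" {
		return "ext4", nil // nothing configured: assume ext4
	}
	switch configFsType {
	case "ext4", "xfs":
		return configFsType, nil
	default:
		// Assumption: unsupported values are rejected up front.
		return "", fmt.Errorf("unsupported fs_type %q", configFsType)
	}
}

func main() {
	fmt.Println(resolveFsType("", ""))
	fmt.Println(resolveFsType("", "xfs"))
	fmt.Println(resolveFsType("ext4", "xfs"))
}
```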
Signed-off-by: Alakesh Haloi <alakeshh@amazon.com>
Go 1.15.7 contained a security fix for CVE-2021-3115, which allowed arbitrary
code to be executed at build time when using cgo on Windows. This issue also
affects Unix users who have “.” listed explicitly in their PATH and are running
“go get” outside of a module or with module mode disabled.
This issue is not limited to the go command itself, and can also affect binaries
that use `exec.Command`, `exec.LookPath`, etc.
From the related blogpost (https://blog.golang.org/path-security):
> Are your own programs affected?
>
> If you use exec.LookPath or exec.Command in your own programs, you only need to
> be concerned if you (or your users) run your program in a directory with untrusted
> contents. If so, then a subprocess could be started using an executable from dot
> instead of from a system directory. (Again, using an executable from dot happens
> always on Windows and only with uncommon PATH settings on Unix.)
>
> If you are concerned, then we’ve published the more restricted variant of os/exec
> as golang.org/x/sys/execabs. You can use it in your program by simply replacing
This patch replaces all uses of `os/exec` with `golang.org/x/sys/execabs`. While
some uses of `os/exec` should not be problematic (e.g. part of tests), it is
probably good to be consistent, in case code gets moved around.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- View was logging itself as "prepare"
- Cleanup should emit a debug log like the other exported methods
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
If mkfs on a device mapper thin pool fails, show the pool status
as returned by dmsetup for enhanced error reporting.
Signed-off-by: Alakesh Haloi <alakeshh@amazon.com>
Before the change, the error on the caller-side (e.g. ctr) was
something like
> unpack: failed to prepare extraction snapshot "...": exit status 5:
> unknown
which was too cryptic.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
The issue below happened several times before the root
cause was found:
1. A `fdisk -l` process had been hung for a long time;
2. An image layer snapshot device was visible to dmsetup, which
should *not* happen because it should be deactivated after
`Commit()`;
The backtrace of `fdisk` is always the same over time:
```bash
[<ffffffff810bbc6a>] io_schedule+0x2a/0x80
[<ffffffff81295a3f>] do_blockdev_direct_IO+0x1e9f/0x2f10
[<ffffffff81296aea>] __blockdev_direct_IO+0x3a/0x40
[<ffffffff81290e43>] blkdev_direct_IO+0x43/0x50
[<ffffffff811b8a14>] generic_file_read_iter+0x374/0x960
[<ffffffff81291ad5>] blkdev_read_iter+0x35/0x40
[<ffffffff8125229b>] new_sync_read+0xfb/0x240
[<ffffffff81252406>] __vfs_read+0x26/0x40
[<ffffffff81252b96>] vfs_read+0x96/0x130
[<ffffffff812540e5>] SyS_read+0x55/0xc0
[<ffffffff81003c04>] do_syscall_64+0x74/0x180
```
The root cause is a race window in Commit() between
`SuspendDevice()` and `DeactivateDevice()`, which may cause the
IOs of a process or command like `fdisk` on the "suspended" device
to hang forever. Two things combine:
1. The IOs are suspended on the device;
2. The device stays in the `Suspended` state, because it is deactivated with
the `deferred` flag and without the `force` flag;
So they cannot make progress.
One reproducer is:
1. Enlarge the race window by adding a sleep of a few seconds there;
2. Run `while true; do sudo fdisk -l; sleep 0.5; done` in one terminal;
3. Pull an image in another terminal;
Fix it by:
1. Resuming the device again after flushing IO with suspend;
2. Removing the device without the `deferred` flag;
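
The fixed ordering can be sketched as below. `deviceOps`, `recorder`, and
`deactivateAfterCommit` are hypothetical names that only model the three
operations mentioned here; they are not the snapshotter's real API:

```go
package main

import "fmt"

// deviceOps models the device-mapper operations named in this message.
type deviceOps interface {
	Suspend(name string) error
	Resume(name string) error
	Remove(name string, deferred bool) error
}

// deactivateAfterCommit flushes in-flight IO by suspending the device,
// resumes it so a blocked reader can make progress, then removes it
// without the deferred flag.
func deactivateAfterCommit(d deviceOps, name string) error {
	if err := d.Suspend(name); err != nil {
		return fmt.Errorf("suspend %s: %w", name, err)
	}
	if err := d.Resume(name); err != nil {
		return fmt.Errorf("resume %s: %w", name, err)
	}
	if err := d.Remove(name, false); err != nil {
		return fmt.Errorf("remove %s: %w", name, err)
	}
	return nil
}

// recorder logs the calls it receives, for illustration.
type recorder struct{ calls []string }

func (r *recorder) Suspend(string) error { r.calls = append(r.calls, "suspend"); return nil }
func (r *recorder) Resume(string) error  { r.calls = append(r.calls, "resume"); return nil }
func (r *recorder) Remove(_ string, deferred bool) error {
	r.calls = append(r.calls, fmt.Sprintf("remove(deferred=%t)", deferred))
	return nil
}

func main() {
	r := &recorder{}
	err := deactivateAfterCommit(r, "snap-1")
	fmt.Println(r.calls, err)
}
```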
Fix: #4234
Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
Snapshot GC uses the pruneBranch() function to remove snapshots,
but GC stops if snapshotter.Remove() returns an error other than
ErrFailedPrecondition. As a result, a single snapshot that cannot be
deleted, due to errors like "contains a filesystem in use", leaves
thousands of dm snapshots undeleted.
So return ErrFailedPrecondition from Remove() where
appropriate, and let the GC process go on collecting other snapshots.
Fix: #3923
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Signed-off-by: Eric Ren <renzhen.rz@linux.alibaba.com>
1. reason to deactivate committed snapshot
The thin device will not be used for IO after it is committed,
and further thin snapshotting works with an inactive thin
device as origin. The benefits of deactivating are:
- the device is not unnecessarily visible, which avoids unexpected IO;
- no kernel data structures are wasted on maintaining an active dm device.
Quote from kernel doc (Documentation/device-mapper/provisioning.txt):
"
ii) Using an internal snapshot.
Once created, the user doesn't have to worry about any connection
between the origin and the snapshot. Indeed the snapshot is no
different from any other thinly-provisioned device and can be
snapshotted itself via the same method. It's perfectly legal to
have only one of them active, and there's no ordering requirement on
activating or removing them both. (This differs from conventional
device-mapper snapshots.)
"
2. a thin pool metadata bug is naturally removed
A problem happens when suspending/resuming the origin thin device fails
while creating a snapshot:
"failed to create snapshot device from parent vg0-mythinpool-snap-3"
error="failed to save initial metadata for snapshot "vg0-mythinpool-snap-19":
object already exists"
This issue occurs because, when creating a snapshot fails, the
snapshotter.store can be rolled back, but the thin pool metadata
boltdb fails to roll back in PoolDevice.CreateSnapshotDevice(),
so the metadata becomes inconsistent: the snapshotID is not
taken in snapshotter.store, but is saved in the pool metadata boltdb.
The cause is that, in PoolDevice.CreateSnapshotDevice(), the deferred calls
are invoked in "first-in-last-out" order. When the error happens
in the "resume device" deferred call, the metadata has already been saved and
the snapshot created, with no chance of being rolled back.
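
The ordering problem can be reproduced in a few lines. This is an
illustration of Go's defer ordering under my reading of the description
above, not the actual CreateSnapshotDevice() code; the step names are
made up:

```go
package main

import (
	"errors"
	"fmt"
)

// createSnapshot mimics the ordering problem with two deferred calls.
func createSnapshot(log *[]string) (err error) {
	// Registered first, so it runs last.
	defer func() {
		*log = append(*log, "resume device")
		if err == nil {
			err = errors.New("resume failed") // simulated failure
		}
	}()

	// Registered second, so it runs first and still sees err == nil.
	defer func() {
		if err != nil {
			*log = append(*log, "rollback metadata")
		}
	}()

	*log = append(*log, "save metadata", "create snapshot")
	return nil
}

func main() {
	var log []string
	err := createSnapshot(&log)
	fmt.Println(log, err)
}
```

The function returns an error, yet "rollback metadata" never appears in the
log, because the rollback check ran before the resume failure was recorded.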
Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>