Commit Graph

54 Commits

Author SHA1 Message Date
Alakesh Haloi
5ce35ac398 devmapper: log pool status when mkfs fails
If mkfs on device mapper thin pool fails, it will show pool status
as returned by dmsetup for enhanced error reporting.

Signed-off-by: Alakesh Haloi <alakeshh@amazon.com>
2021-04-12 19:24:04 +00:00
Derek McGowan
261c107ffc
Merge pull request #5278 from mxpv/toml
Migrate TOML to github.com/pelletier/go-toml
2021-04-01 21:24:52 -07:00
Kazuyoshi Kato
e1f51ba73d Use os.File#Seek() to get the size of a block device
Instead of calling blockdev(1), this change uses os.File#Seek, which
should be more efficient.

https://github.com/firecracker-microvm/firecracker/pull/1371

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2021-03-26 10:14:38 -07:00
Maksym Pavlenko
ddd4298a10 Migrate current TOML code to github.com/pelletier/go-toml
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2021-03-25 13:13:33 -07:00
Kazuyoshi Kato
7704fe72d0 Specifically mention "mkfs.ext4" on the error from the command
Before the change, the error on the caller-side (e.g. ctr) was
something like

> unpack: failed to prepare extraction snapshot "...": exit status 5:
> unknown

which was too cryptic.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2021-03-19 10:38:47 -07:00
Derek McGowan
35eeb24a17
Fix exported comments enforcer in CI
Add comments where missing and fix incorrect comments

Signed-off-by: Derek McGowan <derek@mcg.dev>
2021-03-12 08:47:05 -08:00
Michael Crosby
7738246cd9
Merge pull request #5111 from ctrlaltdel121/master
mark device faulty after parent fails to suspend
2021-03-08 14:13:25 -05:00
Maksym Pavlenko
e1b4c0ad43 Remove flaky devmapper check
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2021-03-03 14:51:11 -08:00
Jeremy Williams
51a72f0492 mark device faulty after parent fails to suspend
When an error is returned here, unlike the other error returns in the function, nothing is done to mark the added device as faulty or remove it.
I have observed this causing future snapshot creations to continue to attempt to use the same ID (from the sequence) to create new devices
and get blocked because the device already exists because it was not rolled back here.

Hopefully fixes #5110

Signed-off-by: Jeremy Williams <ctrlaltdel121@gmail.com>
2021-03-03 17:02:07 -05:00
Kazuyoshi Kato
2ac33d79fe test: fix assert.Check's arguments to show its parameters correctly
The change I made at db6075fc2 didn't show its parameters correctly.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2021-02-04 10:56:58 -08:00
Kazuyoshi Kato
db6075fc24 snapshot/devmapper: log actual values to investigate #4965
This test has been flaky in GitHub Actions. This change logs the
values from devmapper to further investigate the issue.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2021-02-01 16:27:59 -08:00
Peng Tao
b7026236f4 snapshot/devmapper: use losetup in mount package
No need to use the private losetup command line wrapper package.
The generic package provides the same functionality.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-01-04 10:15:04 -08:00
Shengjing Zhu
5988bfc1ef docs: Various typos found by codespell
Signed-off-by: Shengjing Zhu <zhsj@debian.org>
2020-12-22 13:22:16 +08:00
Maksym Pavlenko
da68609866 Fix devmapper test
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2020-12-09 09:35:17 -08:00
Maksym Pavlenko
2b87d4554f Add retries when deleting a devmapper device
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2020-12-09 09:13:34 -08:00
Teemu Kallio
71fd68a920 devicemapper: separate implementation pkg from plugin pkg
Signed-off-by: Teemu Kallio <teemu.kallio@pm.me>
2020-09-18 12:00:14 +02:00
Kazuyoshi Kato
a1f6c9dd88 snapshots/devmapper: fix rollback
The rollback mechanism is implemented by calling deleteDevice() and
RemoveDevice(). But RemoveDevice() is internally calling
deleteDevice() as well.

Since the device is already deleted by the first deleteDevice() call,
RemoveDevice() will always see ENODATA. That specific error must be
ignored so that the device's metadata is removed correctly.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2020-08-17 15:41:03 -07:00
Kazuyoshi Kato
74e9aa7abb snapshots/devmapper: don't hardcode the platform strings
The snapshotter doesn't have to exclude non-amd64 platforms.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2020-08-03 11:55:36 -07:00
Kazuyoshi Kato
c383436af7 snapshots/devmapper: suspend a device to avoid data corruption
According to https://github.com/torvalds/linux/blob/v5.7/Documentation/admin-guide/device-mapper/thin-provisioning.rst#internal-snapshots;

> If the origin device that you wish to snapshot is active, you
> must suspend it before creating the snapshot to avoid corruption.

However, the devmapper snapshotter was not doing that.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2020-07-16 15:08:07 -07:00
Sebastiaan van Stijn
dc92ad6520
Replace errors.Cause() with errors.Is()
Dependencies may be switching to use the new `%w` formatting
option to wrap errors; switching to use `errors.Is()` makes
sure that we are still able to unwrap the error and detect the
underlying cause.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-05-08 14:36:45 +02:00
Eric Ren
63b7587cd6 snapshots/devmapper: fix race window causing IO hangup
The issue below happened several times before the root
cause was found:

  1. A `fdisk -l` process has been hung for a long time;
  2. An image layer snapshot device is visible to dmsetup, which
       should *not* happen because it should be deactivated after
       `Commit()`;

The backtrace of `fdisk` is always the same over time:

```bash
[<ffffffff810bbc6a>] io_schedule+0x2a/0x80
[<ffffffff81295a3f>] do_blockdev_direct_IO+0x1e9f/0x2f10
[<ffffffff81296aea>] __blockdev_direct_IO+0x3a/0x40
[<ffffffff81290e43>] blkdev_direct_IO+0x43/0x50
[<ffffffff811b8a14>] generic_file_read_iter+0x374/0x960
[<ffffffff81291ad5>] blkdev_read_iter+0x35/0x40
[<ffffffff8125229b>] new_sync_read+0xfb/0x240
[<ffffffff81252406>] __vfs_read+0x26/0x40
[<ffffffff81252b96>] vfs_read+0x96/0x130
[<ffffffff812540e5>] SyS_read+0x55/0xc0
[<ffffffff81003c04>] do_syscall_64+0x74/0x180
```

The root cause is that, in Commit(), there is a race window between
`SuspendDevice()` and `DeactivateDevice()`, which may cause the
IOs of a process or command like `fdisk` on the "suspended" device
to hang forever. The cause is twofold:

  1. The IOs are suspended on the device;
  2. The device stays in the `Suspended` state, because it is deactivated
     with the `deferred` flag and without the `force` flag;

So the IOs cannot make progress.

One reproducer is:
 1. enlarge the race window by adding a sleep of a few seconds there;
 2. run `while true; do sudo fdisk -l; sleep 0.5; done` on one terminal;
 3. and pull an image on another terminal;

The fix is to:
 1. Resume the devices again after flushing IO by suspend;
 2. Remove the device without the `deferred` flag;

Fix: #4234
Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
2020-05-07 07:46:45 +08:00
Maksym Pavlenko
bd22653003 Add devmapper configuration examples
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2020-03-17 18:16:58 -07:00
Maksym Pavlenko
e2e40e19d7
Merge pull request #3924 from renzhengeek/renzhen/snapshot-gc
snapshots/devmapper: do not stop snapshot GC when one snapshot removing fails
2020-03-12 19:28:55 -07:00
Eric Ren
a3685262fe snapshots/devmapper: do not stop snapshot GC when one snapshot removing fails
Snapshot GC makes use of the pruneBranch() function to remove snapshots,
but GC will stop if snapshotter.Remove() returns an error and the error
is not ErrFailedPrecondition. This results in thousands of
dm snapshots not being deleted if one snapshot cannot be deleted, due to
errors like "contains a filesystem in use".

So return the ErrFailedPrecondition error in the Remove() function where
appropriate, and let the GC process go on collecting other snapshots.

Fix: #3923
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Signed-off-by: Eric Ren <renzhen.rz@linux.alibaba.com>
2020-02-29 13:32:48 +08:00
Eric Ren
b6bf7b97c2 devmapper: async remove device using Cleanup
Fix: #3923
Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
2020-02-29 13:32:48 +08:00
Sebastiaan van Stijn
f2edc6f164
vendor: update gotest.tools v3.0.2
full diff: https://github.com/gotestyourself/gotest.tools/compare/v2.3.0...v3.0.2

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-02-28 17:47:20 +01:00
Maksym Pavlenko
f0652e1434 Make tests less flaky
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2020-01-30 09:57:34 -08:00
Maksym Pavlenko
75efbaf678 Attempt to make device mapper snapshotter tests less flaky
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-12-13 13:20:23 -08:00
Eric Ernst
731e144a48 devmapper: update example base image size in README
base_image_size effectively is the limit of a layer size that can be
created using the devmapper snapshotter. While this will also depend on
the thinpool size itself, something closer to the total image size
(80%?) is more appropriate.

As is, if you try to run an image like elastic, you'll need a much
larger base_image_size than 128MB.

Signed-off-by: Eric Ernst <eric.ernst@intel.com>
2019-11-20 12:26:16 -08:00
Derek McGowan
66aa1d3ef6
Add snapshot walk implementations
Temporarily remove zfs and aufs until interface update

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2019-10-24 11:11:22 -07:00
bpopovschi
e8c14c07c6
Added filters to snapshots API
Signed-off-by: bpopovschi <zyqsempai@mail.ru>
2019-10-24 11:11:22 -07:00
renzhen.rz
4d11bb36ad devmapper: activate dm device if snap device marked as activated
- reproducer
 1. stop a container;
 2. reboot, or dmsetup remove its corresponding dm device;
 3. start the container, it will fail like:

 """
 Error: failed to start containers: {"message":"failed to create container(4f33d2760760c41518a84821153ccdf7f80980b797b783cdd75178fc6ca0bf4b) on containerd: failed to create task for container(4f33d2760760c41518a84821153ccdf7f80980b797b783cdd75178fc6ca0bf4b): failed to mount rootfs component &{ext4 /dev/mapper/vg0-mythinpool-snap-2 []}: no such file or directory: unknown"}
 """
- how the fix works
 activate the dm device if necessary, and give a warn msg:

 """
 time="2019-08-21T22:44:08.422695797+08:00" level=warning msg="devmapper device \"vg0-mythinpool-snap-2\" marked as \"Activated\" but not active, activating it"
 """

Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
2019-08-23 10:19:28 +08:00
Maksym Pavlenko
0a4bf1bd1e Mark faulty devices
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-08-05 12:05:36 -07:00
Maksym Pavlenko
3741fd8591 Remove deferred flag when removing devmapper device
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-07-31 11:28:33 -07:00
Maksym Pavlenko
4d5a0e19eb Mark faulty device in one transaction
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-07-30 16:26:55 -07:00
Maksym Pavlenko
878a3205cd Better error recovery in devmapper
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-07-30 15:17:17 -07:00
renzhen.rz
3887053177 snapshots/devmapper: deactivate thin device after committed
1. reason to deactivate committed snapshot

The thin device will not be used for IO after it is committed,
and further thin snapshotting works with an inactive thin
device as origin. The benefits of deactivating are:
 - the device is not unnecessarily visible, avoiding any unexpected IO;
 - no kernel data structures are wasted on maintaining an active dm device.

 Quote from kernel doc (Documentation/device-mapper/thin-provisioning.txt):

"
  ii) Using an internal snapshot.

  Once created, the user doesn't have to worry about any connection
  between the origin and the snapshot.  Indeed the snapshot is no
  different from any other thinly-provisioned device and can be
  snapshotted itself via the same method.  It's perfectly legal to
  have only one of them active, and there's no ordering requirement on
  activating or removing them both.  (This differs from conventional
  device-mapper snapshots.)
"

2. a thinpool metadata bug is naturally removed

A problem happens when suspending/resuming the origin thin device fails
while creating a snapshot:

"failed to create snapshot device from parent vg0-mythinpool-snap-3"
error="failed to save initial metadata for snapshot "vg0-mythinpool-snap-19":
object already exists"

This issue occurs because when snapshot creation fails, the
snapshotter.store can be rolled back, but the thin pool metadata
boltdb fails to roll back in PoolDevice.CreateSnapshotDevice(),
so the metadata becomes inconsistent: the snapshotID is not
taken in snapshotter.store, but is saved in the pool metadata boltdb.

The cause is that, in PoolDevice.CreateSnapshotDevice(), the deferred
calls are invoked in "first-in-last-out" order. When the error happens
in the "resume device" deferred call, the metadata is already saved and
the snapshot is already created, with no chance of being rolled back.

Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
2019-05-09 10:58:21 +08:00
Davor Kapsa
cfc36388b3 Remove redundant error checks
Signed-off-by: Davor Kapsa <davor.kapsa@gmail.com>
2019-04-30 21:28:51 +02:00
Maksym Pavlenko
87289a0c62 devmapper: implement Usage
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2019-03-27 14:50:12 -07:00
Maksym Pavlenko
010b4da36f devmapper: implement dmsetup status
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2019-03-27 14:26:07 -07:00
Maksym Pavlenko
208957ba3c
devmapper: proper cleanup in pool device test
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-02-22 12:51:27 -08:00
Maksym Pavlenko
734989c2a0
Update README
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-02-22 11:10:51 -08:00
Maksym Pavlenko
95f0a4903c
devmapper: rollback thin devices on error
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-02-21 17:40:10 -08:00
Maksym Pavlenko
adf5c640f4
devmapper: don't create or reload thin-pool from snapshotter
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-02-21 16:26:46 -08:00
Maksym Pavlenko
7efda48c53
devmapper: more precise way of checking if device is activated
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-02-21 16:26:46 -08:00
Maksym Pavlenko
37cdedc61c
devmapper: add linux tags, fix build
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-02-21 16:26:46 -08:00
Maksym Pavlenko
0c6d194cce
devmapper: add README and minor fixes
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-02-21 16:25:55 -08:00
Maksym Pavlenko
2218275ec9
devmapper: register plugin
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-02-21 16:25:55 -08:00
Maksym Pavlenko
cec72efc2a
devmapper: add snapshotter
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-02-21 16:25:55 -08:00
Maksym Pavlenko
3a75882520
devmapper: add pool device manager
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-02-21 16:25:55 -08:00