Commit Graph

19 Commits

Author SHA1 Message Date
Alakesh Haloi
5ce35ac398 devmapper: log pool status when mkfs fails
If mkfs on a device mapper thin pool fails, log the pool status
as returned by dmsetup for enhanced error reporting.

Signed-off-by: Alakesh Haloi <alakeshh@amazon.com>
2021-04-12 19:24:04 +00:00
Kazuyoshi Kato
7704fe72d0 Specifically mention "mkfs.ext4" on the error from the command
Before the change, the error on the caller-side (e.g. ctr) was
something like

> unpack: failed to prepare extraction snapshot "...": exit status 5:
> unknown

which was too cryptic.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2021-03-19 10:38:47 -07:00
Derek McGowan
35eeb24a17 Fix exported comments enforcer in CI
Add comments where missing and fix incorrect comments

Signed-off-by: Derek McGowan <derek@mcg.dev>
2021-03-12 08:47:05 -08:00
Shengjing Zhu
5988bfc1ef docs: Various typos found by codespell
Signed-off-by: Shengjing Zhu <zhsj@debian.org>
2020-12-22 13:22:16 +08:00
Maksym Pavlenko
da68609866 Fix devmapper test
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2020-12-09 09:35:17 -08:00
Teemu Kallio
71fd68a920 devicemapper: separate implementation pkg from plugin pkg
Signed-off-by: Teemu Kallio <teemu.kallio@pm.me>
2020-09-18 12:00:14 +02:00
Kazuyoshi Kato
74e9aa7abb snapshots/devmapper: don't hardcode the platform strings
The snapshotter doesn't have to exclude non-amd64 platforms.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2020-08-03 11:55:36 -07:00
Eric Ren
63b7587cd6 snapshots/devmapper: fix race window causing IO hangup
The issue below happened several times before the root
cause was found:

  1. A `fdisk -l` process has been hung for a long time;
  2. An image layer snapshot device is visible to dmsetup, which
       should *not* happen because it should be deactivated after
       `Commit()`;

The backtrace of `fdisk` is always the same over time:

```bash
[<ffffffff810bbc6a>] io_schedule+0x2a/0x80
[<ffffffff81295a3f>] do_blockdev_direct_IO+0x1e9f/0x2f10
[<ffffffff81296aea>] __blockdev_direct_IO+0x3a/0x40
[<ffffffff81290e43>] blkdev_direct_IO+0x43/0x50
[<ffffffff811b8a14>] generic_file_read_iter+0x374/0x960
[<ffffffff81291ad5>] blkdev_read_iter+0x35/0x40
[<ffffffff8125229b>] new_sync_read+0xfb/0x240
[<ffffffff81252406>] __vfs_read+0x26/0x40
[<ffffffff81252b96>] vfs_read+0x96/0x130
[<ffffffff812540e5>] SyS_read+0x55/0xc0
[<ffffffff81003c04>] do_syscall_64+0x74/0x180
```

The root cause is that, in Commit(), there is a race window between
`SuspendDevice()` and `DeactivateDevice()`, which may cause the
IOs of a process or command like `fdisk` on the "suspended" device
to hang forever. The cause is twofold:

  1. The IOs are suspended on the device;
  2. The device stays in `Suspended` state, because it's deactivated with
     `deferred` flag and without `force` flag;

So they cannot make progress.

One reproducer is:
 1. enlarge the race window by putting sleep seconds there;
 2. run `while true; do sudo fdisk -l; sleep 0.5; done` on one terminal;
 3. and pull image on another terminal;

Fix it by:
 1. Resuming the device again after the suspend has flushed IO;
 2. Removing the device without the `deferred` flag;

Fix: #4234
Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
2020-05-07 07:46:45 +08:00
Eric Ren
a3685262fe snapshots/devmapper: do not stop snapshot GC when one snapshot removing fails
Snapshot GC makes use of the pruneBranch() function to remove snapshots,
but GC will stop if snapshotter.Remove() returns an error and the error
is not ErrFailedPrecondition. This results in thousands of
dm snapshots not being deleted if one snapshot cannot be deleted, due to
errors like "contains a filesystem in use".

So return the ErrFailedPrecondition error in the Remove() function where
appropriate, and let the GC process go on collecting other snapshots.

Fix: #3923
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Signed-off-by: Eric Ren <renzhen.rz@linux.alibaba.com>
2020-02-29 13:32:48 +08:00
Eric Ren
b6bf7b97c2 devmapper: async remove device using Cleanup
Fix: #3923
Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
2020-02-29 13:32:48 +08:00
Derek McGowan
66aa1d3ef6 Add snapshot walk implementations
Temporarily remove zfs and aufs until interface update

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2019-10-24 11:11:22 -07:00
bpopovschi
e8c14c07c6 Added filters to snapshots API
Signed-off-by: bpopovschi <zyqsempai@mail.ru>
2019-10-24 11:11:22 -07:00
renzhen.rz
3887053177 snapshots/devmapper: deactivate thin device after committed
1. reason to deactivate committed snapshot

The thin device will not be used for IO after it is committed,
and further thin snapshotting works with an inactive thin
device as origin. The benefits of deactivating are:
 - the device is not unnecessarily visible, avoiding any unexpected IO;
 - no kernel data structures are kept around to maintain an active dm device.

 Quote from kernel doc (Documentation/device-mapper/thin-provisioning.txt):

"
  ii) Using an internal snapshot.

  Once created, the user doesn't have to worry about any connection
  between the origin and the snapshot.  Indeed the snapshot is no
  different from any other thinly-provisioned device and can be
  snapshotted itself via the same method.  It's perfectly legal to
  have only one of them active, and there's no ordering requirement on
  activating or removing them both.  (This differs from conventional
  device-mapper snapshots.)
"

2. a thin-pool metadata bug is naturally removed

A problem occurs when suspending or resuming the origin thin device
fails while creating a snapshot:

"failed to create snapshot device from parent vg0-mythinpool-snap-3"
error="failed to save initial metadata for snapshot "vg0-mythinpool-snap-19":
object already exists"

This issue occurs because, when snapshot creation fails, the
snapshotter.store can be rolled back, but the thin pool metadata
boltdb fails to roll back in PoolDevice.CreateSnapshotDevice(),
so the metadata becomes inconsistent: the snapshotID is not
taken in snapshotter.store, but is saved in the pool metadata boltdb.

The cause is that, in PoolDevice.CreateSnapshotDevice(), the defer calls
are invoked in "first-in-last-out" order. When the error happens
in the "resume device" defer call, the metadata is already saved and the
snapshot is already created, so they have no chance to be rolled back.

Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
2019-05-09 10:58:21 +08:00
Maksym Pavlenko
87289a0c62 devmapper: implement Usage
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2019-03-27 14:50:12 -07:00
Maksym Pavlenko
95f0a4903c devmapper: rollback thin devices on error
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-02-21 17:40:10 -08:00
Maksym Pavlenko
37cdedc61c devmapper: add linux tags, fix build
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-02-21 16:26:46 -08:00
Maksym Pavlenko
0c6d194cce devmapper: add README and minor fixes
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-02-21 16:25:55 -08:00
Maksym Pavlenko
2218275ec9 devmapper: register plugin
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-02-21 16:25:55 -08:00
Maksym Pavlenko
cec72efc2a devmapper: add snapshotter
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
2019-02-21 16:25:55 -08:00