snapshots/devmapper: fix race windown causing IO hangup

The issue beblow happens several times beforing the root
cause found:

  1. A `fdisk -l` process has being hung up for a long time;
  2. A image layer snapshot device is visiable to dmsetup, which
       should *not* happen because it should be deactivated after
       `Commit()`;

The backtrace of `fdisk` is always the same over time:

```bash
[<ffffffff810bbc6a>] io_schedule+0x2a/0x80
[<ffffffff81295a3f>] do_blockdev_direct_IO+0x1e9f/0x2f10
[<ffffffff81296aea>] __blockdev_direct_IO+0x3a/0x40
[<ffffffff81290e43>] blkdev_direct_IO+0x43/0x50
[<ffffffff811b8a14>] generic_file_read_iter+0x374/0x960
[<ffffffff81291ad5>] blkdev_read_iter+0x35/0x40
[<ffffffff8125229b>] new_sync_read+0xfb/0x240
[<ffffffff81252406>] __vfs_read+0x26/0x40
[<ffffffff81252b96>] vfs_read+0x96/0x130
[<ffffffff812540e5>] SyS_read+0x55/0xc0
[<ffffffff81003c04>] do_syscall_64+0x74/0x180
```

The root cause is, in Commit(), there's a race window between
`SuspendDevice()` and `DeactivateDevice()`, which may cause the
IOs of a process or command like `fdisk` on the "suspended" device
hang up forever. It has twofold:

  1. The IOs suspends on the devices;
  2. The device is in `Suspended` state, because it's deactivated with
     `deferred` flag and without `force` flag;

So they cannot make progress.

One reproducer is:
 1. enlarge the race window by putting sleep seconds there;
 2. run `while true; do sudo fdisk -l; sleep 0.5; done` on one terminal;
 3. and pull image on another terminal;

Fixes it by:
 1. Resume the devices again after flushing IO by suspend;
 2. Remove device without `deferred` flag;

Fix: #4234
Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
This commit is contained in:
Eric Ren 2020-05-06 23:05:31 +08:00
parent b1f514641f
commit 63b7587cd6
2 changed files with 25 additions and 3 deletions

View File

@ -347,6 +347,16 @@ func (p *PoolDevice) SuspendDevice(ctx context.Context, deviceName string) error
return nil return nil
} }
func (p *PoolDevice) ResumeDevice(ctx context.Context, deviceName string) error {
if err := p.transition(ctx, deviceName, Resuming, Resumed, func() error {
return dmsetup.ResumeDevice(deviceName)
}); err != nil {
return errors.Wrapf(err, "failed to resume device %q", deviceName)
}
return nil
}
// DeactivateDevice deactivates thin device // DeactivateDevice deactivates thin device
func (p *PoolDevice) DeactivateDevice(ctx context.Context, deviceName string, deferred, withForce bool) error { func (p *PoolDevice) DeactivateDevice(ctx context.Context, deviceName string, deferred, withForce bool) error {
if !p.IsLoaded(deviceName) { if !p.IsLoaded(deviceName) {

View File

@ -277,14 +277,26 @@ func (s *Snapshotter) Commit(ctx context.Context, name, key string, opts ...snap
return err return err
} }
// The thin snapshot is not used for IO after committed, so // After committed, the snapshot device will not be directly
// suspend to flush the IO and deactivate the device. // used anymore. We'd better deativate it to make it *invisible*
// in userspace, so that tools like LVM2 and fdisk cannot touch it,
// and avoid useless IOs on it.
//
// Before deactivation, we need to flush the outstanding IO by suspend.
// Afterward, we resume it again to prevent a race window which may cause
// a process IO hang. See the issue below for details:
// (https://github.com/containerd/containerd/issues/4234)
err = s.pool.SuspendDevice(ctx, deviceName) err = s.pool.SuspendDevice(ctx, deviceName)
if err != nil { if err != nil {
return err return err
} }
return s.pool.DeactivateDevice(ctx, deviceName, true, false) err = s.pool.ResumeDevice(ctx, deviceName)
if err != nil {
return err
}
return s.pool.DeactivateDevice(ctx, deviceName, false, false)
}) })
} }