snapshots/devmapper: do not stop snapshot GC when removing one snapshot fails

Snapshot GC uses the pruneBranch() function to remove snapshots,
but GC stops if snapshotter.Remove() returns an error other than
ErrFailedPrecondition. As a result, a single snapshot that cannot be
deleted, for example because of an error like "contains a filesystem
in use", leaves thousands of dm snapshots undeleted.

So return ErrFailedPrecondition from the Remove() function where
appropriate, and let the GC process go on to collect the other snapshots.

Fix: #3923
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Signed-off-by: Eric Ren <renzhen.rz@linux.alibaba.com>
Eric Ren 2019-04-25 14:30:15 +08:00 committed by Eric Ren
parent b6bf7b97c2
commit a3685262fe

@@ -27,6 +27,7 @@ import (
 	"strings"
 	"sync"
 
+	"github.com/containerd/containerd/errdefs"
 	"github.com/containerd/containerd/log"
 	"github.com/containerd/containerd/mount"
 	"github.com/containerd/containerd/plugin"
@@ -306,7 +307,11 @@ func (s *Snapshotter) removeDevice(ctx context.Context, key string) error {
 	if !s.config.AsyncRemove {
 		if err := s.pool.RemoveDevice(ctx, deviceName); err != nil {
 			log.G(ctx).WithError(err).Errorf("failed to remove device")
-			return err
+			// Tell snapshot GC continue to collect other snapshots.
+			// Otherwise, one snapshot collection failure will stop
+			// the GC, and all snapshots won't be collected even though
+			// having no relationship with the failed one.
+			return errdefs.ErrFailedPrecondition
 		}
 	} else {
 		// The asynchronous cleanup will do the real device remove work.