Don't block snapshot garbage collection on Remove failures

If a snapshot removal fails during garbage collection, the entire garbage collection operation is
cancelled. This is problematic: once cleanup of any one snapshot fails, no other snapshots are cleaned up
and disk usage keeps growing.

The solution is to return snapshot removal errors wrapped as "ErrFailedPrecondition" errors. The garbage
collector continues cleanup when the error is of this type.

Signed-off-by: Amit Barve <ambarve@microsoft.com>
This commit is contained in:
Amit Barve 2023-12-04 09:57:50 -08:00
parent b8e32595ba
commit ad96fded4c


@@ -282,7 +282,9 @@ func (s *snapshotter) Remove(ctx context.Context, key string) error {
 				log.G(ctx).WithError(err1).WithField("path", renamed).Error("Failed to rename after failed commit")
 			}
 		}
-		return err
+		// Return the error wrapped in ErrFailedPrecondition so that cleanup of other snapshots will
+		// still continue.
+		return errors.Join(errdefs.ErrFailedPrecondition, err)
 	}
 
 	if err = hcsshim.DestroyLayer(s.info, renamedID); err != nil {