For LCOW currently we copy (or create) the scratch.vhdx for every single snapshot
so there ends up being a sandbox.vhdx in every directory seemingly unnecessarily. With the default scratch
size of 20GB the size on disk is about 17MB so there's a 17MB overhead per layer plus the time to copy the
file with every snapshot. Only the final sandbox.vhdx is actually used so this would be a nice little
optimization.
For WCOW we essentially do the exact same except copy the blank vhdx from the base layer.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
When running tests on any modern distro, this assumption will work. If
we need to make it work with kernels where we don't append this option
it will require some more involved changes.
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
Previously there wwasn't a way to pass any labels to snapshotters as the wrapper
around WithNewSnapshot didn't have a parm to pass them in.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
Put the overlay plugin in a separate package to allow the overlay package to be
used without needing to import and initialize the plugin.
Signed-off-by: Derek McGowan <derek@mcg.dev>
This allows configuring the location of the overlayfs snapshotter by
adding the following in config.toml
```
[plugins]
[plugins.overlayfs]
root_path = "/custom_location"
```
This is useful to isolate disk i/o for overlayfs from the rest of
containerd and prevent containers saturating disk i/o from negatively
affecting containerd operations and cause timeouts.
Signed-off-by: Ashray Jain <ashrayj@palantir.com>
The rollback mechanism is implemented by calling deleteDevice() and
RemoveDevice(). But RemoveDevice() is internally calling
deleteDevice() as well.
Since a device will be deleted by first deleteDevice(),
RemoveDevice() always will see ENODATA. The specific error must be
ignored to remove the device's metadata correctly.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
kernel version > 4.13rc1 support index=on feature, it will be failed
with EBUSY when trying to mount.
Related: https://github.com/moby/moby/pull/37993
Signed-off-by: Rudy Zhang <rudyflyzhang@gmail.com>
Dependencies may be switching to use the new `%w` formatting
option to wrap errors; switching to use `errors.Is()` makes
sure that we are still able to unwrap the error and detect the
underlying cause.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The issue beblow happens several times beforing the root
cause found:
1. A `fdisk -l` process has being hung up for a long time;
2. A image layer snapshot device is visiable to dmsetup, which
should *not* happen because it should be deactivated after
`Commit()`;
The backtrace of `fdisk` is always the same over time:
```bash
[<ffffffff810bbc6a>] io_schedule+0x2a/0x80
[<ffffffff81295a3f>] do_blockdev_direct_IO+0x1e9f/0x2f10
[<ffffffff81296aea>] __blockdev_direct_IO+0x3a/0x40
[<ffffffff81290e43>] blkdev_direct_IO+0x43/0x50
[<ffffffff811b8a14>] generic_file_read_iter+0x374/0x960
[<ffffffff81291ad5>] blkdev_read_iter+0x35/0x40
[<ffffffff8125229b>] new_sync_read+0xfb/0x240
[<ffffffff81252406>] __vfs_read+0x26/0x40
[<ffffffff81252b96>] vfs_read+0x96/0x130
[<ffffffff812540e5>] SyS_read+0x55/0xc0
[<ffffffff81003c04>] do_syscall_64+0x74/0x180
```
The root cause is, in Commit(), there's a race window between
`SuspendDevice()` and `DeactivateDevice()`, which may cause the
IOs of a process or command like `fdisk` on the "suspended" device
hang up forever. It has twofold:
1. The IOs suspends on the devices;
2. The device is in `Suspended` state, because it's deactivated with
`deferred` flag and without `force` flag;
So they cannot make progress.
One reproducer is:
1. enlarge the race window by putting sleep seconds there;
2. run `while true; do sudo fdisk -l; sleep 0.5; done` on one terminal;
3. and pull image on another terminal;
Fixes it by:
1. Resume the devices again after flushing IO by suspend;
2. Remove device without `deferred` flag;
Fix: #4234
Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
For remote snapshotter cases it's quite often there is need to pass extra info
from client (for instance - registry URL to query remote layer from, credentials, etc).
This commit slightly extends WithPullSnapshotter to pass extra labels to a snapshotter.
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
Snapshots GC takes use of pruneBranch() function to remove snapshots,
but GC will stop if snapshotter.Remove() returns error and the error
number is not ErrFailedPrecondition. This results in thousands of
dm snapshots not deleted if one snapshot is not deleted, due to
errors like "contains a filesystem in use".
So return ErrFailedPrecondition error number in Remove() function where
appropriate, and let GC process go on collecting other snapshots.
Fix: #3923
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Signed-off-by: Eric Ren <renzhen.rz@linux.alibaba.com>
Cleanup is an optional method a snapshotter may implement.
Cleanup can be used to cleanup resources after a snapshot
has been removed. This function allows a snapshotter to defer
longer resource cleanup until after snapshot removals are
completed. Adding this to the API allows proxy snapshotters
to leverage this enhancement.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
base_image_size effectively is the limit of a layer size that can be
created using the devmapper snapshotter. While this will also depend on
the thinpool size itself, something closer to the total image size
(80%?) is more appropriate.
As is, if you try to run an image like elastic, you'll need a much
larger base_image_size than 128MB.
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
Add logging and move the creation of the snapshotter inside
the attempt loop to catch cases where the mountinfo may
not be updated yet. When all attempts are reached there
is no reason to create the snapshotter as the unmount has
already occurred.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
- reproducer
1. stop a container;
2. reboot, or dmsetup remove its corresponding dm device;
3. start the container, it will fail like:
"""
Error: failed to start containers: {"message":"failed to create container(4f33d2760760c41518a84821153ccdf7f80980b797b783cdd75178fc6ca0bf4b) on containerd: failed to create task for container(4f33d2760760c41518a84821153ccdf7f80980b797b783cdd75178fc6ca0bf4b): failed to mount rootfs component &{ext4 /dev/mapper/vg0-mythinpool-snap-2 []}: no such file or directory: unknown"}
"""
- how the fix works
activate the dm device if necessary, and give a warn msg:
"""
time="2019-08-21T22:44:08.422695797+08:00" level=warning msg="devmapper device \"vg0-mythinpool-snap-2\" marked as \"Activated\" but not active, activating it"
"""
Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
Enables showing debug logs in testing output.
For integration tests the client log output will show
in addition to daemon output, with timestamps for better
correlation.
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
This also refactors the lcow and windows
snapshotters to use go-winio's utility functions for checking the
filesystem type.
Signed-off-by: Eric Hotinger <ehotinger@gmail.com>
For block device like devicemapper, umount before committing
active snapshot can sync data onto disk. Otherewise:
1. deactivating a block device in use will fail or IO error
when forced;
2. new snapshot on it will not catch the data not laid on disk yet.
Signed-off-by: Eric Ren <renzhen.rz@alibaba-linux.com>
1. reason to deactivate committed snapshot
The thin device will not be used for IO after committed,
and further thin snapshotting is OK using an inactive thin
device as origin. The benefits to deactivate are:
- device is not unneccesary visible avoiding any unexpected IO;
- save useless kernel data structs for maintaining active dm.
Quote from kernel doc (Documentation/device-mapper/provisioning.txt):
"
ii) Using an internal snapshot.
Once created, the user doesn't have to worry about any connection
between the origin and the snapshot. Indeed the snapshot is no
different from any other thinly-provisioned device and can be
snapshotted itself via the same method. It's perfectly legal to
have only one of them active, and there's no ordering requirement on
activating or removing them both. (This differs from conventional
device-mapper snapshots.)
"
2. an thinpool metadata bug is naturally removed
An problem happens when failed to suspend/resume origin thin device
when creating snapshot:
"failed to create snapshot device from parent vg0-mythinpool-snap-3"
error="failed to save initial metadata for snapshot "vg0-mythinpool-snap-19":
object already exists"
This issue occurs because when failed to create snapshot, the
snapshotter.store can be rollbacked, but the thin pool metadata
boltdb failed to rollback in PoolDevice.CreateSnapshotDevice(),
therefore metadata becomes inconsistent: the snapshotID is not
taken in snapshotter.store, but saved in pool metadata boltdb.
The cause is, in PoolDevice.CreateSnapshotDevice(), the defer calls
are invoked on "first-in-last-out" order. When the error happens
on the "resume device" defer call, the metadata is saved and
snapshot is created, which has no chance to be rollbacked.
Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
```
func foo() error {
defer func() {
if err != nil {
...
}
}()
...
}
```
use defer func to do something when err not nil, if foo() not use
named error, `err != nil` can not catch all errors, since when err
re-defined in if condition, it is a new variable.
Signed-off-by: Ace-Tang <aceapril@126.com>
Since there is no real action the user can do, these can safely be
informative that the underlying filesystem does not support a snapshot
plugin at boot.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Implements the Windows lcow differ/snapshotter responsible for managing
the creation and lifetime of lcow containers on Windows.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Auto-detect longest common dir in lowerdir option and compact it if the
option size is hitting one page size. If does, Use chdir + CLONE to do
mount thing to avoid hitting one page argument buffer in linux kernel
mount.
Signed-off-by: Wei Fu <fhfuwei@163.com>