document usage and design of blockfile snapshotter

Signed-off-by: Avi Deitcher <avi@deitcher.net>
2024-04-26 11:26:39 +03:00
parent 0426e3c2eb
commit e07b63d845
2 changed files with 188 additions and 1 deletions
--- a/docs/snapshotters/README.md
+++ b/docs/snapshotters/README.md
@@ -11,7 +11,7 @@ Generic:
 - `native`: Native file copying driver. Akin to Docker/Moby's "vfs" driver.
 Block-based:
- `blockfile`: A driver using raw block files for each snapshot. Block files are copied from a parent or base empty block file. Mounting requires a virtual machine or support for loopback mounts.
+- [`blockfile`](./blockfile.md): A driver using raw block files for each snapshot. Block files are copied from a parent or base empty block file. Mounting requires a virtual machine or support for loopback mounts.
 - `devmapper`: ext4/xfs device mapper. See [`devmapper.md`](./devmapper.md).
 Filesystem-specific:
--- a/docs/snapshotters/blockfile.md
+++ b/docs/snapshotters/blockfile.md
@@ -0,0 +1,187 @@
 # Blockfile Snapshotter
 The blockfile snapshotter uses raw block files for each snapshot. Block files are
 copied from a parent or base empty block file. Mounting requires a virtual machine
 or support for loopback mounts.
 ## Use Case
 Snapshotters serve the purpose of extracting an image from the OCI image store and
 creating a snapshot that is useful to containers. It handles setting up the
 underlying infrastructure, such as preparing a directory or other filesystem setup,
 applying the layers to create a single mountable directory to serve as the container
 base, and mounting into the container upon start.
 The most commonly used snapshotter is the overlayfs snapshotter, which is the default
 in containerd. The overlayfs snapshotter provides a directory on the host filesystem,
 which then is bind-mounted into the container.
 The blockfile snapshotter targets a use case where the container will run inside a
 VM. Specifically, the OCI image will be the filesystem for the container, like with
 a normal container, but the container itself will be run inside a VM.
 Since the VM cannot bind-mount directories from the host, the blockfile snapshotter
 creates a block device for the snapshot, which can be attached to the VM as a block
 device to facilitate getting the contents into the guest.
 ## Alternatives
 There are alternatives to the blockfile snapshotter for mounting directories into a
 VM. One alternative is a [virtiofs](https://virtio-fs.gitlab.io) driver,
 assuming your VMM supports it. Similarly, you can use
 [9p](https://www.kernel.org/doc/Documentation/filesystems/9p.txt) to mount a local
 directory into the VM, assuming your VMM supports it.
 Additionally, the [devicemapper snapshotter](./devmapper.md) can be used to create
 snapshots on filesystem images in a devicemapper thin-pool.
 ## Usage
 ### Checking if the blockfile snapshotter is available
 To check if the blockfile snapshotter is available, run the following command:
 ```bash
 $ ctr plugins ls | grep blockfile
 ```
 ### Configuration
 To configure the snapshotter, you can use the following configuration options
 in your containerd `config.toml`. Don't forget to restart it after changing the
 configuration.
 ```toml
  [plugins.'io.containerd.snapshotter.v1.blockfile']
    scratch_file = "/opt/containerd/blockfile"
    root_path = "/somewhere/on/disk"
    fs_type = 'ext4'
    mount_options = []
    recreate_scratch = true
 ```
 - `root_path`: The directory where the block files are stored. This directory must be writable by the containerd process.
 - `scratch_file`: The path to the empty file that will be used as the base for the block files. This file should exist before first using the snapshotter.
 - `fs_type`: The filesystem type to use for the block files. Currently supported are `ext4` and `xfs`.
 - `mount_options`: Additional mount options to use when mounting the block files.
 - `recreate_scratch`: If set to `true`, the snapshotter will recreate the scratch file if it is missing. If set to `false`, the snapshotter will fail if the scratch file is missing.
 ### Creating the scratch file
 You can create a scratch file as follows. This example uses a 500MB scratch file.
 ```bash
 $ # make a 500M file
 $ dd if=/dev/zero of=/opt/containerd/blockfile bs=1M count=500
 500+0 records in
 500+0 records out
 524288000 bytes (524 MB, 500 MiB) copied, 1.76253 s, 297 MB/s
 $ # format the file with ext4
 $ sudo mkfs.ext4 /opt/containerd/blockfile
 mke2fs 1.47.0 (5-Feb-2023)
 Discarding device blocks: done
 Creating filesystem with 512000 1k blocks and 128016 inodes
 Filesystem UUID: d9947ecc-722d-4627-9cf9-fa2a3b622106
 Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409
 Allocating group tables: done
 Writing inode tables: done
 Creating journal (8192 blocks): done
 Writing superblocks and filesystem accounting information: done
 ```
 ### Running a container
 To run a container using the blockfile snapshotter, you need to specify the
 snapshotter:
 ```bash
 $ # ensure that the image we are using exists; it is a regular OCI image
 $ ctr image pull docker.io/library/busybox:latest
 $ # run the container with the provides snapshotter
 $ ctr run -rm -t --snapshotter blockfile docker.io/library/busybox:latest hello sh
 ```
 To use it via the go client API, it is identical to using any other snapshotter:
 ```go
 import (
    "context"
    "github.com/containerd/containerd"
    "github.com/containerd/containerd/snapshots"
 )
 // create a new client
 client, err := containerd.New("/run/containerd/containerd.sock")
 snapshotter := "blockfile"
 cOpts := []containerd.NewContainerOpts{
 				containerd.WithImage(image),
 				containerd.WithImageConfigLabels(image),
 				containerd.WithAdditionalContainerLabels(labels),
 				containerd.WithSnapshotter(snapshotter)
 }
 container, err := client.NewContainer(ctx, containerID, cOpts...)
 ```
 ## How It Works
 The blockfile snapshotter functions similarly to other snapshotters.
 It unpacks each individual layer from a container image, with each layer unpack
 building on the content from its parent(s).
 The blockfile snapshotter is unique in two ways:
 1. It applies layers inside a disk image file, rather than on the host filesystem.
 1. It creates a block image file for each layer, applying the previous on top of it.
 Rather than a single directory with the contents, the end of the blockfile
 snapshotter's process is a single file, which has the contents of the full
 filesystem image. That image file can be loopback mounted, or attached to a virtual
 machine.
 For every layer the snapshotter creates a new blockfile, starting with a copy of the
 blockfile from the previous layer. If there is no previous layer, i.e. for the first
 layer, it copies the scratch file.
 For example, for an image with 3 layers - called A, B, C - the process is as follows:
 1. Layer A:
   1. Copy the scratch file to a new blockfile for layer A.
    1. Loopback-mount the blockfile for layer A.
    1. Apply layer A to the mount.
    1. Unmount the blockfile for layer A.
 1. Layer B:
    1. Copy the blockfile for layer A to a new blockfile for layer B.
    1. Loopback-mount the blockfile for layer B.
    1. Apply layer B to the mount.
    1. Unmount the blockfile for layer B.
 1. Layer C:
    1. Copy the blockfile for layer B to a new blockfile for layer C.
    1. Loopback-mount the blockfile for layer C.
    1. Apply layer C to the mount.
    1. Unmount the blockfile for layer C.
 Each unpack of a layer builds upon the contents of the previous layers into a new
 blockfile. This completes with the final blockfile containing the full filesystem
 image.
 As a result of the process, each layer leads to another blockfile in the system:
 1. Layer A blockfile: contents of layer A
 1. Layer B blockfile: contents of layer A + layer B
 1. Layer C blockfile: contents of layer A + layer B + layer C
 If available in the underlying filesystem and the host OS, the process uses
 sparse file support whenever available. This means that the blockfiles only take
 up the space required for the actual content.
 For example, if the scratch image is 500MB, and each layer adds 25MB, then the
 file sizes will be:
 1. Layer A blockfile: 25MB from layer A
 1. Layer B blockfile: 50MB from layer A and B
 1. Layer C blockfile: 75MB from layer A, B, and C
 Total space usage thus is 25+50+75=150MB. This is a fraction of the amount
 required if each layer's blockfile used the full 500MB, i.e. 1500MB in total.