document usage and design of blockfile snapshotter
Signed-off-by: Avi Deitcher <avi@deitcher.net>
		| @@ -11,7 +11,7 @@ Generic: | |||||||
| - `native`: Native file copying driver. Akin to Docker/Moby's "vfs" driver. | - `native`: Native file copying driver. Akin to Docker/Moby's "vfs" driver. | ||||||
|  |  | ||||||
| Block-based: | Block-based: | ||||||
| - `blockfile`: A driver using raw block files for each snapshot. Block files are copied from a parent or base empty block file. Mounting requires a virtual machine or support for loopback mounts. | - [`blockfile`](./blockfile.md): A driver using raw block files for each snapshot. Block files are copied from a parent or base empty block file. Mounting requires a virtual machine or support for loopback mounts. | ||||||
| - `devmapper`: ext4/xfs device mapper. See [`devmapper.md`](./devmapper.md). | - `devmapper`: ext4/xfs device mapper. See [`devmapper.md`](./devmapper.md). | ||||||
|  |  | ||||||
| Filesystem-specific: | Filesystem-specific: | ||||||
|   | |||||||
							
								
								
									
										187
									
								
								docs/snapshotters/blockfile.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
							| @@ -0,0 +1,187 @@ | |||||||
|  | # Blockfile Snapshotter | ||||||
|  |  | ||||||
|  | The blockfile snapshotter uses raw block files for each snapshot. Block files are | ||||||
|  | copied from a parent or base empty block file. Mounting requires a virtual machine | ||||||
|  | or support for loopback mounts. | ||||||
|  |  | ||||||
|  | ## Use Case | ||||||
|  |  | ||||||
|  | Snapshotters extract an image from the OCI image store and create a snapshot | ||||||
|  | that containers can use. A snapshotter sets up the underlying infrastructure, | ||||||
|  | such as preparing a directory or other filesystem, applies the layers to create | ||||||
|  | a single mountable directory to serve as the container base, and mounts it into | ||||||
|  | the container upon start. | ||||||
|  |  | ||||||
|  | The most commonly used snapshotter is the overlayfs snapshotter, which is the default | ||||||
|  | in containerd. The overlayfs snapshotter provides a directory on the host filesystem, | ||||||
|  | which then is bind-mounted into the container. | ||||||
|  |  | ||||||
|  | The blockfile snapshotter targets a use case where the container runs inside a | ||||||
|  | VM. The OCI image still provides the filesystem for the container, as with a | ||||||
|  | normal container, but the container itself runs inside a VM. | ||||||
|  | Since the VM cannot bind-mount directories from the host, the blockfile snapshotter | ||||||
|  | creates a block file for the snapshot, which can be attached to the VM as a block | ||||||
|  | device to get the contents into the guest. | ||||||
|  |  | ||||||
|  | ## Alternatives | ||||||
|  |  | ||||||
|  | There are alternatives to the blockfile snapshotter for mounting directories into a | ||||||
|  | VM. If your VMM supports it, you can use a [virtiofs](https://virtio-fs.gitlab.io) | ||||||
|  | driver or | ||||||
|  | [9p](https://www.kernel.org/doc/Documentation/filesystems/9p.txt) to mount a local | ||||||
|  | directory into the VM. | ||||||
|  |  | ||||||
|  | Additionally, the [devicemapper snapshotter](./devmapper.md) can be used to create | ||||||
|  | snapshots on filesystem images in a devicemapper thin-pool. | ||||||
|  |  | ||||||
|  | ## Usage | ||||||
|  |  | ||||||
|  | ### Checking if the blockfile snapshotter is available | ||||||
|  |  | ||||||
|  | To check if the blockfile snapshotter is available, run the following command: | ||||||
|  |  | ||||||
|  | ```bash | ||||||
|  | $ ctr plugins ls | grep blockfile | ||||||
|  | ``` | ||||||
|  |  | ||||||
|  | ### Configuration | ||||||
|  |  | ||||||
|  | To configure the snapshotter, set the following options in your containerd | ||||||
|  | `config.toml`. Restart containerd after changing the configuration so that the | ||||||
|  | new settings take effect. | ||||||
|  |  | ||||||
|  | ```toml | ||||||
|  |   [plugins.'io.containerd.snapshotter.v1.blockfile'] | ||||||
|  |     scratch_file = "/opt/containerd/blockfile" | ||||||
|  |     root_path = "/somewhere/on/disk" | ||||||
|  |     fs_type = 'ext4' | ||||||
|  |     mount_options = [] | ||||||
|  |     recreate_scratch = true | ||||||
|  | ``` | ||||||
|  |  | ||||||
|  | - `root_path`: The directory where the block files are stored. This directory must be writable by the containerd process. | ||||||
|  | - `scratch_file`: The path to the empty file that will be used as the base for the block files. This file should exist before first using the snapshotter. | ||||||
|  | - `fs_type`: The filesystem type to use for the block files. Currently supported are `ext4` and `xfs`. | ||||||
|  | - `mount_options`: Additional mount options to use when mounting the block files. | ||||||
|  | - `recreate_scratch`: If set to `true`, the snapshotter will recreate the scratch file if it is missing. If set to `false`, the snapshotter will fail if the scratch file is missing. | ||||||
|  |  | ||||||
|  | ### Creating the scratch file | ||||||
|  |  | ||||||
|  | You can create a scratch file as follows. This example creates a 500 MiB scratch file and formats it with ext4. | ||||||
|  |  | ||||||
|  | ```bash | ||||||
|  | $ # make a 500M file | ||||||
|  | $ dd if=/dev/zero of=/opt/containerd/blockfile bs=1M count=500 | ||||||
|  | 500+0 records in | ||||||
|  | 500+0 records out | ||||||
|  | 524288000 bytes (524 MB, 500 MiB) copied, 1.76253 s, 297 MB/s | ||||||
|  |  | ||||||
|  | $ # format the file with ext4 | ||||||
|  | $ sudo mkfs.ext4 /opt/containerd/blockfile | ||||||
|  | mke2fs 1.47.0 (5-Feb-2023) | ||||||
|  | Discarding device blocks: done | ||||||
|  | Creating filesystem with 512000 1k blocks and 128016 inodes | ||||||
|  | Filesystem UUID: d9947ecc-722d-4627-9cf9-fa2a3b622106 | ||||||
|  | Superblock backups stored on blocks: | ||||||
|  |         8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409 | ||||||
|  |  | ||||||
|  | Allocating group tables: done | ||||||
|  | Writing inode tables: done | ||||||
|  | Creating journal (8192 blocks): done | ||||||
|  | Writing superblocks and filesystem accounting information: done | ||||||
|  | ``` | ||||||
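The `dd` command above writes every zero block to disk. As a hedged alternative, on a filesystem with sparse file support the scratch file can be created without allocating data blocks. This is a minimal sketch assuming GNU coreutils (`truncate`, `stat -c`); the temporary path is hypothetical and stands in for `/opt/containerd/blockfile`. The file still needs to be formatted with `mkfs.ext4` as shown above.

```bash
# Hypothetical location for illustration only; the configuration above
# uses /opt/containerd/blockfile.
scratch="$(mktemp -d)/blockfile"

# Set the file length to 500 MiB without writing any data blocks.
truncate -s 500M "$scratch"

# Compare the apparent size with the blocks actually allocated on disk.
stat -c 'apparent=%s bytes, allocated=%b blocks' "$scratch"
```

On a filesystem without sparse file support, the allocated block count will match the full size instead.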
|  |  | ||||||
|  | ### Running a container | ||||||
|  |  | ||||||
|  | To run a container using the blockfile snapshotter, you need to specify the | ||||||
|  | snapshotter: | ||||||
|  |  | ||||||
|  | ```bash | ||||||
|  | $ # ensure that the image we are using exists; it is a regular OCI image | ||||||
|  | $ ctr image pull docker.io/library/busybox:latest | ||||||
|  | $ # run the container with the specified snapshotter | ||||||
|  | $ ctr run --rm -t --snapshotter blockfile docker.io/library/busybox:latest hello sh | ||||||
|  | ``` | ||||||
|  |  | ||||||
|  | Using it via the Go client API is identical to using any other snapshotter: | ||||||
|  |  | ||||||
|  | ```go | ||||||
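|  | import ( | ||||||
|  |     "context" | ||||||
|  |  | ||||||
|  |     "github.com/containerd/containerd" | ||||||
|  |     "github.com/containerd/containerd/namespaces" | ||||||
|  | ) | ||||||
|  |  | ||||||
|  | // create a new client | ||||||
|  | client, err := containerd.New("/run/containerd/containerd.sock") | ||||||
|  | if err != nil { | ||||||
|  |     return err | ||||||
|  | } | ||||||
|  | defer client.Close() | ||||||
|  |  | ||||||
|  | // containerd API calls require a namespace in the context | ||||||
|  | ctx := namespaces.WithNamespace(context.Background(), "default") | ||||||
|  |  | ||||||
|  | // image, labels and containerID are assumed to be defined elsewhere, | ||||||
|  | // for example an image obtained from client.Pull or client.GetImage | ||||||
|  | snapshotter := "blockfile" | ||||||
|  | cOpts := []containerd.NewContainerOpts{ | ||||||
|  |     containerd.WithImage(image), | ||||||
|  |     containerd.WithImageConfigLabels(image), | ||||||
|  |     containerd.WithAdditionalContainerLabels(labels), | ||||||
|  |     containerd.WithSnapshotter(snapshotter), | ||||||
|  | } | ||||||
|  | container, err := client.NewContainer(ctx, containerID, cOpts...) | ||||||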
|  | ``` | ||||||
|  |  | ||||||
|  | ## How It Works | ||||||
|  |  | ||||||
|  | The blockfile snapshotter functions similarly to other snapshotters. | ||||||
|  | It unpacks each individual layer from a container image, with each layer unpack | ||||||
|  | building on the content from its parent(s). | ||||||
|  |  | ||||||
|  | The blockfile snapshotter differs from other snapshotters in two ways: | ||||||
|  |  | ||||||
|  | 1. It applies layers inside a disk image file, rather than on the host filesystem. | ||||||
|  | 1. It creates a block image file for each layer, starting from a copy of the previous layer's file and applying the new layer on top of it. | ||||||
|  |  | ||||||
|  | Rather than a single directory with the contents, the end result of the blockfile | ||||||
|  | snapshotter's process is a single file, which has the contents of the full | ||||||
|  | filesystem image. That image file can be loopback-mounted or attached to a virtual | ||||||
|  | machine. | ||||||
|  |  | ||||||
|  | For every layer the snapshotter creates a new blockfile, starting with a copy of the | ||||||
|  | blockfile from the previous layer. If there is no previous layer, i.e. for the first | ||||||
|  | layer, it copies the scratch file. | ||||||
|  |  | ||||||
|  | For example, for an image with 3 layers - called A, B, C - the process is as follows: | ||||||
|  |  | ||||||
|  | 1. Layer A: | ||||||
|  |     1. Copy the scratch file to a new blockfile for layer A. | ||||||
|  |     1. Loopback-mount the blockfile for layer A. | ||||||
|  |     1. Apply layer A to the mount. | ||||||
|  |     1. Unmount the blockfile for layer A. | ||||||
|  | 1. Layer B: | ||||||
|  |     1. Copy the blockfile for layer A to a new blockfile for layer B. | ||||||
|  |     1. Loopback-mount the blockfile for layer B. | ||||||
|  |     1. Apply layer B to the mount. | ||||||
|  |     1. Unmount the blockfile for layer B. | ||||||
|  | 1. Layer C: | ||||||
|  |     1. Copy the blockfile for layer B to a new blockfile for layer C. | ||||||
|  |     1. Loopback-mount the blockfile for layer C. | ||||||
|  |     1. Apply layer C to the mount. | ||||||
|  |     1. Unmount the blockfile for layer C. | ||||||
|  |  | ||||||
|  | Each unpack of a layer builds upon the contents of the previous layers into a new | ||||||
|  | blockfile. This completes with the final blockfile containing the full filesystem | ||||||
|  | image. | ||||||
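The copy chain described above can be sketched as a small shell function. This is an illustration only, not the snapshotter's implementation: the function name `plan_layers` is hypothetical, and it prints the steps rather than performing them, since loopback mounts normally require root privileges.

```bash
# Print the sequence of steps for a list of layer names.
# The first layer starts from the scratch file; each later layer
# starts from a copy of the previous layer's blockfile.
plan_layers() {
  local parent="scratch"
  local layer
  for layer in "$@"; do
    echo "copy ${parent} -> ${layer}"
    echo "mount ${layer}"
    echo "apply ${layer}"
    echo "unmount ${layer}"
    parent="${layer}"
  done
}

plan_layers A B C
```

Each layer's only input is its parent's blockfile, so no blockfile is modified after its own layer has been applied and unmounted.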
|  |  | ||||||
|  | As a result of the process, each layer leads to another blockfile in the system: | ||||||
|  |  | ||||||
|  | 1. Layer A blockfile: contents of layer A | ||||||
|  | 1. Layer B blockfile: contents of layer A + layer B | ||||||
|  | 1. Layer C blockfile: contents of layer A + layer B + layer C | ||||||
|  |  | ||||||
|  | If the underlying filesystem and the host OS support sparse files, the process | ||||||
|  | uses them. This means that the blockfiles only take up the space required for | ||||||
|  | the actual content. | ||||||
|  |  | ||||||
|  | For example, if the scratch image is 500MB, and each layer adds 25MB, then the | ||||||
|  | disk space used by each blockfile, ignoring filesystem metadata, will be approximately: | ||||||
|  |  | ||||||
|  | 1. Layer A blockfile: 25MB from layer A | ||||||
|  | 1. Layer B blockfile: 50MB from layer A and B | ||||||
|  | 1. Layer C blockfile: 75MB from layer A, B, and C | ||||||
|  |  | ||||||
|  | Total space usage is thus 25+50+75=150MB. This is one tenth of the 1500MB that | ||||||
|  | would be required if each layer's blockfile occupied the full 500MB. | ||||||
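The arithmetic above generalizes to any number of equally sized layers. The following sketch uses hypothetical helper names and assumes that every layer adds the same amount of data and that filesystem metadata overhead is negligible. For `n` layers of `s` MB each, the sparse total is `s * n * (n + 1) / 2`.

```bash
# Total disk usage in MB when each blockfile only occupies the space of
# the data written so far (sparse files): layer i holds i * layer_mb.
sparse_total_mb() {
  local layers="$1" layer_mb="$2" total=0 i
  for (( i = 1; i <= layers; i++ )); do
    total=$(( total + i * layer_mb ))
  done
  echo "$total"
}

# Total disk usage in MB when every blockfile occupies the full scratch size.
full_total_mb() {
  local layers="$1" scratch_mb="$2"
  echo $(( layers * scratch_mb ))
}

sparse_total_mb 3 25    # prints 150
full_total_mb 3 500     # prints 1500
```

Because each blockfile repeats the content of its parents, the sparse total grows quadratically with the number of layers, which is worth keeping in mind for images with many layers.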