diff --git a/design/data-flow.md b/design/data-flow.md
new file mode 100644
index 000000000..3fa265482
--- /dev/null
+++ b/design/data-flow.md
@@ -0,0 +1,105 @@
+# Data Flow
+
+In the past, container systems have hidden the complexity of pulling container
+images, along with many of its details. This document intends to shed light
+on that complexity and detail how a "pull" operation will look from the
+perspective of a containerd user. We use the _bundle_ as the target object in
+this workflow, and walk back from there to describe the full process. In this
+context, we describe both pulling an image and creating a bundle from that
+image.
+
+With containerd, we redefine the "pull" to comprise the same set of steps
+encompassed in prior container engines. In this model, an image defines a
+collection of resources that can be used to create a _bundle_. There is no
+specific format or object called an image. The goal of the pull is to produce a
+set of steps that resolve the resources that comprise an image, with the
+separation providing lifecycle points in the process.
+
+A reference implementation of the complete "pull", performed client-side, will
+be provided as part of containerd, but there may not be a single "pull" API
+call.
+
+A rough diagram of the data flow, along with the relevant components, is below.
+
+![Data Flow](data-flow.png)
+
+While the process proceeds left to right in the diagram, this document is
+written right to left. By working through this process backwards, we can best
+understand the approach employed by containerd.
+
+## Running a Container
+
+For containerd, we'd generally like to retrieve a _bundle_. This is the
+runtime, on-disk container layout, which includes the filesystem and
+configuration required to run the container.
+
+Generically speaking, we can say we have the following directory:
+
+```
+config.json
+rootfs/
+```
+
+The contents of `config.json` aren't interesting in this context, but for
+clarity, it may be the runc config or a containerd-specific configuration file
+for setting up a running container. The `rootfs` is a directory where
+containerd will set up the runtime container's filesystem.
+
+While containerd doesn't have the concept of an image, we can effectively build
+this structure from an image, as projected into containerd. Given this, we can
+say that the requirements for running a container are to do the following:
+
+1. Convert the configuration from the container image into the target format
+   for the containerd runtime.
+2. Reproduce the root filesystem from the container image. While we could
+   unpack this into `rootfs` in the bundle, we can also just pass this as a set
+   of mounts to the container configuration.
+
+The above defines the framework in which we will operate. Put differently, we
+can say that we want to create a bundle by creating these two components of a
+bundle.
+
+## Creating a Bundle
+
+Now that we've defined what is required to run a container, a _bundle_, we need
+to create one.
+
+Let's say we have the following:
+
+```
+ctr run ubuntu
+```
+
+This does no pulling of images. It only takes the name and creates a _bundle_.
+Broken down into steps, the process looks as follows:
+
+1. Look up the digest of the image in the metadata store.
+2. Resolve the manifest in the content store.
+3. Resolve the layer snapshots in the snapshot subsystem.
+4. Transform the config into the target bundle format.
+5. Create a runtime snapshot for the rootfs of the container, including resolution of mounts.
+6. Run the container.
+
+From this, we can understand the required resources to _pull_ an image:
+
+1. An entry in the metadata store with a name pointing at a particular digest.
+2. The manifest must be available in the content store.
+3. 
The result of successively applied layers must be available as a snapshot.
+
+## Unpacking Layers
+
+While this process may be pull- or run-driven, the idea is quite simple. For
+each layer, apply the layer on top of a snapshot of the previous layer. The
+result should be stored under the chain id (as defined by OCI) of the resulting
+application.
+
+## Pulling an Image
+
+With all the above defined, pulling an image simply becomes the following:
+
+1. Fetch the manifest for the image, verify and store it.
+2. Fetch each layer listed in the image manifest, then verify and store it.
+3. Store the manifest digest under the provided name.
+
+Note that we leave off using the name to resolve a particular location. We'll
+leave that for another doc!