RunWithPrivileges() will enable privileges will lock a thread, change
privileges, and run the function passed in, within that thread. This
allows us to limit the scope in which we enable privileges and avoids
accidentally enabling privileges in threads that should never have them.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
It seems that in certain situations, like having the containerd root
and state on a file system hosted on a mounted VHDX, we need
SeSecurityPrivilege when opening a file with winio.ACCESS_SYSTEM_SECURITY.
This happens in the base layer writer in hcsshim when adding a new file.
Enabling SeSecurityPrivilege allows the containerd root to be hosted on
a vhdx.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Improve error messages
* remove a check for the existance of unmount target. We probably
should not mask that the target was missing.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
As opposed to a writable layer derived from a base layer, the volume
path of a base layer, once activated and prepared will not be a WCIFS
volume, but the actual path on disk to the snapshot. We cannot directly
mount this folder, as that would mean a client may gain access and
potentially damage important metadata files that would render the layer
unusabble.
For base layers we need to mount the Files folder which must exist in
any valid base windows-layer.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
The bind filter supports bind-like mounts and volume mounts. It also
allows us to have read-only mounts.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
The WCOW layer support does not support creating sandboxes with no
parent. Instead, parentless scratch layers must be layed out as a
directory containing only a directory named 'Files', and all data stored
inside 'Files'. At commit-time, this will be converted in-place into a
read-only layer suitable for use as a parent layer.
The WCOW layer support also does not deal with making read-only layers,
i.e. layers that are prepared to be parent layers, visible in a
read-only manner. A bind-mount or junction point cannot be made
read-only, so a view must instead be a small sandbox layer that we can
mount via WCOW, and discard later, to protect the layer against
accidental or deliberate modification.
Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
This change does a couple things to remove some cruft/unused functionality
in the Windows snapshotter, as well as add a way to specify the rootfs
size in bytes for a Windows container via a new field added in the CRI api in
k8s 1.24. Setting the rootfs/scratch volume size was assumed to be working
prior to this but turns out not to be the case.
Previously I'd added a change to pass any annotations in the containerd
snapshot form (containerd.io/snapshot/*) as labels for the containers
rootfs snapshot. This was added as a means for a client to be able to provide
containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb as an
annotation and have that be translated to a label and ultimately set the
size for the scratch volume in Windows. However, this actually only worked if
interfacing with the CRI api directly (crictl) as Kubernetes itself will
fail to validate annotations that if split by "/" end up with > 2 parts,
which the snapshot labels will (containerd.io / snapshot / foobarbaz).
With this in mind, passing the annotations and filtering to
containerd.io/snapshot/* is moot, so I've removed this code in favor of
a new `snapshotterOpts()` function that will return platform specific
snapshotter options if ones exist. Now on Windows we can just check if
RootfsSizeInBytes is set on the WindowsContainerResources struct and
then return a snapshotter option that sets the right label.
So all in all this change:
- Gets rid of code to pass CRI annotations as labels down to snapshotters.
- Gets rid of the functionality to create a 1GB sized scratch disk if
the client provided a size < 20GB. This code is not used currently and
has a few logical shortcomings as it won't be able to create the disk
if a container is already running and using the same base layer. WCIFS
(driver that handles the unioning of windows container layers together)
holds open handles to some files that we need to delete to create the
1GB scratch disk is the underlying problem.
- Deprecates the containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb
label in favor of a new containerd.io/snapshot/windows/rootfs.sizebytes label.
The previous label/annotation wasn't being used by us, and from a cursory
github search wasn't being used by anyone else either. Now that there is a CRI
field to specify the size, this should just be a field that users can set
on their pod specs and don't need to concern themselves with what it eventually
gets translated to, but non-CRI clients can still use the new label/deprecated
label as usual.
- Add test to cri integration suite to validate expanding the rootfs size.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
It seems that something has shifted in an API, and vhd.DetachVhd is
returning "failed to open virtual disk: invalid argument" on Windows
Server LTSC 2019.
Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
A Scratch layer only contains a sandbox.vhdx, but to be used as a parent
layer, it must also contain the files on-disk.
Hence, we Export the layer from the sandbox.vhdx and Import it back into
itself, so that both data formats are present.
Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
* Currently we rely on making the UVMs sandbox.vhdx in the shim itself instead of this being
made by the snapshotter itself. This change adds a label that affects whether to create the UVMs
scratch layer in the snapshotter itself.
* Adds container scratch size customization. Before adding the computestorage calls
(vendored in with https://github.com/containerd/containerd/pull/4859) there was no way to make a containers
or UVMs scratch size less than the default (20 for containers and 10 for the UVM).
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
For LCOW currently we copy (or create) the scratch.vhdx for every single snapshot
so there ends up being a sandbox.vhdx in every directory seemingly unnecessarily. With the default scratch
size of 20GB the size on disk is about 17MB so there's a 17MB overhead per layer plus the time to copy the
file with every snapshot. Only the final sandbox.vhdx is actually used so this would be a nice little
optimization.
For WCOW we essentially do the exact same except copy the blank vhdx from the base layer.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
This also refactors the lcow and windows
snapshotters to use go-winio's utility functions for checking the
filesystem type.
Signed-off-by: Eric Hotinger <ehotinger@gmail.com>
Implements the Windows lcow differ/snapshotter responsible for managing
the creation and lifetime of lcow containers on Windows.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
This changes the Windows runtime to use the snapshotter and differ
created layers, and updates the ctr commands to use the snapshotter and differ.
Signed-off-by: Darren Stahl <darst@microsoft.com>
This implements the Windows snapshotter and diff Apply function.
This allows for Windows layers to be created, and layers to be pulled
from the hub.
Signed-off-by: Darren Stahl <darst@microsoft.com>