Uses the new github.com/containerd/errdefs/pkg module which is intended
to hold less stable utility functions separately from the stable
github.com/containerd/errdefs error types.
Includes temporary update to hcsshim until a release is cut there
Signed-off-by: Derek McGowan <derek@mcg.dev>
When an upstream client (e.g. kubelet) stops or restarts, the CRI
connection to the containerd gets interrupted which is treated as a
cancellation of context which subsequently cancels an ongoing operation,
including an image pull. This generally gets followed by containerd's
GC routine that tries to delete the prepared snapshots for the image
layer(s) corresponding to the image in the pull operation that got
cancelled. However, if the upstream client immediately retries (or
starts a new) image pull operation, containerd initiates a new image
pull and starts unpacking the image layers into snapshots. This may
create a race condition: the GC routine (corresponding to the failed
image pull operation) trying to clean up the same snapshot that the new
image pull operation is preparing, thus leading to the "parent snapshot
does not exist: not found" error.
Race Condition Scenario:
Assume an image consisting of 2 layers (L1 and L2, L1 being the bottom
layer) that are supposed to get unpacked into snapshots S1 and S2
respectively.
During an image pull operation, containerd unpacks(L1) which involves
Stat()'ing the chainID. This Stat() fails as the chainID does not
exist and Prepare(L1) gets called. Once S1 gets prepared, containerd
processes L2 - unpack(L2) which again involves Stat()'ing the chainID
which fails as the chainID for S2 does not exist which results in the
call to Prepare(L2). However, if the image pull operation gets
cancelled before Prepare(L2) is called, then the GC routine tries to
clean up S1.
When the image pull operation is retried by the upstream client,
containerd follows the same series of operations. unpack(L1) gets
called which then calls Stat(chainID) for L1. However, this time,
Stat(L1) succedes as S1 already exists (from the previous image pull
operation) and thus containerd goes to the next iteration to
unpack(L2). Now, GC cleans up S1 and when Prepare(L2) gets called, it
returns back the "parent snapshot does not exist: not found" error.
Fix:
Removing the "Stat() + early return" fixes the race condition. Now
during the image pull operation corresponding to the client retry,
although the chainID (for L1) already exists, containerd does not
return early and goes on to Prepare(L1). Since L1 is already prepared,
it adds a new lease to S1 and then returns `ErrAlreadyExists`. This
new lease prevents GC from cleaning up S1 when containerd processes
L2 (unpack(L2) -> Prepare(L2)).
Fixes: https://github.com/containerd/containerd/issues/3787
Signed-off-by: Saket Jajoo <saketjajoo@google.com>
Core should not have a dependency on API types.
This was causing a transative dependency on grpc when importing the core
snapshots package.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This adds trace context propagation over the grpc/ttrpc calls to a shim.
It also adds the otlp plugin to the runc shim so that it will send
traces to the configured tracer (which is inherited from containerd's
config).
It doesn't look like this is adding any real overhead to the runc shim's
memory usage, however it does add 2MB to the binary size.
As such this is gated by a build tag `shim_tracing`
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The detached mount is less likely to fail in our case, but if we see any
failure to unmount, we should just skip the removal of directories.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Using os.RemoveAll() is quite risky, as if the unmount failed and we
can delete files from the container rootfs. In fact, we were doing just
that.
Let's use os.Remove() to make sure we only deleted empty dirs.
Big kudos to @mbaynton for reporting this issue with lot of details,
nailing it down to containerd lines of code and showing all the log
lines to understand the big picture.
Fixes: #10704
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Overlayfs needs to do an idmap mount of each layer and the cleanup
function just unmounts and deletes the directories. However, when the
resource is busy, the umount fails.
Let's make the unmount detached so the unmount will eventually be done
when it's not busy anymore. Also, making it detached solves the issues with
the unmount failing because it is busy.
Big kudos to @mbaynton for reporting this issue with lot of details,
nailing it down to containerd lines of code and showing all the log
lines to understand the big picture.
Fixes: #10704
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
After these changes, in order to add Darwin bind-mount implementation, one only needs:
* Adjust HasBindMounts definition in mount.go
* Provide implementation in mount_darwin.go
There was no consensus on adding dependency on bindfs, that seems to be the only working solution for bind-mounts on Darwin as of today, in https://github.com/containerd/containerd/pull/8789, that's why the actual implementation is not added in current PR.
As a bonus, Linux FUSE-related code was moved to a separate file and possibly could be reused on FreeBSD, though this needs testing.
Signed-off-by: Marat Radchenko <marat@slonopotamus.org>
As per https://github.com/golang/go/issues/60529, printf like commands with
non-constant format strings and no args give an error in govet
Signed-off-by: Akhil Mohan <akhilerm@gmail.com>
Commit 8437c567d8 migrated the use of the
userns package to the github.com/moby/sys/user module.
After further discussion with maintainers, it was decided to move the
userns package to a separate module, as it has no direct relation with
"user" operations (other than having "user" in its name).
This patch migrates our code to use the new module.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The userns package in libcontainer was integrated into the moby/sys/user
module at commit [3778ae603c706494fd1e2c2faf83b406e38d687d][1].
This patch deprecates the containerd fork of that package, and adds it as
an alias for the moby/sys/user/userns package.
[1]: 3778ae603c
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
There's a couple spots where we know exactly how large
the destination buffer should be, so pre-size these to
avoid any reallocs to a higher capacity.
Signed-off-by: Danny Canter <danny@dcantah.dev>
The boltdb instance in metadata is only used for getting transactions
and can also be overriden via the context to have a wider control of the
transaction boundary. Using the transactor interface allows callers of
metadata to have more control of the transaction lifecycle.
Since boltdb must be fsync'ed on commit, operations which perform many
database operations can be costly and slow. While providing transactor
via context can be used to group together operations, it does not
provide a way to manage the commit fsyncs more globally.
Signed-off-by: Derek McGowan <derek@mcg.dev>
The ParentIDs array in the Snapshot type is populated in the reverse order i.e the
immediate parent is at the 0th index and the oldest parent is at the last index. It can be
seen here:
https://github.com/containerd/containerd/blob/main/core/snapshots/storage/bolt.go#L492
When applying these layers, the parent layer at the last index should be applied first and
the parent layer at the 0th index should be applied last. However, the comment above the
Snapshot type says the exact opposite thing. This commit fixes that comment.
Signed-off-by: Amit Barve <ambarve@microsoft.com>
When no port is specified, allow falling back from 443 to 80 when
http is specified along with a TLS configuration.
Signed-off-by: Derek McGowan <derek@mcg.dev>