Commit Graph

17 Commits

Author SHA1 Message Date
Wei Fu
23278c81fb *: introduce image_pull_with_sync_fs in CRI
It's to ensure the data integrity during unexpected power failure.

Background:

Since release 1.3, in Linux system, containerD unpacks and writes files into
overlayfs snapshot directly. It doesn’t involve any mount-umount operations
so that the performance of pulling image has been improved.

As we know, the umount syscall for overlayfs will force kernel to flush
all the dirty pages into disk. Without umount syscall, the files’ data relies
on kernel’s writeback threads or filesystem's commit setting (for
instance, ext4 filesystem).

The files in committed snapshot can be loss after unexpected power failure.
However, the snapshot has been committed and the metadata also has been
fsynced. There is data inconsistency between snapshot metadata and files
in that snapshot.

We, containerd, received several issues about data loss after unexpected
power failure.

* https://github.com/containerd/containerd/issues/5854
* https://github.com/containerd/containerd/issues/3369#issuecomment-1787334907

Solution:

* Option 1: SyncFs after unpack

Linux platform provides [syncfs][syncfs] syscall to synchronize just the
filesystem containing a given file.

* Option 2: Fsync directories recursively and fsync on regular file

The fsync doesn't support symlink/block device/char device files. We
need to use fsync the parent directory to ensure that entry is
persisted.

However, based on [xfstest-dev][xfstest-dev], there is no case to ensure
fsync-on-parent can persist the special file's metadata, for example,
uid/gid, access mode.

Checkout [generic/690][generic/690]: Syncing parent dir can persist
symlink. But for f2fs, it needs special mount option. And it doesn't say
that uid/gid can be persisted. All the details are behind the
implemetation.

> NOTE: All the related test cases has `_flakey_drop_and_remount` in
[xfstest-dev].

Based on discussion about [Documenting the crash-recovery guarantees of Linux file systems][kernel-crash-recovery-data-integrity],
we can't rely on Fsync-on-parent.

* Option 1 is winner

This patch is using option 1.

There is test result based on [test-tool][test-tool].
All the networking traffic created by pull is local.

  * Image: docker.io/library/golang:1.19.4 (992 MiB)
    * Current: 5.446738579s
      * WIOS=21081, WBytes=1329741824, RIOS=79, RBytes=1197056
    * Option 1: 6.239686088s
      * WIOS=34804, WBytes=1454845952, RIOS=79, RBytes=1197056
    * Option 2: 1m30.510934813s
      * WIOS=42143, WBytes=1471397888, RIOS=82, RBytes=1209344

  * Image: docker.io/tensorflow/tensorflow:latest (1.78 GiB, ~32590 Inodes)
    * Current: 8.852718042s
      * WIOS=39417, WBytes=2412818432, RIOS=2673, RBytes=335987712
    * Option 1: 9.683387174s
      * WIOS=42767, WBytes=2431750144, RIOS=89, RBytes=1238016
    * Option 2: 1m54.302103719s
      * WIOS=54403, WBytes=2460528640, RIOS=1709, RBytes=208237568

The Option 1 will increase `wios`. So, the `image_pull_with_sync_fs` is
option in CRI plugin.

[syncfs]: <https://man7.org/linux/man-pages/man2/syncfs.2.html>
[xfstest-dev]: <https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git>
[generic/690]: <https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/tests/generic/690?h=v2023.11.19>
[kernel-crash-recovery-data-integrity]: <https://lore.kernel.org/linux-fsdevel/1552418820-18102-1-git-send-email-jaya@cs.utexas.edu/>
[test-tool]: <a17fb2010d/contrib/syncfs/containerd/main_test.go (L51)>

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-12-12 10:18:39 +08:00
Derek McGowan
5fdf55e493
Update go module to github.com/containerd/containerd/v2
Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-10-29 20:52:21 -07:00
Akihiro Suda
5dedb6d0d2
archive: use 1970-01-01 as the whiteout timestamp
The whiteout timestamps are no longer set to the source date epoch.
The source date epoch still applies to non-whiteout files.

Discussion happened in moby/buildkit PR 3560.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-06-30 11:30:01 +09:00
Akihiro Suda
b61988670c
go.mod: github.com/containerd/typeurl/v2 v2.1.0
Changes: https://github.com/containerd/typeurl/compare/7f6e6d160d67...v2.1.0

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-02-11 23:39:52 +09:00
Akihiro Suda
70fbedc217
archive: add WithSourceDateEpoch() for whiteouts
This makes diff archives to be reproducible.

The value is expected to be passed from CLI applications via the $SOUCE_DATE_EPOCH env var.

See https://reproducible-builds.org/docs/source-date-epoch/
for the $SOURCE_DATE_EPOCH specification.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2022-10-08 08:45:03 +09:00
Kazuyoshi Kato
dfa6e8763e diff: hide types.Any from clients
This commit hides types.Any from the diff package's interface. Clients
(incl. imgcrypt) shouldn't aware about gogo/protobuf.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-04-21 13:43:20 +00:00
Kazuyoshi Kato
88c0c7201e Consolidate gogo/protobuf dependencies under our own protobuf package
This would make gogo/protobuf migration easier.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-04-19 15:53:36 +00:00
ktock
b483177ee2 Support custom compressor for walking differ
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
2021-07-20 11:00:01 +09:00
Michael Crosby
3fded74bc7 Add unpack opts
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-08-07 18:35:55 +00:00
Michael Crosby
f867401c69 Use fds and pass Payloads over diff api
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-08-07 18:35:55 +00:00
Michael Crosby
366823727f Add server config for stream processors
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-08-07 18:35:55 +00:00
Michael Crosby
97a98773cf Add StreamProcessor for apply
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-08-07 18:35:55 +00:00
Stefan Berger
baf3403439 Extend Applier's Apply() method with an optional options parameter
Extend the Applier interface's Apply method with an optional
options parameter.

For the container image encryption we intend to use the options
parameter to pass image decryption parameters ('dcparameters'),
which are primarily (privatte) keys, in form of a JSON document
under the map key '_dcparameters', and pass them to the Applier's
Apply() method. This helps us to access decryption keys and start
the pipeline with the layer decryption before the layer data are
unzipped and untarred.

Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Harshal Patil <harshal.patil@in.ibm.com>
2019-04-02 18:19:48 -04:00
Kunal Kushwaha
b12c3215a0 Licence header added
Signed-off-by: Kunal Kushwaha <kushwaha_kunal_v7@lab.ntt.co.jp>
2018-02-19 10:32:26 +09:00
Derek McGowan
b763777288
diff: rename differ to comparer
Remove combined interface and split implementations.

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2018-01-26 16:32:09 -08:00
Akihiro Suda
b580441f91
diff: resplit Applier from Differ
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2018-01-25 13:58:34 -08:00
Derek McGowan
d9db1d112d
Refactor differ into separate package
Add differ options and package with interface.
Update optional values on diff interface to use options.

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2017-10-11 10:02:29 -07:00