containerd

Author	SHA1	Message	Date
Derek McGowan	8f0eb26311	Move tracing to pkg/tracing Signed-off-by: Derek McGowan <derek@mcg.dev>	2024-01-17 09:56:25 -08:00
Derek McGowan	44a836c9b5	Move errdefs to pkg/errdefs Signed-off-by: Derek McGowan <derek@mcg.dev>	2024-01-17 09:54:45 -08:00
Derek McGowan	0dabf6f154	Move remotes to core/remotes Signed-off-by: Derek McGowan <derek@mcg.dev>	2024-01-17 09:52:21 -08:00
Derek McGowan	57ea8aef3d	Move images to core/images Signed-off-by: Derek McGowan <derek@mcg.dev>	2024-01-17 09:51:26 -08:00
Derek McGowan	913edcd489	Move diff to core/diff Signed-off-by: Derek McGowan <derek@mcg.dev>	2024-01-17 09:51:17 -08:00
Derek McGowan	ad4c9f8a9d	Update CRI runtime platform and pinned image configuration Updates the CRI image service to own image related configuration and separate it from the runtime configuration. Signed-off-by: Derek McGowan <derek@mcg.dev>	2024-01-11 09:55:09 -08:00
Derek McGowan	02a9a456e1	Split image config from CRI plugin Signed-off-by: Derek McGowan <derek@mcg.dev>	2024-01-11 09:55:09 -08:00
Derek McGowan	d23ac1122e	Split CRI image service from GRPC handler Prepares the CRI image service for splitting CRI into multiple plugins. Also prepares for config migration which will spread across multiple different plugins. Signed-off-by: Derek McGowan <derek@mcg.dev>	2024-01-11 09:55:08 -08:00
Wei Fu	23278c81fb	: introduce image_pull_with_sync_fs in CRI It ensures data integrity during unexpected power failure. Background: Since release 1.3, on Linux, containerd unpacks and writes files into the overlayfs snapshot directly. It doesn’t involve any mount/umount operations, so image pull performance has improved. The umount syscall for overlayfs forces the kernel to flush all dirty pages to disk. Without the umount syscall, the files’ data relies on the kernel’s writeback threads or the filesystem's commit setting (for instance, on ext4). Files in a committed snapshot can be lost after an unexpected power failure, even though the snapshot has been committed and its metadata has been fsynced. This leaves the snapshot metadata inconsistent with the files in that snapshot. We, containerd, received several issues about data loss after unexpected power failure. * https://github.com/containerd/containerd/issues/5854 * https://github.com/containerd/containerd/issues/3369#issuecomment-1787334907 Solution: * Option 1: SyncFs after unpack The Linux platform provides the [syncfs][syncfs] syscall to synchronize just the filesystem containing a given file. * Option 2: Fsync directories recursively and fsync regular files fsync doesn't support symlink/block device/char device files. We need to fsync the parent directory to ensure that the entry is persisted. However, based on [xfstest-dev][xfstest-dev], there is no case that ensures fsync-on-parent can persist the special file's metadata, for example, uid/gid and access mode. Check out [generic/690][generic/690]: Syncing the parent dir can persist a symlink. But for f2fs, it needs a special mount option. And it doesn't say that uid/gid can be persisted. All the details are behind the implementation. > NOTE: All the related test cases have `_flakey_drop_and_remount` in [xfstest-dev]. 
Based on the discussion about [Documenting the crash-recovery guarantees of Linux file systems][kernel-crash-recovery-data-integrity], we can't rely on fsync-on-parent. * Option 1 is the winner This patch uses option 1. The test results below are based on [test-tool][test-tool]. All the networking traffic created by the pull is local. * Image: docker.io/library/golang:1.19.4 (992 MiB) * Current: 5.446738579s * WIOS=21081, WBytes=1329741824, RIOS=79, RBytes=1197056 * Option 1: 6.239686088s * WIOS=34804, WBytes=1454845952, RIOS=79, RBytes=1197056 * Option 2: 1m30.510934813s * WIOS=42143, WBytes=1471397888, RIOS=82, RBytes=1209344 * Image: docker.io/tensorflow/tensorflow:latest (1.78 GiB, ~32590 Inodes) * Current: 8.852718042s * WIOS=39417, WBytes=2412818432, RIOS=2673, RBytes=335987712 * Option 1: 9.683387174s * WIOS=42767, WBytes=2431750144, RIOS=89, RBytes=1238016 * Option 2: 1m54.302103719s * WIOS=54403, WBytes=2460528640, RIOS=1709, RBytes=208237568 Option 1 increases `wios`, so `image_pull_with_sync_fs` is an optional setting in the CRI plugin. [syncfs]: <https://man7.org/linux/man-pages/man2/syncfs.2.html> [xfstest-dev]: <https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git> [generic/690]: <https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/tests/generic/690?h=v2023.11.19> [kernel-crash-recovery-data-integrity]: <https://lore.kernel.org/linux-fsdevel/1552418820-18102-1-git-send-email-jaya@cs.utexas.edu/> [test-tool]: <`a17fb2010d/contrib/syncfs/containerd/main_test.go (L51)`> Signed-off-by: Wei Fu <fuweid89@gmail.com>	2023-12-12 10:18:39 +08:00
Wei Fu	80dd779deb	remotes/docker: close connection if no more data Close connection if no more data. It fixes a false alert filed by image pull progress. ``` dst = OpenWriter (--> Content Store) src = Fetch Open (--> Registry) Mark it as active request Copy(dst, src) (--> Keep updating total received bytes) ^ \| (Active Request > 0, but total received bytes won't be updated) v defer src.Close() content.Commit(dst) ``` Before migrating to the transfer service, the CRI plugin doesn't limit global concurrent downloads for ImagePulls. Each ImagePull request has 3 concurrent goroutines to download blobs and 1 goroutine to unpack blobs. On filesystems like ext4 [1][1], the fsync from content.Commit may sync unrelated dirty pages to disk. When the host is under IO pressure, content.Commit takes a long time and blocks other goroutines. If httpreadseeker doesn't close the connection after io.EOF, this connection is still considered active. The pull progress reporter then reports that no bytes were transferred and cancels the ImagePull. The original 1-minute timeout[2][2] is from the kubelet setting. Since the CRI plugin can't limit the total concurrent downloads, this patch updates 1-minute to 5-minutes to prevent unexpected cancellation. [1]: https://lwn.net/Articles/842385/ [2]: https://github.com/kubernetes/kubernetes/blob/release-1.23/pkg/kubelet/config/flags.go#L45-L48 Signed-off-by: Wei Fu <fuweid89@gmail.com>	2023-11-18 10:23:05 +08:00
Wei Fu	2e9686c054	fix: deflake TestCRIImagePullTimeout/HoldingContentOpenWriter A new active request may be filed with no bytes read yet when the progress reporter just wakes up. If timeout / 2 is less than minPullProgressReportInternal, it's easy to file a false alert. We should remove the minPullProgressReportInternal limit. Fixes: #8024 Signed-off-by: Wei Fu <fuweid89@gmail.com>	2023-11-18 10:23:05 +08:00
Abel Feng	32bf805e57	sandbox: add a sandboxService interface to criService so that we can add a fakeSandboxService to the criService in tests. Signed-off-by: Abel Feng <fshb1988@gmail.com>	2023-11-15 09:25:58 +08:00
Derek McGowan	261e01c2ac	Move client to subpackage Signed-off-by: Derek McGowan <derek@mcg.dev>	2023-11-01 10:37:00 -07:00
Derek McGowan	5fdf55e493	Update go module to github.com/containerd/containerd/v2 Signed-off-by: Derek McGowan <derek@mcg.dev>	2023-10-29 20:52:21 -07:00
Derek McGowan	638b474c81	Temporarily remove imgcrypt in CRI to fix circular dependency Signed-off-by: Derek McGowan <derek@mcg.dev>	2023-10-27 15:36:47 -07:00
Maksym Pavlenko	1b31993240	Rename sbserver to server Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>	2023-10-12 15:46:57 -07:00

16 Commits
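
Commits 80dd779deb and 2e9686c054 both concern the pull progress reporter, which cancels an ImagePull when there are active requests but no new bytes within the timeout. A minimal Go sketch of that check follows, assuming a reporter that periodically samples an active-request count and a total byte counter. The types `pullProgress` and `stallDetector` are hypothetical and are not containerd's actual implementation.

```go
package main

import "time"

// pullProgress is a hypothetical snapshot of a pull's download state.
type pullProgress struct {
	activeReqs int    // connections currently considered active
	totalBytes uint64 // total bytes received so far
}

// stallDetector remembers when progress was last observed.
type stallDetector struct {
	timeout   time.Duration
	lastBytes uint64
	lastSeen  time.Time
}

// check reports whether the pull looks stalled and should be cancelled.
// It resets the timer when there are no active requests or when the byte
// counter has advanced since the previous sample.
func (d *stallDetector) check(p pullProgress, now time.Time) bool {
	if p.activeReqs == 0 || p.totalBytes > d.lastBytes {
		d.lastBytes = p.totalBytes
		d.lastSeen = now
		return false
	}
	return now.Sub(d.lastSeen) >= d.timeout
}
```

Under this model, a connection left open after io.EOF keeps `activeReqs` above zero while `totalBytes` stops changing, so a slow `content.Commit` can look like a stall. Closing the connection at EOF drops the count to zero, which resets the timer.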