Commit Graph

38 Commits

Author SHA1 Message Date
James Jenkins
8aa2551ce0 Move DefaultSnapshotter constants
Move the DefaultSnapshotter constants to the defaults package.
Fixes issue #8226.

Signed-off-by: James Jenkins <James.Jenkins@ibm.com>
2024-01-12 13:28:46 -05:00
Derek McGowan
3baf5edb8b
Separate the CRI image config from the main plugin config
This change simplifies the CRI plugin dependencies by not requiring the
CRI image plugin to depend on any other CRI components. Since other CRI
plugins depend on the image plugin, this allows prevents a dependency
cycle for CRI configurations on a base plugin.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2024-01-11 09:55:09 -08:00
Derek McGowan
02a9a456e1
Split image config from CRI plugin
Signed-off-by: Derek McGowan <derek@mcg.dev>
2024-01-11 09:55:09 -08:00
Wei Fu
23278c81fb *: introduce image_pull_with_sync_fs in CRI
It's to ensure the data integrity during unexpected power failure.

Background:

Since release 1.3, in Linux system, containerD unpacks and writes files into
overlayfs snapshot directly. It doesn’t involve any mount-umount operations
so that the performance of pulling image has been improved.

As we know, the umount syscall for overlayfs will force kernel to flush
all the dirty pages into disk. Without umount syscall, the files’ data relies
on kernel’s writeback threads or filesystem's commit setting (for
instance, ext4 filesystem).

The files in committed snapshot can be loss after unexpected power failure.
However, the snapshot has been committed and the metadata also has been
fsynced. There is data inconsistency between snapshot metadata and files
in that snapshot.

We, containerd, received several issues about data loss after unexpected
power failure.

* https://github.com/containerd/containerd/issues/5854
* https://github.com/containerd/containerd/issues/3369#issuecomment-1787334907

Solution:

* Option 1: SyncFs after unpack

Linux platform provides [syncfs][syncfs] syscall to synchronize just the
filesystem containing a given file.

* Option 2: Fsync directories recursively and fsync on regular file

The fsync doesn't support symlink/block device/char device files. We
need to use fsync the parent directory to ensure that entry is
persisted.

However, based on [xfstest-dev][xfstest-dev], there is no case to ensure
fsync-on-parent can persist the special file's metadata, for example,
uid/gid, access mode.

Checkout [generic/690][generic/690]: Syncing parent dir can persist
symlink. But for f2fs, it needs special mount option. And it doesn't say
that uid/gid can be persisted. All the details are behind the
implemetation.

> NOTE: All the related test cases has `_flakey_drop_and_remount` in
[xfstest-dev].

Based on discussion about [Documenting the crash-recovery guarantees of Linux file systems][kernel-crash-recovery-data-integrity],
we can't rely on Fsync-on-parent.

* Option 1 is winner

This patch is using option 1.

There is test result based on [test-tool][test-tool].
All the networking traffic created by pull is local.

  * Image: docker.io/library/golang:1.19.4 (992 MiB)
    * Current: 5.446738579s
      * WIOS=21081, WBytes=1329741824, RIOS=79, RBytes=1197056
    * Option 1: 6.239686088s
      * WIOS=34804, WBytes=1454845952, RIOS=79, RBytes=1197056
    * Option 2: 1m30.510934813s
      * WIOS=42143, WBytes=1471397888, RIOS=82, RBytes=1209344

  * Image: docker.io/tensorflow/tensorflow:latest (1.78 GiB, ~32590 Inodes)
    * Current: 8.852718042s
      * WIOS=39417, WBytes=2412818432, RIOS=2673, RBytes=335987712
    * Option 1: 9.683387174s
      * WIOS=42767, WBytes=2431750144, RIOS=89, RBytes=1238016
    * Option 2: 1m54.302103719s
      * WIOS=54403, WBytes=2460528640, RIOS=1709, RBytes=208237568

The Option 1 will increase `wios`. So, the `image_pull_with_sync_fs` is
option in CRI plugin.

[syncfs]: <https://man7.org/linux/man-pages/man2/syncfs.2.html>
[xfstest-dev]: <https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git>
[generic/690]: <https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/tests/generic/690?h=v2023.11.19>
[kernel-crash-recovery-data-integrity]: <https://lore.kernel.org/linux-fsdevel/1552418820-18102-1-git-send-email-jaya@cs.utexas.edu/>
[test-tool]: <a17fb2010d/contrib/syncfs/containerd/main_test.go (L51)>

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-12-12 10:18:39 +08:00
Wei Fu
80dd779deb remotes/docker: close connection if no more data
Close connection if no more data. It's to fix false alert filed by image
pull progress.

```
dst = OpenWriter (--> Content Store)

src = Fetch
        Open (--> Registry)
        Mark it as active request

Copy(dst, src) (--> Keep updating total received bytes)

   ^
   |  (Active Request > 0, but total received bytes won't be updated)
   v

defer src.Close()
content.Commit(dst)
```

Before migrating to transfer service, CRI plugin doesn't limit global
concurrent downloads for ImagePulls. Each ImagePull requests have 3 concurrent
goroutines to download blob and 1 goroutine to unpack blob. Like ext4
filesystem [1][1], the fsync from content.Commit may sync unrelated dirty pages
into disk. The host is running under IO pressure, and then the content.Commit
will take long time and block other goroutines. If httpreadseeker
doesn't close the connection after io.EOF, this connection will be
considered as active. The pull progress reporter reports there is no
bytes transfered and cancels the ImagePull.

The original 1-minute timeout[2][2] is from kubelet settting. Since CRI-plugin
can't limit the total concurrent downloads, this patch is to update 1-minute
to 5-minutes to prevent from unexpected cancel.

[1]: https://lwn.net/Articles/842385/
[2]: https://github.com/kubernetes/kubernetes/blob/release-1.23/pkg/kubelet/config/flags.go#L45-L48

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-11-18 10:23:05 +08:00
rongfu.leng
7b9fcfd7c6 add default enable unprivileged icmp/ports
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2023-11-08 23:00:35 +08:00
Derek McGowan
261e01c2ac
Move client to subpackage
Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-11-01 10:37:00 -07:00
Derek McGowan
5fdf55e493
Update go module to github.com/containerd/containerd/v2
Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-10-29 20:52:21 -07:00
Maksym Pavlenko
f90f80d9b3
Merge pull request #9254 from adisky/cri-streaming-from-k8s
Use staging k8s.io/kubelet/cri/streaming package
2023-10-19 12:32:12 -07:00
Aditi Sharma
03d81f595f Use cri streaming pkg from k8s staging
Use staging k8s.io/kubelet/cri/streaming package

Signed-off-by: Aditi Sharma <adi.sky17@gmail.com>
2023-10-18 09:14:28 +05:30
Abel Feng
69e501e7cd sandbox: change SandboxMode to Sandboxer
Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-10-16 20:49:36 +08:00
Derek McGowan
b5615caf11
Update go-toml to v2
Updates host file parsing to use new v2 method rather than the removed
toml.Tree.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-09-22 15:35:12 -07:00
rongfu.leng
9287711b7a upgrade registry.k8s.io/pause version
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2023-05-28 07:59:10 +08:00
Iceber Gu
23d288a809 Remove the CriuPath field from runc's options
Signed-off-by: Iceber Gu <wei.cai-nat@daocloud.io>
2023-03-16 17:12:51 +08:00
Maksym Pavlenko
07c2ae12e1 Remove v1 runctypes
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-03-15 09:18:16 -07:00
Wei Fu
791f137a5b *: update drainExecSyncIO docs and validate the timeout
We should validate the drainExecSyncIO timeout at the beginning and
raise the error for any invalid input.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-03-03 11:58:52 +08:00
Wei Fu
3c18decea7 *: add DrainExecSyncIOTimeout config and disable as by default
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-03-03 00:21:55 +08:00
Maksym Pavlenko
3bc8fc4d30 Cleanup build constraints
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-12-08 09:36:20 -08:00
Fei Su
f6232793b4 can set up the network serially by CNI plugins
Signed-off-by: Fei Su <sofat1989@126.com>
2022-11-18 15:19:00 +08:00
Zhang Tianyang
c953eecb79 Sandbox API: Add a new mode config for sandbox controller impls
Add a new config as sandbox controller mod, which can be either
"podsandbox" or "shim". If empty, set it to default "podsandbox"
when CRI plugin inits.

Signed-off-by: Zhang Tianyang <burning9699@gmail.com>
2022-11-09 12:12:39 +08:00
lengrongfu
3c0e6c40ad feat: upgrade registry.k8s.io/pause version
Signed-off-by: rongfu.leng <1275177125@qq.com>
2022-09-06 15:59:20 +08:00
Paco Xu
9525b3148a migrate from k8s.gcr.io to registry.k8s.io
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2022-08-24 13:46:46 +08:00
Paco Xu
1cf6f20320 promote pause image to 3.7
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2022-05-30 15:08:28 +08:00
Wei Fu
00d102da9f feature: support image pull progress timeout
Kubelet sends the PullImage request without timeout, because the image size
is unknown and timeout is hard to defined. The pulling request might run
into 0B/s speed, if containerd can't receive any packet in that connection.
For this case, the containerd should cancel the PullImage request.

Although containerd provides ingester manager to track the progress of pulling
request, for example `ctr image pull` shows the console progress bar, it needs
more CPU resources to open/read the ingested files to get status.

In order to support progress timeout feature with lower overhead, this
patch uses http.RoundTripper wrapper to track active progress. That
wrapper will increase active-request number and return the
countingReadCloser wrapper for http.Response.Body. Each bytes-read
can be count and the active-request number will be descreased when the
countingReadCloser wrapper has been closed. For the progress tracker,
it can check the active-request number and bytes-read at intervals. If
there is no any progress, the progress tracker should cancel the
request.

NOTE: For each blob data, the containerd will make sure that the content
writer is opened before sending http request to the registry. Therefore, the
progress reporter can rely on the active-request number.

fixed: #4984

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2022-04-27 00:02:27 +08:00
Ed Bartosh
c9b4ccf83e add configuration for CDI
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-04-06 13:10:54 +03:00
Adelina Tuvenie
6d3d34b85d Update Pause image in tests & config
With the introduction of Windows Server 2022, some images have been updated
to support WS2022 in their manifest list. This commit updates the test images
accordingly.

Signed-off-by: Adelina Tuvenie <atuvenie@cloudbasesolutions.com>
2021-08-31 19:42:57 +03:00
Akihiro Suda
d3aa7ee9f0
Run go fmt with Go 1.17
The new `go fmt` adds `//go:build` lines (https://golang.org/doc/go1.17#tools).

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-08-22 09:31:50 +09:00
Mike Brown
dd16b006e5 merge in the move to the new options type
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2021-04-08 14:09:59 -05:00
Mike Brown
9144ce9677 shows our runc.v2 default options in the containerd default config
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2021-04-08 14:09:59 -05:00
Mike Brown
d4be6aa8fa rm mirror defaults; doc registry deprecations
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2021-04-07 12:29:43 -05:00
Maksym Pavlenko
ddd4298a10 Migrate current TOML code to github.com/pelletier/go-toml
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2021-03-25 13:13:33 -07:00
Fu, Wei
80fa9fe32a
Merge pull request #5135 from AkihiroSuda/default-config-crypt
add imgcrypt stream processors to the default config
2021-03-25 14:31:38 +08:00
pacoxu
ffff688663 upgrade pause image to 3.5 for non-root
Signed-off-by: pacoxu <paco.xu@daocloud.io>
2021-03-16 23:20:35 +08:00
Akihiro Suda
ecb881e5e6
add imgcrypt stream processors to the default config
Enable the following config by default:

```toml
version = 2

[plugins."io.containerd.grpc.v1.cri".image_decryption]
  key_model = "node"

[stream_processors]
  [stream_processors."io.containerd.ocicrypt.decoder.v1.tar.gzip"]
    accepts = ["application/vnd.oci.image.layer.v1.tar+gzip+encrypted"]
    returns = "application/vnd.oci.image.layer.v1.tar+gzip"
    path = "ctd-decoder"
    args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"]
    env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"]
  [stream_processors."io.containerd.ocicrypt.decoder.v1.tar"]
    accepts = ["application/vnd.oci.image.layer.v1.tar+encrypted"]
    returns = "application/vnd.oci.image.layer.v1.tar"
    path = "ctd-decoder"
    args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"]
    env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"]
```

Fix issue 5128

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-03-15 13:27:16 +09:00
Iceber Gu
f37ae8fc35
move to v3.4.1 for the pause image
Signed-off-by: Iceber Gu <wei.cai-nat@daocloud.io>
2021-03-07 15:21:20 +08:00
Derek McGowan
b2642458f9
Update make snapshot annotations disabled by default
This experimental feature should not be enabled by default as
it is not used by any default snapshotters.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2020-10-27 21:32:25 -07:00
Maksym Pavlenko
3508ddd3dd Refactor CRI packages
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2020-10-07 14:45:57 -07:00
Derek McGowan
b22b627300
Move cri server packages under pkg/cri
Organizes the cri related server packages under pkg/cri

Signed-off-by: Derek McGowan <derek@mcg.dev>
2020-10-07 13:09:37 -07:00