Commit Graph

98 Commits

Author SHA1 Message Date
Akihiro Suda
a83449cf11 Merge pull request #9621 from bart0sh/PR011-enable-CDI-by-default
config: enable CDI by default
2024-01-17 00:48:55 +00:00
James Jenkins
8aa2551ce0 Move DefaultSnapshotter constants
Move the DefaultSnapshotter constants to the defaults package.
Fixes issue #8226.

Signed-off-by: James Jenkins <James.Jenkins@ibm.com>
2024-01-12 13:28:46 -05:00
Ed Bartosh
c8e8a093ce config: enable CDI by default
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2024-01-12 09:31:39 +02:00
Derek McGowan
3baf5edb8b Separate the CRI image config from the main plugin config
This change simplifies the CRI plugin dependencies by not requiring the
CRI image plugin to depend on any other CRI components. Since other CRI
plugins depend on the image plugin, this allows prevents a dependency
cycle for CRI configurations on a base plugin.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2024-01-11 09:55:09 -08:00
Derek McGowan
ad4c9f8a9d Update CRI runtime platform and pinned image configuration
Updates the CRI image service to own image related configuration and
separate it from the runtime configuration.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2024-01-11 09:55:09 -08:00
Derek McGowan
02a9a456e1 Split image config from CRI plugin
Signed-off-by: Derek McGowan <derek@mcg.dev>
2024-01-11 09:55:09 -08:00
Derek McGowan
c0e94bc9a8 Merge pull request #9492 from kiashok/moveCriConfig
Move GenerateRuntimeOptions() to pkg/cri/config
2023-12-13 00:13:46 +00:00
Wei Fu
23278c81fb *: introduce image_pull_with_sync_fs in CRI
It's to ensure the data integrity during unexpected power failure.

Background:

Since release 1.3, in Linux system, containerD unpacks and writes files into
overlayfs snapshot directly. It doesn’t involve any mount-umount operations
so that the performance of pulling image has been improved.

As we know, the umount syscall for overlayfs will force kernel to flush
all the dirty pages into disk. Without umount syscall, the files’ data relies
on kernel’s writeback threads or filesystem's commit setting (for
instance, ext4 filesystem).

The files in committed snapshot can be loss after unexpected power failure.
However, the snapshot has been committed and the metadata also has been
fsynced. There is data inconsistency between snapshot metadata and files
in that snapshot.

We, containerd, received several issues about data loss after unexpected
power failure.

* https://github.com/containerd/containerd/issues/5854
* https://github.com/containerd/containerd/issues/3369#issuecomment-1787334907

Solution:

* Option 1: SyncFs after unpack

Linux platform provides [syncfs][syncfs] syscall to synchronize just the
filesystem containing a given file.

* Option 2: Fsync directories recursively and fsync on regular file

The fsync doesn't support symlink/block device/char device files. We
need to use fsync the parent directory to ensure that entry is
persisted.

However, based on [xfstest-dev][xfstest-dev], there is no case to ensure
fsync-on-parent can persist the special file's metadata, for example,
uid/gid, access mode.

Checkout [generic/690][generic/690]: Syncing parent dir can persist
symlink. But for f2fs, it needs special mount option. And it doesn't say
that uid/gid can be persisted. All the details are behind the
implemetation.

> NOTE: All the related test cases has `_flakey_drop_and_remount` in
[xfstest-dev].

Based on discussion about [Documenting the crash-recovery guarantees of Linux file systems][kernel-crash-recovery-data-integrity],
we can't rely on Fsync-on-parent.

* Option 1 is winner

This patch is using option 1.

There is test result based on [test-tool][test-tool].
All the networking traffic created by pull is local.

  * Image: docker.io/library/golang:1.19.4 (992 MiB)
    * Current: 5.446738579s
      * WIOS=21081, WBytes=1329741824, RIOS=79, RBytes=1197056
    * Option 1: 6.239686088s
      * WIOS=34804, WBytes=1454845952, RIOS=79, RBytes=1197056
    * Option 2: 1m30.510934813s
      * WIOS=42143, WBytes=1471397888, RIOS=82, RBytes=1209344

  * Image: docker.io/tensorflow/tensorflow:latest (1.78 GiB, ~32590 Inodes)
    * Current: 8.852718042s
      * WIOS=39417, WBytes=2412818432, RIOS=2673, RBytes=335987712
    * Option 1: 9.683387174s
      * WIOS=42767, WBytes=2431750144, RIOS=89, RBytes=1238016
    * Option 2: 1m54.302103719s
      * WIOS=54403, WBytes=2460528640, RIOS=1709, RBytes=208237568

The Option 1 will increase `wios`. So, the `image_pull_with_sync_fs` is
option in CRI plugin.

[syncfs]: <https://man7.org/linux/man-pages/man2/syncfs.2.html>
[xfstest-dev]: <https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git>
[generic/690]: <https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/tests/generic/690?h=v2023.11.19>
[kernel-crash-recovery-data-integrity]: <https://lore.kernel.org/linux-fsdevel/1552418820-18102-1-git-send-email-jaya@cs.utexas.edu/>
[test-tool]: <a17fb2010d/contrib/syncfs/containerd/main_test.go (L51)>

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-12-12 10:18:39 +08:00
Kirtana Ashok
25b052cbca Move GenerateRuntimeOptions() to pkg/cri/config
Signed-off-by: Kirtana Ashok <kiashok@microsoft.com>
2023-12-07 12:23:00 -08:00
Wei Fu
80dd779deb remotes/docker: close connection if no more data
Close connection if no more data. It's to fix false alert filed by image
pull progress.

```
dst = OpenWriter (--> Content Store)

src = Fetch
        Open (--> Registry)
        Mark it as active request

Copy(dst, src) (--> Keep updating total received bytes)

   ^
   |  (Active Request > 0, but total received bytes won't be updated)
   v

defer src.Close()
content.Commit(dst)
```

Before migrating to transfer service, CRI plugin doesn't limit global
concurrent downloads for ImagePulls. Each ImagePull requests have 3 concurrent
goroutines to download blob and 1 goroutine to unpack blob. Like ext4
filesystem [1][1], the fsync from content.Commit may sync unrelated dirty pages
into disk. The host is running under IO pressure, and then the content.Commit
will take long time and block other goroutines. If httpreadseeker
doesn't close the connection after io.EOF, this connection will be
considered as active. The pull progress reporter reports there is no
bytes transfered and cancels the ImagePull.

The original 1-minute timeout[2][2] is from kubelet settting. Since CRI-plugin
can't limit the total concurrent downloads, this patch is to update 1-minute
to 5-minutes to prevent from unexpected cancel.

[1]: https://lwn.net/Articles/842385/
[2]: https://github.com/kubernetes/kubernetes/blob/release-1.23/pkg/kubelet/config/flags.go#L45-L48

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-11-18 10:23:05 +08:00
Abel Feng
32bf805e57 sandbox: add a sandboxService interface to criService
so that we can add a fakeSandboxService to the criService in tests.

Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-11-15 09:25:58 +08:00
rongfu.leng
7b9fcfd7c6 add default enable unprivileged icmp/ports
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2023-11-08 23:00:35 +08:00
rongfu.leng
e099717f9f validate kernel version for unprivileged icmp/port
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2023-11-06 23:50:12 +08:00
Samuel Karp
a596d09ec9 cri: add deprecation warning for configs
Signed-off-by: Samuel Karp <samuelkarp@google.com>
2023-11-02 11:17:32 -07:00
Samuel Karp
35924bccc0 cri: add deprecation warning for auths
Signed-off-by: Samuel Karp <samuelkarp@google.com>
2023-11-02 11:17:32 -07:00
Samuel Karp
d7cb25d770 cri: add deprecation warning for mirrors
Signed-off-by: Samuel Karp <samuelkarp@google.com>
2023-11-02 11:17:31 -07:00
Samuel Karp
58cc275eb8 cri: add ability to emit deprecation warnings
Signed-off-by: Samuel Karp <samuelkarp@google.com>
2023-11-02 11:17:31 -07:00
Derek McGowan
261e01c2ac Move client to subpackage
Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-11-01 10:37:00 -07:00
Derek McGowan
5fdf55e493 Update go module to github.com/containerd/containerd/v2
Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-10-29 20:52:21 -07:00
Maksym Pavlenko
f90f80d9b3 Merge pull request #9254 from adisky/cri-streaming-from-k8s
Use staging k8s.io/kubelet/cri/streaming package
2023-10-19 12:32:12 -07:00
Aditi Sharma
03d81f595f Use cri streaming pkg from k8s staging
Use staging k8s.io/kubelet/cri/streaming package

Signed-off-by: Aditi Sharma <adi.sky17@gmail.com>
2023-10-18 09:14:28 +05:30
Abel Feng
69e501e7cd sandbox: change SandboxMode to Sandboxer
Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-10-16 20:49:36 +08:00
Derek McGowan
b5615caf11 Update go-toml to v2
Updates host file parsing to use new v2 method rather than the removed
toml.Tree.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-09-22 15:35:12 -07:00
Derek McGowan
508aa3a1ef Move to use github.com/containerd/log
Add github.com/containerd/log to go.mod

Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-09-22 07:53:23 -07:00
Maksym Pavlenko
c3f3cad287 Use sandboxed CRI by default
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-08-23 08:50:40 -07:00
Akihiro Suda
69b451af5a RELEASES.md: de-deprecation of CNI conf_template will be v1.7.3
Cherry-pick of PR 8606 missed the v1.7.2 milestone

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-06-03 17:04:14 +09:00
Kazuyoshi Kato
73645b1dfe Merge pull request #8588 from lengrongfu/feat/cleanup_config_tls
Cleanup DEPRECATED TLS config
2023-05-31 18:50:54 -07:00
Kazuyoshi Kato
3ad032e9d0 Merge pull request #8606 from adisky/remove-conf-template-deprecation
Remove cni conf_template deprecation
2023-05-31 09:47:21 -07:00
Aditi Sharma
3ca5b4437e Remove cni conf_template deprecation
As discussed in the issue
https://github.com/containerd/containerd/issues/8596
It is a helpful feature at many places and no replacement
readily available

Signed-off-by: Aditi Sharma <adi.sky17@gmail.com>
2023-05-31 17:34:33 +05:30
rongfu.leng
d2b7a1e293 cleanup DEPRECATED TLS config
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2023-05-31 09:37:41 +08:00
rongfu.leng
314d758fa1 update auths code comment
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2023-05-30 23:05:48 +08:00
rongfu.leng
9287711b7a upgrade registry.k8s.io/pause version
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2023-05-28 07:59:10 +08:00
Wei Fu
d280cb83b6 chore: update comment for NetworkPluginSetupSerially
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-05-17 22:39:10 +08:00
Iceber Gu
23d288a809 Remove the CriuPath field from runc's options
Signed-off-by: Iceber Gu <wei.cai-nat@daocloud.io>
2023-03-16 17:12:51 +08:00
Maksym Pavlenko
8bd82e355a Remove no_pivot when creating container from CRI
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-03-15 09:18:16 -07:00
Maksym Pavlenko
07c2ae12e1 Remove v1 runctypes
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-03-15 09:18:16 -07:00
Akihiro Suda
625217d5fb RELEASES.md: describe the deprecated config properties
These deprecations were mentioned in `pkg/cri/config/config.go`
but not mentioned in `RELEASES.md`.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-03-09 15:12:54 +09:00
Wei Fu
5946c1051e *: fix code style issue
1. it's easy to check wrong input if using drain_exec_sync_io_timeout in error
2. avoid to use full error message, as part of error generated by go
   stdlib would be changed in the future
3. delete the extra empty line

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-03-03 17:51:03 +08:00
Wei Fu
01671e9fc5 cri: add config ut for invalid drain io timeout value
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-03-03 12:00:19 +08:00
Wei Fu
791f137a5b *: update drainExecSyncIO docs and validate the timeout
We should validate the drainExecSyncIO timeout at the beginning and
raise the error for any invalid input.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-03-03 11:58:52 +08:00
Wei Fu
3c18decea7 *: add DrainExecSyncIOTimeout config and disable as by default
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-03-03 00:21:55 +08:00
Kay Yan
0b33a45fad cri: fix Mirrors deprecation comment
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
2023-02-07 09:53:57 +08:00
Maksym Pavlenko
3bc8fc4d30 Cleanup build constraints
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-12-08 09:36:20 -08:00
Fei Su
f6232793b4 can set up the network serially by CNI plugins
Signed-off-by: Fei Su <sofat1989@126.com>
2022-11-18 15:19:00 +08:00
Zhang Tianyang
c953eecb79 Sandbox API: Add a new mode config for sandbox controller impls
Add a new config as sandbox controller mod, which can be either
"podsandbox" or "shim". If empty, set it to default "podsandbox"
when CRI plugin inits.

Signed-off-by: Zhang Tianyang <burning9699@gmail.com>
2022-11-09 12:12:39 +08:00
Ed Bartosh
e22a7a3833 reference CDI configuration details
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-09-21 11:25:28 +03:00
Kevin Parsons
de509c0682 Merge pull request #6901 from dcantah/add-wcowhyp-runtime
windows: Add runhcs-wcow-hypervisor runtimeclass to the default config
2022-09-08 10:53:12 -07:00
lengrongfu
3c0e6c40ad feat: upgrade registry.k8s.io/pause version
Signed-off-by: rongfu.leng <1275177125@qq.com>
2022-09-06 15:59:20 +08:00
Paco Xu
9525b3148a migrate from k8s.gcr.io to registry.k8s.io
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2022-08-24 13:46:46 +08:00
Daniel Canter
f0036cb9dc windows: Add runhcs-wcow-hypervisor runtimeclass to the default config
As part of the effort of getting hypervisor isolated windows container
support working for the CRI entrypoint here, add the runhcs-wcow-hypervisor
handler for the default config. This sets the correct SandboxIsolation
value that the Windows shim uses to differentiate process vs. hypervisor
isolation. This change additionally sets the wcow-process runtime to
passthrough io.microsoft.container* annotations and the hypervisor runtime
to accept io.microsoft.virtualmachine* annotations.

Note that for K8s users this runtime handler will need to be configured by
creating the corresponding RuntimeClass resources on the cluster as it's
not the default runtime.

Signed-off-by: Daniel Canter <dcanter@microsoft.com>
2022-08-19 07:56:43 -07:00