Commit Graph

425 Commits

Author SHA1 Message Date
Derek McGowan
c0e94bc9a8
Merge pull request #9492 from kiashok/moveCriConfig
Move GenerateRuntimeOptions() to pkg/cri/config
2023-12-13 00:13:46 +00:00
Wei Fu
23278c81fb *: introduce image_pull_with_sync_fs in CRI
It's to ensure the data integrity during unexpected power failure.

Background:

Since release 1.3, in Linux system, containerD unpacks and writes files into
overlayfs snapshot directly. It doesn’t involve any mount-umount operations
so that the performance of pulling image has been improved.

As we know, the umount syscall for overlayfs will force kernel to flush
all the dirty pages into disk. Without umount syscall, the files’ data relies
on kernel’s writeback threads or filesystem's commit setting (for
instance, ext4 filesystem).

The files in committed snapshot can be loss after unexpected power failure.
However, the snapshot has been committed and the metadata also has been
fsynced. There is data inconsistency between snapshot metadata and files
in that snapshot.

We, containerd, received several issues about data loss after unexpected
power failure.

* https://github.com/containerd/containerd/issues/5854
* https://github.com/containerd/containerd/issues/3369#issuecomment-1787334907

Solution:

* Option 1: SyncFs after unpack

Linux platform provides [syncfs][syncfs] syscall to synchronize just the
filesystem containing a given file.

* Option 2: Fsync directories recursively and fsync on regular file

The fsync doesn't support symlink/block device/char device files. We
need to use fsync the parent directory to ensure that entry is
persisted.

However, based on [xfstest-dev][xfstest-dev], there is no case to ensure
fsync-on-parent can persist the special file's metadata, for example,
uid/gid, access mode.

Checkout [generic/690][generic/690]: Syncing parent dir can persist
symlink. But for f2fs, it needs special mount option. And it doesn't say
that uid/gid can be persisted. All the details are behind the
implemetation.

> NOTE: All the related test cases has `_flakey_drop_and_remount` in
[xfstest-dev].

Based on discussion about [Documenting the crash-recovery guarantees of Linux file systems][kernel-crash-recovery-data-integrity],
we can't rely on Fsync-on-parent.

* Option 1 is winner

This patch is using option 1.

There is test result based on [test-tool][test-tool].
All the networking traffic created by pull is local.

  * Image: docker.io/library/golang:1.19.4 (992 MiB)
    * Current: 5.446738579s
      * WIOS=21081, WBytes=1329741824, RIOS=79, RBytes=1197056
    * Option 1: 6.239686088s
      * WIOS=34804, WBytes=1454845952, RIOS=79, RBytes=1197056
    * Option 2: 1m30.510934813s
      * WIOS=42143, WBytes=1471397888, RIOS=82, RBytes=1209344

  * Image: docker.io/tensorflow/tensorflow:latest (1.78 GiB, ~32590 Inodes)
    * Current: 8.852718042s
      * WIOS=39417, WBytes=2412818432, RIOS=2673, RBytes=335987712
    * Option 1: 9.683387174s
      * WIOS=42767, WBytes=2431750144, RIOS=89, RBytes=1238016
    * Option 2: 1m54.302103719s
      * WIOS=54403, WBytes=2460528640, RIOS=1709, RBytes=208237568

The Option 1 will increase `wios`. So, the `image_pull_with_sync_fs` is
option in CRI plugin.

[syncfs]: <https://man7.org/linux/man-pages/man2/syncfs.2.html>
[xfstest-dev]: <https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git>
[generic/690]: <https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/tests/generic/690?h=v2023.11.19>
[kernel-crash-recovery-data-integrity]: <https://lore.kernel.org/linux-fsdevel/1552418820-18102-1-git-send-email-jaya@cs.utexas.edu/>
[test-tool]: <a17fb2010d/contrib/syncfs/containerd/main_test.go (L51)>

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-12-12 10:18:39 +08:00
Kirtana Ashok
25b052cbca Move GenerateRuntimeOptions() to pkg/cri/config
Signed-off-by: Kirtana Ashok <kiashok@microsoft.com>
2023-12-07 12:23:00 -08:00
Maksym Pavlenko
47163c3c00
Merge pull request #9391 from abel-von/sandbox-plugin-1117
sandbox: Move CRI Image Service and CRI Base seperate plugins
2023-12-01 22:42:22 +00:00
Abel Feng
c0363754fb sandbox: get runtime info from sandbox or container
For backward compatibility, we should get runtimeInfo from sandbox in
db, or get it from the sandbox container in db.
Note that this is a temporary solution and we will remove the Container field in
Sandbox in cri cache, and replace it with a SandboxInsantance of type
containerd.Sandbox interface.

Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-12-01 15:58:27 +08:00
Abel Feng
e1b4958663 sandbox: keep rootDir and stateDir compatible
Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-11-30 23:06:53 +08:00
Abel Feng
b0fef6738f sandbox: migrate sandbox_mode to sandboxer
Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-11-30 23:06:53 +08:00
Abel Feng
c8012b6d74 sandbox: make a clear dependency of cri plugins
Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-11-30 23:06:41 +08:00
Maksym Pavlenko
e15c246550 Move CRI image service into a separate plugin
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-11-20 09:41:27 +08:00
Wei Fu
80dd779deb remotes/docker: close connection if no more data
Close connection if no more data. It's to fix false alert filed by image
pull progress.

```
dst = OpenWriter (--> Content Store)

src = Fetch
        Open (--> Registry)
        Mark it as active request

Copy(dst, src) (--> Keep updating total received bytes)

   ^
   |  (Active Request > 0, but total received bytes won't be updated)
   v

defer src.Close()
content.Commit(dst)
```

Before migrating to transfer service, CRI plugin doesn't limit global
concurrent downloads for ImagePulls. Each ImagePull requests have 3 concurrent
goroutines to download blob and 1 goroutine to unpack blob. Like ext4
filesystem [1][1], the fsync from content.Commit may sync unrelated dirty pages
into disk. The host is running under IO pressure, and then the content.Commit
will take long time and block other goroutines. If httpreadseeker
doesn't close the connection after io.EOF, this connection will be
considered as active. The pull progress reporter reports there is no
bytes transfered and cancels the ImagePull.

The original 1-minute timeout[2][2] is from kubelet settting. Since CRI-plugin
can't limit the total concurrent downloads, this patch is to update 1-minute
to 5-minutes to prevent from unexpected cancel.

[1]: https://lwn.net/Articles/842385/
[2]: https://github.com/kubernetes/kubernetes/blob/release-1.23/pkg/kubelet/config/flags.go#L45-L48

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-11-18 10:23:05 +08:00
Wei Fu
2e9686c054 fix: deflake TestCRIImagePullTimeout/HoldingContentOpenWriter
The new active request is filed and there is no bytes read yet when the
progress reporter just wakes up. If the timeout / 2 is less than the
minPullProgressReportInternal, it's easy to file false alert.

We should remove the minPullProgressReportInternal limit.

Fixes: #8024

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-11-18 10:23:05 +08:00
Wei Fu
f6c4de6b53 fix: podsandbox depends on Lease plugin
introduced by 09723a6175

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-11-17 23:29:04 +08:00
Abel Feng
32bf805e57 sandbox: add a sandboxService interface to criService
so that we can add a fakeSandboxService to the criService in tests.

Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-11-15 09:25:58 +08:00
Abel Feng
25a4c3d235 sandbox: remove SandboxersServicePlugin
Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-11-15 09:22:10 +08:00
Abel Feng
0cf48bab2c sandbox: podsandbox init its own client
To break the cyclic dependency of cri plugin and podsandbox plugin,
we define a new plugin type of SandboxesServicePlugin and when cri init
it's own client, it will add the all the controllers by get them from
the SandboxesServicePlugin.
when podsandbox controller init it's client, it will not Require the
SandboxesServicePlugin.

Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-11-15 09:22:10 +08:00
Akhil Mohan
e682da76ce
fix labels in pod sandbox
Signed-off-by: Akhil Mohan <makhil@vmware.com>
2023-11-14 01:52:09 +05:30
Akhil Mohan
64c41162c3
update tests to use labels from cri/labels
Signed-off-by: Akhil Mohan <makhil@vmware.com>
2023-11-14 01:46:43 +05:30
Akhil Mohan
7e79225cec
refactor labels used in cri server
remove the duplication of labels used in cri/server
and move them to a common package cri/labels

Signed-off-by: Akhil Mohan <makhil@vmware.com>
2023-11-14 01:45:26 +05:30
Samuel Karp
bd2db42464
Merge pull request #9287 from lengrongfu/feat/add-warning-use-inheritable
add warning use inheritable Capabilities
2023-11-04 00:33:18 +00:00
Samuel Karp
6cd0e8e405
Merge pull request #9321 from dmcgowan/switch-to-plugin-repo
Switch to plugin repo
2023-11-02 16:50:49 +00:00
Phil Estes
740717673f
Merge pull request #9317 from jsturtevant/fix-sbserver-windows
CRI: Handle ArgsEscaped for new Sb Server by clearing commandline in spec
2023-11-02 14:45:39 +00:00
rongfu.leng
df19888f83 add warning use inheritable Capabilities
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2023-11-02 16:14:59 +08:00
Derek McGowan
9db21401c4
Switch to github.com/containerd/plugin
Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-11-01 23:01:42 -07:00
James Sturtevant
a67efe88db
Add tests cases
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
2023-11-01 15:32:43 -07:00
James Sturtevant
0ffc3e9873
Handle ArgsEscaped for new Sb Server
The PR https://github.com/containerd/containerd/pull/8198 fixed this for CRI but missed clearing the commandline in the forked SB server. This simply adds that back in

Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
2023-11-01 12:06:07 -07:00
Derek McGowan
261e01c2ac
Move client to subpackage
Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-11-01 10:37:00 -07:00
Derek McGowan
5fdf55e493
Update go module to github.com/containerd/containerd/v2
Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-10-29 20:52:21 -07:00
Derek McGowan
638b474c81
Temporarily remove imgcrypt in CRI to fix circular dependency
Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-10-27 15:36:47 -07:00
Derek McGowan
18c9e7ec4c
Merge pull request #9270 from fuweid/fix-sb-issues
pkg/cri: should ignore no sandbox bucket
2023-10-21 21:44:16 +00:00
Abel Feng
8b4f9656d2 sandbox: remove ValidateMode as it is not used
Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-10-20 16:02:13 +08:00
Wei Fu
337cc21719 pkg/cri: should ignore no sandbox bucket
The sandbox might be recovered from v1.x release. It doesn't have
metadata bucket. We should ignore the not-found error.

How to reproduce the issue:

```bash
➜  containerd git:(main) sudo ctr version
Client:
  Version:  1.6.22
  Revision: 8165feabfdfe38c65b599c4993d227328c231fca
  Go version: go1.19.11

Server:
  Version:  1.6.22
  Revision: 8165feabfdfe38c65b599c4993d227328c231fca
  UUID: be4216aa-8a2e-4305-9186-efeacd2d9a17

➜  containerd git:(main) cat /tmp/pod.json
{
    "metadata": {
        "name": "nginx-sandbox",
        "namespace": "default",
        "attempt": 1,
        "uid": "hdishd83djaidwnduwk28bcsb"
    },
    "log_directory": "/tmp",
    "linux": {
    }
}

➜  containerd git:(main) sudo crictl runp /tmp/pod.json
616ea1cc657c57e80abf74e707a8177878ac2ec1ab7c346b4adb7bc0fadf986e
➜  containerd git:(main) sudo crictl pods
POD ID              CREATED             STATE               NAME                NAMESPACE           ATTEMPT             RUNTIME
616ea1cc657c5       9 seconds ago       Ready               nginx-sandbox       default             1                   (default)

➜  containerd git:(main) make BUILDTAGS=no_btrfs
➜  containerd git:(main) sudo PREFIX=/usr make install
+ install bin/ctr bin/containerd bin/containerd-stress bin/containerd-shim-runc-v2

➜  containerd git:(main) sudo systemctl restart containerd
➜  containerd git:(main) sudo ctr version
Client:
  Version:  v1.7.0-943-g980767551
  Revision: 9807675518
  Go version: go1.20.10

Server:
  Version:  v1.7.0-943-g980767551
  Revision: 9807675518
  UUID: be4216aa-8a2e-4305-9186-efeacd2d9a17

➜  containerd git:(main) sudo crictl stopp  616ea1cc657c5
Stopped sandbox 616ea1cc657c5

➜  containerd git:(main) sudo crictl rmp 616ea1cc657c5
E1019 14:03:37.885162 2052643 remote_runtime.go:295] "RemovePodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to remove sandbox metadata from store: failed to delete sandbox \"616ea1cc657c57e80abf74e707a8177878ac2ec1ab7c346b4adb7bc0fadf986e\": bucket not found" podSandboxID="616ea1cc657c5"
removing the pod sandbox "616ea1cc657c5": rpc error: code = Unknown desc = failed to remove sandbox metadata from store: failed to delete sandbox "616ea1cc657c57e80abf74e707a8177878ac2ec1ab7c346b4adb7bc0fadf986e": bucket not found
```

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-10-20 15:20:18 +08:00
Maksym Pavlenko
f90f80d9b3
Merge pull request #9254 from adisky/cri-streaming-from-k8s
Use staging k8s.io/kubelet/cri/streaming package
2023-10-19 12:32:12 -07:00
Derek McGowan
bb64e6a8ef
Initialize sandbox controller list on CRI server creation
Avoid calling out to the client to get a sandbox controller and instead
setup the list of controllers on initialization. This fixes a test
failure which does not set the client.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-10-18 15:25:25 -07:00
Derek McGowan
9807675518
Merge pull request #8268 from abel-von/sandbox-plugin
Sandbox: make sandbox controller plugin
2023-10-18 10:16:10 -07:00
Aditi Sharma
03d81f595f Use cri streaming pkg from k8s staging
Use staging k8s.io/kubelet/cri/streaming package

Signed-off-by: Aditi Sharma <adi.sky17@gmail.com>
2023-10-18 09:14:28 +05:30
Fu Wei
dc7dba9c20
Merge pull request #9239 from jiangliu/cri-multi-snapshotters
CRI: use (snapshotter_id, snapshot_key) to uniquely identify snapshots
2023-10-18 09:30:55 +08:00
Jiang Liu
8e7c10c6d0 CRI: enhance ImageFsInfo() to support multiple snapshotters
Enhance cri/server/image/imagefs_info.go:ImageFsInfo() to support
snapshotter per runtime. Now `ImageFsInfoResponse.ImageFilesystems` may
contain multiple entries.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-10-17 17:38:18 +08:00
Abel Feng
3ef300ca75 sandbox: remove global variable of podsandbox controller
Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-10-16 21:17:50 +08:00
Abel Feng
d2d434b7d6 sandbox: add all sandbox information to Create method
When call sandbox controller to create sandbox, we change the param from
sandbox id to total sandbox object to git all information to controller,
so that sandbox controller do not rely on the sandbox store anymore,
this is more decouple for the sandbox controller plugin inside
containerd, and it is neccesary for remote sandbox controller plugins as
it is not able to get sandbox from the sandbox store anymore.

Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-10-16 21:17:50 +08:00
Abel Feng
2951fb6dc6 sandbox: support more sandbox controllers
make containerd extensible to support more sandbox controllers
registered into containerd by config.
we change the default sandbox controller plugin's name from "local" to "shim".
to make sure we can get the controller by the plugin name it registered into
containerd.

Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-10-16 21:17:44 +08:00
Abel Feng
69e501e7cd sandbox: change SandboxMode to Sandboxer
Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-10-16 20:49:36 +08:00
Jiang Liu
5ad6f34329 CRI: use (snapshotter_id, snapshot_key) to uniquely identify snapshots
Before snapshotter per runtime, CRI only supports a global snapshotter.
So a snapshot can be uniquely identified by `snapshot_key`. With snapshotter
per runtime enabled, there may be multiple snapshotters used by CRI. So only
(snapshotter_id, snapshot_key) can uniquely identify a snapshot.
Also extends CRI/store/snapshot/Store to support multiple snapshotters.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-10-16 10:21:10 +08:00
Derek McGowan
a80606bc2d
Move plugin type definitions to containerd plugins package
The plugins packages defines the plugins used by containerd.
Move all the types and properties to this package.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-10-12 20:52:56 -07:00
Maksym Pavlenko
1b31993240 Rename sbserver to server
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-10-12 15:46:57 -07:00
Maksym Pavlenko
536abf1b35 Remove legacy CRI
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-10-12 09:18:45 -07:00
Maksym Pavlenko
33b1a833c6 Move CRI testing package out
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-10-12 09:18:20 -07:00
Maksym Pavlenko
016114ff6f Move bandwidth package out of CRI server
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2023-10-12 09:12:52 -07:00
Samuel Karp
841c693710
Merge pull request #9076 from CFSworks/fix-ipv6-pref
Fix `ip_pref` configuration option
2023-10-06 23:31:45 -07:00
Fu Wei
bce8fe60df
Merge pull request #9183 from dcantah/cri-snapshotter-platform 2023-10-04 11:40:33 +08:00
Kohei Tokunaga
6169433b6b
Fix linter issues for golangci-lint 1.54.2
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
2023-10-03 21:56:47 +09:00