Commit Graph

1709 Commits

Author SHA1 Message Date
Fu Wei
9f8b845334 Merge pull request #9463 from abel-von/sandbox-plugin-1205
sandbox: remove sandboxStore from podsandbox controller
2024-01-09 01:40:10 +00:00
Abel Feng
7dadd5f1bf sandbox: remove sandboxStore from podsandbox
Signed-off-by: Abel Feng <fshb1988@gmail.com>
2024-01-09 00:19:17 +08:00
Paweł Gronowski
8235665e7e content: Add InfoReaderProvider
The interface that combines both content.InfoProvider and
content.Provider was duplicated in multiple places - create one directly
in `content` package and use it instead.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2024-01-05 16:37:26 +01:00
Derek McGowan
ced9b18c23 Merge pull request #9565 from koct9i/fix-return-grpc-code-unavailable-if-not-initialized
fix: return gRPC code "unavailable" if server is not initialized yet
2023-12-28 04:30:43 +00:00
Amit Barve
daa1ea522b Add cimfs differ and snapshotter
Details about CimFs project are discussed in #8346

Signed-off-by: Amit Barve <ambarve@microsoft.com>
2023-12-20 09:29:08 -08:00
Konstantin Khlebnikov
c191935747 fix: return gRPC code "unavailable" if server is not initialized yet
Correct error code helps client to decide when it should retry request.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
2023-12-20 11:07:18 +01:00
Derek McGowan
c0e94bc9a8 Merge pull request #9492 from kiashok/moveCriConfig
Move GenerateRuntimeOptions() to pkg/cri/config
2023-12-13 00:13:46 +00:00
Wei Fu
23278c81fb *: introduce image_pull_with_sync_fs in CRI
It's to ensure the data integrity during unexpected power failure.

Background:

Since release 1.3, in Linux system, containerD unpacks and writes files into
overlayfs snapshot directly. It doesn’t involve any mount-umount operations
so that the performance of pulling image has been improved.

As we know, the umount syscall for overlayfs will force kernel to flush
all the dirty pages into disk. Without umount syscall, the files’ data relies
on kernel’s writeback threads or filesystem's commit setting (for
instance, ext4 filesystem).

The files in committed snapshot can be loss after unexpected power failure.
However, the snapshot has been committed and the metadata also has been
fsynced. There is data inconsistency between snapshot metadata and files
in that snapshot.

We, containerd, received several issues about data loss after unexpected
power failure.

* https://github.com/containerd/containerd/issues/5854
* https://github.com/containerd/containerd/issues/3369#issuecomment-1787334907

Solution:

* Option 1: SyncFs after unpack

Linux platform provides [syncfs][syncfs] syscall to synchronize just the
filesystem containing a given file.

* Option 2: Fsync directories recursively and fsync on regular file

The fsync doesn't support symlink/block device/char device files. We
need to use fsync the parent directory to ensure that entry is
persisted.

However, based on [xfstest-dev][xfstest-dev], there is no case to ensure
fsync-on-parent can persist the special file's metadata, for example,
uid/gid, access mode.

Checkout [generic/690][generic/690]: Syncing parent dir can persist
symlink. But for f2fs, it needs special mount option. And it doesn't say
that uid/gid can be persisted. All the details are behind the
implemetation.

> NOTE: All the related test cases has `_flakey_drop_and_remount` in
[xfstest-dev].

Based on discussion about [Documenting the crash-recovery guarantees of Linux file systems][kernel-crash-recovery-data-integrity],
we can't rely on Fsync-on-parent.

* Option 1 is winner

This patch is using option 1.

There is test result based on [test-tool][test-tool].
All the networking traffic created by pull is local.

  * Image: docker.io/library/golang:1.19.4 (992 MiB)
    * Current: 5.446738579s
      * WIOS=21081, WBytes=1329741824, RIOS=79, RBytes=1197056
    * Option 1: 6.239686088s
      * WIOS=34804, WBytes=1454845952, RIOS=79, RBytes=1197056
    * Option 2: 1m30.510934813s
      * WIOS=42143, WBytes=1471397888, RIOS=82, RBytes=1209344

  * Image: docker.io/tensorflow/tensorflow:latest (1.78 GiB, ~32590 Inodes)
    * Current: 8.852718042s
      * WIOS=39417, WBytes=2412818432, RIOS=2673, RBytes=335987712
    * Option 1: 9.683387174s
      * WIOS=42767, WBytes=2431750144, RIOS=89, RBytes=1238016
    * Option 2: 1m54.302103719s
      * WIOS=54403, WBytes=2460528640, RIOS=1709, RBytes=208237568

The Option 1 will increase `wios`. So, the `image_pull_with_sync_fs` is
option in CRI plugin.

[syncfs]: <https://man7.org/linux/man-pages/man2/syncfs.2.html>
[xfstest-dev]: <https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git>
[generic/690]: <https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/tests/generic/690?h=v2023.11.19>
[kernel-crash-recovery-data-integrity]: <https://lore.kernel.org/linux-fsdevel/1552418820-18102-1-git-send-email-jaya@cs.utexas.edu/>
[test-tool]: <a17fb2010d/contrib/syncfs/containerd/main_test.go (L51)>

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-12-12 10:18:39 +08:00
Kirtana Ashok
25b052cbca Move GenerateRuntimeOptions() to pkg/cri/config
Signed-off-by: Kirtana Ashok <kiashok@microsoft.com>
2023-12-07 12:23:00 -08:00
Maksym Pavlenko
47163c3c00 Merge pull request #9391 from abel-von/sandbox-plugin-1117
sandbox: Move CRI Image Service and CRI Base seperate plugins
2023-12-01 22:42:22 +00:00
Maksym Pavlenko
63609d33ca Merge pull request #9434 from abel-von/add-integration-test
sandbox: add cri integration test case for upgrade
2023-12-01 22:40:35 +00:00
Maksym Pavlenko
e2303c267e Merge pull request #9414 from ZhangShuaiyi/fix/tomlext_MarshalText
tomlext.Duration add MarshalText method
2023-12-01 22:36:53 +00:00
Abel Feng
c0363754fb sandbox: get runtime info from sandbox or container
For backward compatibility, we should get runtimeInfo from sandbox in
db, or get it from the sandbox container in db.
Note that this is a temporary solution and we will remove the Container field in
Sandbox in cri cache, and replace it with a SandboxInsantance of type
containerd.Sandbox interface.

Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-12-01 15:58:27 +08:00
Abel Feng
e1b4958663 sandbox: keep rootDir and stateDir compatible
Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-11-30 23:06:53 +08:00
Abel Feng
b0fef6738f sandbox: migrate sandbox_mode to sandboxer
Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-11-30 23:06:53 +08:00
Abel Feng
c8012b6d74 sandbox: make a clear dependency of cri plugins
Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-11-30 23:06:41 +08:00
Akihiro Suda
8e567aa581 mv pkg/process cmd/containerd-shim-runc-v2/process
The package is quite specific to runc and only imported by
containerd-shim-runc-v2

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-11-30 21:50:04 +09:00
Anthony Nandaa
8e91edb71e fix(pkg/dialer): minor fix on dialer function for windows
This commit fixes the dialer function to make sure that
"npipe://" prefix is trimmed, just like the way it is done
in the Unix counterpart, `./dialer_unix.go:50`

This will also unblock some downstream work going on in
buildkit; setting up integration tests to run on Windows.

Signed-off-by: Anthony Nandaa <profnandaa@gmail.com>
2023-11-22 04:25:11 -08:00
Shuaiyi Zhang
ad3f8c563b tomlext.Duration add MarshalText method
Signed-off-by: Shuaiyi Zhang <zhang_syi@qq.com>
2023-11-22 19:28:46 +08:00
Maksym Pavlenko
e15c246550 Move CRI image service into a separate plugin
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-11-20 09:41:27 +08:00
Wei Fu
80dd779deb remotes/docker: close connection if no more data
Close connection if no more data. It's to fix false alert filed by image
pull progress.

```
dst = OpenWriter (--> Content Store)

src = Fetch
        Open (--> Registry)
        Mark it as active request

Copy(dst, src) (--> Keep updating total received bytes)

   ^
   |  (Active Request > 0, but total received bytes won't be updated)
   v

defer src.Close()
content.Commit(dst)
```

Before migrating to transfer service, CRI plugin doesn't limit global
concurrent downloads for ImagePulls. Each ImagePull requests have 3 concurrent
goroutines to download blob and 1 goroutine to unpack blob. Like ext4
filesystem [1][1], the fsync from content.Commit may sync unrelated dirty pages
into disk. The host is running under IO pressure, and then the content.Commit
will take long time and block other goroutines. If httpreadseeker
doesn't close the connection after io.EOF, this connection will be
considered as active. The pull progress reporter reports there is no
bytes transfered and cancels the ImagePull.

The original 1-minute timeout[2][2] is from kubelet settting. Since CRI-plugin
can't limit the total concurrent downloads, this patch is to update 1-minute
to 5-minutes to prevent from unexpected cancel.

[1]: https://lwn.net/Articles/842385/
[2]: https://github.com/kubernetes/kubernetes/blob/release-1.23/pkg/kubelet/config/flags.go#L45-L48

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-11-18 10:23:05 +08:00
Wei Fu
2e9686c054 fix: deflake TestCRIImagePullTimeout/HoldingContentOpenWriter
The new active request is filed and there is no bytes read yet when the
progress reporter just wakes up. If the timeout / 2 is less than the
minPullProgressReportInternal, it's easy to file false alert.

We should remove the minPullProgressReportInternal limit.

Fixes: #8024

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-11-18 10:23:05 +08:00
Wei Fu
f6c4de6b53 fix: podsandbox depends on Lease plugin
introduced by 09723a6175

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-11-17 23:29:04 +08:00
Fu Wei
09723a6175 Merge pull request #9275 from abel-von/sandbox-plugin-1019
sandbox: podsandbox controller init its own client
2023-11-16 10:01:02 +00:00
Samuel Karp
1a54a217ca Merge pull request #9338 from Iceber/update_pinned_label
cri: fix update of pinned label for images
2023-11-16 05:54:55 +00:00
Austin Vazquez
01c442147f Enhance container image unpack client logs
Adds debug message per layer unpacking and adds duration field to
the existing image unpacking debug message.

Signed-off-by: Austin Vazquez <macedonv@amazon.com>
2023-11-15 17:30:53 +00:00
Abel Feng
32bf805e57 sandbox: add a sandboxService interface to criService
so that we can add a fakeSandboxService to the criService in tests.

Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-11-15 09:25:58 +08:00
Abel Feng
25a4c3d235 sandbox: remove SandboxersServicePlugin
Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-11-15 09:22:10 +08:00
Abel Feng
0cf48bab2c sandbox: podsandbox init its own client
To break the cyclic dependency of cri plugin and podsandbox plugin,
we define a new plugin type of SandboxesServicePlugin and when cri init
it's own client, it will add the all the controllers by get them from
the SandboxesServicePlugin.
when podsandbox controller init it's client, it will not Require the
SandboxesServicePlugin.

Signed-off-by: Abel Feng <fshb1988@gmail.com>
2023-11-15 09:22:10 +08:00
Akhil Mohan
e682da76ce fix labels in pod sandbox
Signed-off-by: Akhil Mohan <makhil@vmware.com>
2023-11-14 01:52:09 +05:30
Akhil Mohan
64c41162c3 update tests to use labels from cri/labels
Signed-off-by: Akhil Mohan <makhil@vmware.com>
2023-11-14 01:46:43 +05:30
Akhil Mohan
7e79225cec refactor labels used in cri server
remove the duplication of labels used in cri/server
and move them to a common package cri/labels

Signed-off-by: Akhil Mohan <makhil@vmware.com>
2023-11-14 01:45:26 +05:30
Iceber Gu
2e014fa2ac cri: fix update of pinned label for images
Signed-off-by: Iceber Gu <caiwei95@hotmail.com>
2023-11-10 23:27:11 +08:00
rongfu.leng
7b9fcfd7c6 add default enable unprivileged icmp/ports
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2023-11-08 23:00:35 +08:00
rongfu.leng
e099717f9f validate kernel version for unprivileged icmp/port
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2023-11-06 23:50:12 +08:00
Samuel Karp
bd2db42464 Merge pull request #9287 from lengrongfu/feat/add-warning-use-inheritable
add warning use inheritable Capabilities
2023-11-04 00:33:18 +00:00
Akihiro Suda
33fab02dce Merge pull request #7647 from thaJeztah/no_execabs
switch back from golang.org/x/sys/execabs to os/exec (go1.19)
2023-11-03 07:40:22 +00:00
Sebastiaan van Stijn
2af6db672e switch back from golang.org/x/sys/execabs to os/exec (go1.19)
This is effectively a revert of 2ac9968401, which
switched from os/exec to the golang.org/x/sys/execabs package to mitigate
security issues (mainly on Windows) with lookups resolving to binaries in the
current directory.

from the go1.19 release notes https://go.dev/doc/go1.19#os-exec-path

> ## PATH lookups
>
> Command and LookPath no longer allow results from a PATH search to be found
> relative to the current directory. This removes a common source of security
> problems but may also break existing programs that depend on using, say,
> exec.Command("prog") to run a binary named prog (or, on Windows, prog.exe) in
> the current directory. See the os/exec package documentation for information
> about how best to update such programs.
>
> On Windows, Command and LookPath now respect the NoDefaultCurrentDirectoryInExePath
> environment variable, making it possible to disable the default implicit search
> of “.” in PATH lookups on Windows systems.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-11-02 21:15:40 +01:00
Samuel Karp
a596d09ec9 cri: add deprecation warning for configs
Signed-off-by: Samuel Karp <samuelkarp@google.com>
2023-11-02 11:17:32 -07:00
Samuel Karp
35924bccc0 cri: add deprecation warning for auths
Signed-off-by: Samuel Karp <samuelkarp@google.com>
2023-11-02 11:17:32 -07:00
Samuel Karp
d7cb25d770 cri: add deprecation warning for mirrors
Signed-off-by: Samuel Karp <samuelkarp@google.com>
2023-11-02 11:17:31 -07:00
Samuel Karp
58cc275eb8 cri: add ability to emit deprecation warnings
Signed-off-by: Samuel Karp <samuelkarp@google.com>
2023-11-02 11:17:31 -07:00
Samuel Karp
6cd0e8e405 Merge pull request #9321 from dmcgowan/switch-to-plugin-repo
Switch to plugin repo
2023-11-02 16:50:49 +00:00
Phil Estes
740717673f Merge pull request #9317 from jsturtevant/fix-sbserver-windows
CRI: Handle ArgsEscaped for new Sb Server by clearing commandline in spec
2023-11-02 14:45:39 +00:00
rongfu.leng
df19888f83 add warning use inheritable Capabilities
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2023-11-02 16:14:59 +08:00
Derek McGowan
9db21401c4 Switch to github.com/containerd/plugin
Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-11-01 23:01:42 -07:00
James Sturtevant
a67efe88db Add tests cases
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
2023-11-01 15:32:43 -07:00
James Sturtevant
0ffc3e9873 Handle ArgsEscaped for new Sb Server
The PR https://github.com/containerd/containerd/pull/8198 fixed this for CRI but missed clearing the commandline in the forked SB server. This simply adds that back in

Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
2023-11-01 12:06:07 -07:00
Derek McGowan
261e01c2ac Move client to subpackage
Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-11-01 10:37:00 -07:00
Derek McGowan
5fdf55e493 Update go module to github.com/containerd/containerd/v2
Signed-off-by: Derek McGowan <derek@mcg.dev>
2023-10-29 20:52:21 -07:00