Commit Graph

10590 Commits

Author SHA1 Message Date
Derek McGowan
8d69c041c5
Update cgroups to v1.0.3
Pull in latest cgroups to pick up leak fixes

Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-02-01 16:57:51 -08:00
Andrew G. Morgan
6906b57c72
Fix the Inheritable capability defaults.
The Linux kernel never sets the Inheritable capability flag to
anything other than empty. Non-empty values are always exclusively
set by userspace code.

[The kernel stopped defaulting this set of capability values to the
 full set in 2000 after a privilege escalation with Capabilities
 affecting Sendmail and others.]

Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
2022-02-01 13:55:46 -08:00
Sebastiaan van Stijn
bec6e4dd67
platforms.Normalize(): do not reset OSVersion and OSFeatures
Commit fb0688362c implemented the Normalize()
function, but marked these fields as deprecated.

It's unclear what the motivation was for this, as the fields are part of the OCI
Image spec. On Windows, the OSVersion field specifically is important when matching
images (as kernel versions may not be compatible).

This patch updates platforms.Normalize() to preserve the OSVersion and OSFeatures
fields.

As a follow-up, we should look at defining an appropriate string-representation
for these fields (possibly as part of the OCI Spec), and update platforms.Parse()
accordingly.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-02-01 17:19:28 +01:00
Akihiro Suda
34f7173491
seccomp: kernel 5.16 (futex_waitv)
Allow `futex_waitv` by default.
See https://www.phoronix.com/scan.php?page=news_item&px=FUTEX2-futex-waiv-More-Archs

Note: libseccomp does not cover kernel 5.16 at this moment:
51b50f95e1/src/syscalls.csv

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2022-02-01 09:08:06 +09:00
Akihiro Suda
8632bdcb7b
seccomp: kernel 5.15 (process_mrelease)
Allow `process_mrelease` by default.

See https://lwn.net/Articles/864184/

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2022-02-01 09:08:05 +09:00
Akihiro Suda
c013db6965
seccomp: kernel 5.14 (quotactl_fd, memfd_secret)
- Allow `quotactl_fd` when `CAP_SYS_ADMIN` is granted.
  See https://lwn.net/Articles/859679/

- Allow `memfd_secret` by default.
  See https://lwn.net/Articles/865256/

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2022-02-01 09:08:01 +09:00
Akihiro Suda
17a2831f70
seccomp: kernel 5.13 (landlock_{add_rule,create_ruleset,restrict_self})
Allow the following syscalls by default:
- `landlock_add_rule`
- `landlock_create_ruleset`
- `landlock_restrict_self`

See https://landlock.io/

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2022-02-01 09:07:33 +09:00
Akihiro Suda
1329ea3716
seccomp: kernel 5.12 (mount_setattr)
Allow `mount_setattr` when `CAP_SYS_ADMIN` is granted.

See https://man7.org/linux/man-pages/man2/mount_setattr.2.html

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2022-02-01 09:06:41 +09:00
Michael Crosby
52b8ca5545
Merge pull request #6411 from nmeum/swapcontext
seccomp: add support for "swapcontext" syscall in default policy
2022-01-31 16:11:55 -05:00
Sebastiaan van Stijn
fdbfde5d81
cmd/containerd-shim: add -v (version) flag
Unlike the other shims, containerd-shim did not have a -v (version) flag:

    ./bin/containerd-shim-runc-v1 -v
    ./bin/containerd-shim-runc-v1:
    Version: v1.6.0-rc.1
    Revision: ad771115b82a70cfd8018d72ae489c707e63de16.m
    Go version: go1.17.2

    ./bin/containerd-shim -v
    flag provided but not defined: -v
    Usage of ./bin/containerd-shim:

This patch adds a `-v` flag to be consistent with the other shims. The code was
slightly refactored to match the implementation in the other shims, taking the
same approach as 77d53d2d23/runtime/v2/shim/shim.go (L240-L256)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-01-31 21:09:50 +01:00
Sebastiaan van Stijn
e79aba10d4
integration/images/volume-ownership: strip path information from usage output
POSIX guidelines describes; https://www.gnu.org/prep/standards/html_node/_002d_002dversion.html#g_t_002d_002dversion

> The program’s name should be a constant string; don’t compute it from argv[0].
> The idea is to state the standard or canonical name for the program, not its
> file name.

We don't have a const for this, but let's make a start and just remove the path info.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-01-31 21:07:00 +01:00
Sebastiaan van Stijn
b8cadf7539
runtime/v2/shim: strip path information from version output
I noticed that path information showed up in the version output:

    ./bin/containerd-shim-runc-v1 -v
    ./bin/containerd-shim-runc-v1:
    Version:  v1.6.0-rc.1
    Revision: ad771115b82a70cfd8018d72ae489c707e63de16.m
    Go version: go1.17.2

POSIX guidelines describes; https://www.gnu.org/prep/standards/html_node/_002d_002dversion.html#g_t_002d_002dversion

> The program’s name should be a constant string; don’t compute it from argv[0].
> The idea is to state the standard or canonical name for the program, not its
> file name.

Unfortunately, this code is used by multiple binaries, so we can't fully remove
the use of os.Args[0], but let's make a start and just remove the path info.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-01-31 21:01:01 +01:00
Derek McGowan
c2cb589221
Merge pull request #6478 from fuweid/enhance-no-sync-during-create
oci: use readonly mount to read user/group info
2022-01-31 10:35:51 -08:00
Michael Crosby
e178d831ef
Merge pull request #6475 from estesp/import-correct-media-type
Fix possibly incorrect media type default on import
2022-01-31 11:47:24 -05:00
Michael Crosby
82af36e59b
Merge pull request #5828 from cpuguy83/shimv2_exit_on_signals
shimv2: handle sigint/sigterm
2022-01-31 10:47:39 -05:00
Kazuyoshi Kato
cc59ae4d98 tracing: return (ctx, span) from StartSpan
OpenTelemetry's Tracer#Start() returns (ctx, span). We have no reasons
to swap them.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-01-29 00:41:21 +00:00
Kazuyoshi Kato
e751f1f44f tracing: support OTLP/HTTP in addition to gRPC
This change adds OTLP/HTTP, specifically http/protobuf support.

http/protobuf is recommended in
https://github.com/open-telemetry/opentelemetry-specification/blob/v1.8.0/specification/protocol/exporter.md.

However kube-apiserver and CRI-O use gRPC, kubelet may support
gRPC in future. So we should support gRPC as well.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-01-29 00:41:18 +00:00
Michael Crosby
9c676e98dd
Merge pull request #6481 from Junnplus/acr-400
Fix acr fetch token 400
2022-01-28 11:53:51 -05:00
Wei Fu
813a061fe1 oci: use readonly mount to read user/group info
In linux kernel, the umount writable-mountpoint will try to do sync-fs
to make sure that the dirty pages to the underlying filesystems. The many
number of umount actions in the same time maybe introduce performance
issue in IOPS limited disk.

When CRI-plugin creates container, it will temp-mount rootfs to read
that UID/GID info for entrypoint. Basically, the rootfs is writable
snapshotter and then after read, umount will invoke sync-fs action.

For example, using overlayfs on ext4 and use bcc-tools to monitor
ext4_sync_fs call.

```
// uname -a
Linux chaofan 5.13.0-27-generic #29~20.04.1-Ubuntu SMP Fri Jan 14 00:32:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

// open terminal 1
kubectl run --image=nginx --image-pull-policy=IfNotPresent nginx-pod

// open terminal 2
/usr/share/bcc/tools/stackcount ext4_sync_fs -i 1 -v -P

  ext4_sync_fs
  sync_filesystem
  ovl_sync_fs
  __sync_filesystem
  sync_filesystem
  generic_shutdown_super
  kill_anon_super
  deactivate_locked_super
  deactivate_super
  cleanup_mnt
  __cleanup_mnt
  task_work_run
  exit_to_user_mode_prepare
  syscall_exit_to_user_mode
  do_syscall_64
  entry_SYSCALL_64_after_hwframe
  syscall.Syscall.abi0
  github.com/containerd/containerd/mount.unmount
  github.com/containerd/containerd/mount.UnmountAll
  github.com/containerd/containerd/mount.WithTempMount.func2
  github.com/containerd/containerd/mount.WithTempMount
  github.com/containerd/containerd/oci.WithUserID.func1
  github.com/containerd/containerd/oci.WithUser.func1
  github.com/containerd/containerd/oci.ApplyOpts
  github.com/containerd/containerd.WithSpec.func1
  github.com/containerd/containerd.(*Client).NewContainer
  github.com/containerd/containerd/pkg/cri/server.(*criService).CreateContainer
  github.com/containerd/containerd/pkg/cri/server.(*instrumentedService).CreateContainer
  k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_CreateContainer_Handler.func1
  github.com/containerd/containerd/services/server.unaryNamespaceInterceptor
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
  github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).UnaryServerInterceptor.func1
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
  go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
  github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1
  k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_CreateContainer_Handler
  google.golang.org/grpc.(*Server).processUnaryRPC
  google.golang.org/grpc.(*Server).handleStream
  google.golang.org/grpc.(*Server).serveStreams.func1.2
  runtime.goexit.abi0
    containerd [34771]
    1
```

If there are comming several create requestes, umount actions might
bring high IO pressure on the /var/lib/containerd's underlying disk.

After checkout the kernel code[1], the kernel will not call
__sync_filesystem if the mount is readonly. Based on this, containerd
should use readonly mount to get UID/GID information.

Reference:

* [1] https://elixir.bootlin.com/linux/v5.13/source/fs/sync.c#L61

Closes: #4604

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2022-01-28 23:36:04 +08:00
Phil Estes
a43703fcba
Merge pull request #6455 from tonistiigi/amd64-variants
platforms: add support for matching amd64 variants
2022-01-27 10:07:49 -05:00
ye.sijun
c0e00f19ab fix acr fetch token 400
Signed-off-by: ye.sijun <junnplus@gmail.com>
2022-01-27 17:34:45 +08:00
Derek McGowan
3f5d789dfb
Merge pull request #6476 from gabriel-samfira/various-periodic-fixes
Fix windows periodic workflow
2022-01-25 16:43:11 -08:00
Gabriel Adrian Samfira
4cd9f37f56
Fix windows periodic workflow
This change addresses the following issues:

  * Fix fetching the public IP of the windows instance.
  * Fix generation of repolist.toml.
  * Resource cleanup is now run even if tests fail.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2022-01-25 21:54:16 +02:00
Phil Estes
4aff7431fe
Fix possibly incorrect media type default on import
As reported, running import twice without using the compress import
option means that the content store will have existing layers during the
second import and the existing code hardcodes existing layer media type
to compressed. This fixes the issue by actually reading the header bytes
from the store and setting the media type appropriately.

Signed-off-by: Phil Estes <estesp@amazon.com>
2022-01-25 14:11:20 -05:00
Brian Goff
3ffb6a6113 shimv2: handle sigint/sigterm
This causes sigint/sigterm to trigger a shutdown of the shim.
It is needed because otherwise the v2 shim hangs system shutdown.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2022-01-25 17:57:28 +00:00
Fu Wei
2986d5b077
Merge pull request #6473 from kzys/gc-docs 2022-01-25 13:25:49 +08:00
Kazuyoshi Kato
f048a25938 docs: add doc-comments on GC-related methods
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-01-24 14:26:14 -08:00
Maksym Pavlenko
731518417e
Merge pull request #6465 from fuweid/fix-issue-6429
fix: should not send 137 code event if cmd is notfound
2022-01-21 14:50:46 -08:00
Wei Fu
31a710c492 fix: should not send 137 code event if cmd is notfound
ShimV2 has shim.Delete command to cleanup task's temporary resource,
like bundle folder. Since the shim server exits and no persistent store
is for task's exit code, the result of shim.Delete is always 137 exit
code, like the task has been killed.

And the result of shim.Delete can be used as task event only when the
shim server is killed somehow after container is running. Therefore,
dockerd, which watches task exit event to update status of container,
can report correct status.

Back to the issue #6429, the container is not running because the
entrypoint is not found. Based on this design, we should not send
137 exitcode event to subscriber.

This commit is aimed to remove shim instance first and then the
`cleanupAfterDeadShim` should not send event.

Similar Issue: #4769
Fix #6429

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2022-01-22 00:58:33 +08:00
Phil Estes
ab8d99cf4b
Merge pull request #6463 from Junnplus/empty-scope
Fix empty scopes return
2022-01-20 15:34:11 -05:00
Jeff Zvier
356ca75757 containerd-shim-runc-v2: return init pid when clean dead shim
If containerd-shim-runc-v2 process dead abnormally, such as received
kill 9 signal, panic or other unkown reasons, the containerd-shim-runc-v2
server can not reap runc container and forward init process exit event.
This will lead the container leaked in dockerd. When shim dead, containerd
will clean dead shim, here read init process pid and forward exit event
with pid at the same time.

Signed-off-by: Jeff Zvier <zvier20@gmail.com>
2022-01-20 17:06:55 +08:00
ye.sijun
936faf9c98 fix empty scopes return
Signed-off-by: ye.sijun <junnplus@gmail.com>
2022-01-20 15:16:44 +08:00
Derek McGowan
ad771115b8
Merge pull request #6462 from dmcgowan/prepare-1.6.0-rc.1
Prepare release notes for v1.6.0-rc.1
2022-01-19 19:13:47 -08:00
Derek McGowan
62f6c8175a
Merge pull request #6424 from cpuguy83/nondist-blob-push
Add support for skipping non-dist blob push
2022-01-19 19:12:31 -08:00
Derek McGowan
c1e17d8ba0
Prepare release notes for v1.6.0-rc.1
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-01-19 13:56:07 -08:00
Derek McGowan
60703af9fd
Merge pull request #6461 from dmcgowan/ci-compile-go-1.16
Compile binaries for go1.16 and go1.17 in CI
2022-01-19 13:46:39 -08:00
Michael Crosby
a372097669
Merge pull request #6432 from dmcgowan/fix-introspection-service
services/introspection: fix plugin caching to show grpc plugins
2022-01-19 16:44:27 -05:00
Michael Crosby
f2c2d07683
Merge pull request #6458 from dcantah/change-to-constant-win2022
Integration: Change to Windows Server 2022 build number constant
2022-01-19 16:19:00 -05:00
Derek McGowan
4f552b077e
Compile binaries for go1.16 and go1.17 in CI
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-01-19 12:22:01 -08:00
Phil Estes
762b018969
Merge pull request #6460 from dmcgowan/update-kubernetes-vendors-0.22
Update kubernetes vendor to 0.22.5
2022-01-19 15:19:39 -05:00
Daniel Canter
7d7064e6b4 Integration: Change to Windows Server 2022 build number constant
The build number used to determine whether we need to pull the Windows
Server 2022 image for the integration tests was previously hardcoded as there
wasn't an hcsshim release with the build number. Now that there is and it's
vendored in, this change just swaps to it.

Signed-off-by: Daniel Canter <dcanter@microsoft.com>
2022-01-19 11:54:22 -08:00
Michael Crosby
13b804c10d
Merge pull request #6459 from dmcgowan/fix-rdt-build-tags
Fix rdt build tags for go 1.16
2022-01-19 14:51:14 -05:00
Derek McGowan
2898004a5b
Update kubernetes vendor to 0.22.5
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-01-19 11:24:12 -08:00
Derek McGowan
5089b12100
Merge pull request #6439 from dmcgowan/remove-submodule-go-mod
Remove submodule go mod
2022-01-19 11:20:04 -08:00
Derek McGowan
4e9e14c2b6
Fix rdt build tags for go 1.16
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-01-19 11:09:29 -08:00
Phil Estes
778da8bae9
Merge pull request #6453 from dcantah/bump-hcsshim-0.9.2
go.mod: Update hcsshim to v0.9.2
2022-01-19 08:34:46 -05:00
Akihiro Suda
f0afdea2ad
Merge pull request #6375 from AkihiroSuda/runc-1.1.0
update runc to v1.1.0
2022-01-19 15:19:59 +09:00
Tonis Tiigi
af83e9af10 platforms: add support for matching amd64 variants
Correctly matches optional variants for amd64
arch. These should be used for standardized values
v1-v4 from https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels.

V1 remains the default and is cleared by default.
Pulling a higher variant will match the highest
available platform lower or equal to the provided one
when platformVector is used.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
2022-01-18 20:13:37 -08:00
Daniel Canter
af39d2ad71 go.mod: Update hcsshim to v0.9.2
This tag brings in some bug fixes related to waiting for containers to terminate and
trying to kill an already terminated process, as well as tty support (exec -it) for
Windows Host Process Containers.

Signed-off-by: Daniel Canter <dcanter@microsoft.com>
2022-01-18 17:34:52 -08:00
Derek McGowan
fcb7bd6997
Remove api go submodule
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-01-18 14:48:33 -08:00