Commit Graph

1646 Commits

Author SHA1 Message Date
Kazuyoshi Kato
52a7480399 Remove github.com/gogo/protobuf again
While we need to support CRI v1alpha2, the implementation doesn't have
to be tied to gogo/protobuf.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-12-15 22:54:15 +00:00
Derek McGowan
a4bc380b91
Merge pull request #7814 from dcantah/hostnet-helper
CRI: Add host networking helper
2022-12-15 11:21:45 -08:00
Fu Wei
12f30e6524
Merge pull request #7792 from mxpv/sb-shutdown 2022-12-15 13:37:35 +08:00
Maksym Pavlenko
a4d5c3e5cb Support sandboxed shims shutdown
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-12-14 18:22:52 -08:00
Phil Estes
9b39b0bfd9
Merge pull request #7812 from mxpv/cri
Minor fix when querying pod sandbox status
2022-12-14 10:15:03 -05:00
Danny Canter
84529072d2 CRI: Add host networking helper
We do a ton of host networking checks around the CRI plugin, all mainly
doing the same thing of checking the different quirks on various platforms
(for windows are we a HostProcess pod, for linux is namespace mode the
right thing, darwin doesn't have CNI support etc.) which could all be
bundled up into a small helper that can be re-used.

Signed-off-by: Danny Canter <danny@dcantah.dev>
2022-12-14 01:47:22 -08:00
Maksym Pavlenko
371e27ffb2
Merge pull request #7809 from mikebrow/check-deep-copies-on-restart
nil check to avoid panic on upgrade
2022-12-13 22:22:20 -08:00
Maksym Pavlenko
0e33a8fa4f [sb] Fix status
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-12-13 16:35:15 -08:00
Derek McGowan
c666147592
Merge pull request #7805 from chaunceyjiang/painc
fatal error: concurrent map iteration and map write
2022-12-13 15:01:25 -08:00
Mike Brown
ce3a732709 nil check to avoid panic on upgrade
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2022-12-13 16:18:18 -06:00
Phil Estes
ecf00ffe84
Merge pull request #7783 from inspektor-gadget/qasim/cri-disable-swap
cri: make swapping disabled with memory limit
2022-12-13 15:21:51 -05:00
chaunceyjiang
5a3a9baec9 fatal error: concurrent map iteration and map write
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2022-12-13 20:08:23 +08:00
Fu Wei
d2f68bfb36
Merge pull request #7313 from pacoxu/image-pull-metrics
add metrics for image pulling: error; in progress count; thoughput
2022-12-13 19:49:22 +08:00
Akihiro Suda
dc48349248
epoch: propagate SOURCE_DATE_EPOCH via ctx
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2022-12-12 09:02:35 +09:00
Phil Estes
a7428f4473
Merge pull request #7732 from AkihiroSuda/sha256-simd
digest: use github.com/minio/sha256-simd
2022-12-09 09:37:37 -05:00
Fu Wei
f2cf411b79
Merge pull request #7073 from ruiwen-zhao/event
Add container event support to containerd
2022-12-09 15:24:23 +08:00
Maksym Pavlenko
e1abaeb386
Merge pull request #7764 from mxpv/config
Pass TOML configuration options for runtimes CRI is not aware of
2022-12-08 12:59:13 -08:00
ruiwen-zhao
a6929f9f6b Add Evented PLEG support to sandbox server
Signed-off-by: ruiwen-zhao <ruiwen@google.com>
2022-12-08 19:31:36 +00:00
ruiwen-zhao
a338abc902 Add container event support to containerd
Signed-off-by: ruiwen-zhao <ruiwen@google.com>
2022-12-08 19:30:39 +00:00
Maksym Pavlenko
3bc8fc4d30 Cleanup build constraints
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-12-08 09:36:20 -08:00
Qasim Sarfraz
69975b92bb cri: make swapping disabled with memory limit
OCI runtime spec defines memory.swap as 'limit of memory+Swap usage'
so setting them to equal should disable the swap. Also, this change
should make containerd behaviour same as other runtimes e.g
'cri-dockerd/dockershim' and won't be impacted when user turn on
'NodeSwap' (https://github.com/kubernetes/enhancements/issues/2400) feature.

Signed-off-by: Qasim Sarfraz <qasimsarfraz@microsoft.com>
2022-12-08 13:54:55 +01:00
Akihiro Suda
cde9490779
digest: use github.com/minio/sha256-simd
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2022-12-08 18:50:00 +09:00
Maksym Pavlenko
d10dbd2d2d
Merge pull request #7773 from mxpv/ctx
Fix context when waiting sandbox
2022-12-07 13:53:37 -08:00
Derek McGowan
241563be06
Merge pull request from GHSA-2qjp-425j-52j9
CRI stream server: Fix goroutine leak in Exec
2022-12-07 13:50:26 -08:00
Maksym Pavlenko
f9295aa49f Fix context when waiting sandbox
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-12-07 12:52:04 -08:00
Maksym Pavlenko
8ab1d44967 Pass runtime configuration as TOML blob
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-12-07 10:55:06 -08:00
Maksym Pavlenko
3e92dedc2e Update runtime options to include bytes blob
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-12-07 10:55:06 -08:00
Paco Xu
c59f1635f0 add metrics for image pulling: success/failure count; in progress count; thoughput
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2022-12-07 15:11:00 +08:00
Maksym Pavlenko
03a4dc0711
Merge pull request #7768 from mxpv/fixes
sbserver bug fixing
2022-12-06 17:07:54 -08:00
Maksym Pavlenko
a113737ccf sbserver bug fixing
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-12-06 15:32:42 -08:00
Phil Estes
5d4276cc34
Merge pull request #7525 from thaJeztah/remove_deprecated_stubs
remove some (aliases for) deprecated functions
2022-12-06 11:49:18 -05:00
Derek McGowan
8a25fa584f
Unwrap proto errors in streaming client
Allows clients to properly detect context cancellation

Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-12-02 22:30:54 -08:00
Derek McGowan
51195ad099
Merge pull request #7731 from mxpv/cri
[Sandbox API] CRI status cleanup
2022-12-01 13:43:13 -08:00
Derek McGowan
f88162587b
Rename transferer to transferrer
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-11-30 16:02:09 -08:00
Derek McGowan
fc2754204f
Cleanup code comments and lint fixes
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-11-30 16:02:09 -08:00
Derek McGowan
c387a52051
Add variables names to transfer interface
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-11-30 12:56:57 -08:00
Derek McGowan
8304a61b53
Combine stream fuzz tests
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-11-30 12:56:57 -08:00
Derek McGowan
0762a3a759
Add media type to export stream
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-11-30 12:56:57 -08:00
Derek McGowan
11c1c8e6f4
Update import logic
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-11-30 12:56:56 -08:00
Derek McGowan
40d3fa3afd
Add filter fields to image store types
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-11-30 12:56:56 -08:00
Derek McGowan
737257bb48
Add push progress
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-11-30 12:56:55 -08:00
Derek McGowan
e88baa0873
Fixup pull authorization and labeling
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-11-30 12:56:55 -08:00
Derek McGowan
478f1c934d
Lint fixes
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-11-30 12:56:55 -08:00
Derek McGowan
6b5df1ee16
Update transfer packages
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-11-30 12:56:54 -08:00
Derek McGowan
7318a2def6
Add transfer plugin registration
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-11-30 12:56:53 -08:00
Derek McGowan
d1627e3c71
Add basic import and export handlers
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-11-30 12:56:53 -08:00
Derek McGowan
adfaeeff0d
Add binary stream functionality and helpers
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-11-30 12:56:53 -08:00
Derek McGowan
81afd9c36e
Add progress
Signed-off-by: Derek McGowan <derek@mcg.dev>

Update progress to reference parents

Signed-off-by: Derek McGowan <derek@mcg.dev>

Update Progress logic

Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-11-30 12:56:52 -08:00
Derek McGowan
0e4e96544f
Add transfer proxy client
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-11-30 12:56:52 -08:00
Derek McGowan
6f64cb8598
Transfer interface and plugin work in progress
Signed-off-by: Derek McGowan <derek@mcg.dev>

Transfer service implementation

Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-11-30 12:56:50 -08:00
Derek McGowan
dcf5687cab
Add streaming service
Adds a service capable of streaming Any objects bi-directionally.
This can be used by services to send data, received data, or to
initiate requests from server to client.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-11-30 12:55:56 -08:00
Derek McGowan
c469f67a2b
Merge pull request #6019 from klihub/pr/proto/nri
NRI: add support for NRI with extended scope.
2022-11-30 10:42:17 -08:00
Kirtana Ashok
08d5879f32 Added nullptr checks to pkg/cri/server and sbserver
Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com>
2022-11-29 13:25:49 -08:00
Danny Canter
f012617edf CRI stream server: Fix goroutine leak in Exec
In the CRI streaming server, a goroutine (`handleResizeEvents`) is launched
to handle terminal resize events if a TTY is asked for with an exec; this
is the sender of terminal resize events. Another goroutine is launched
shortly after successful process startup to actually do something with
these events, however the issue arises if the exec process fails to start
for any reason that would have `process.Start` return non-nil. The receiver
goroutine never gets launched so the sender is stuck blocked on a channel send
infinitely.

This could be used in a malicious manner by repeatedly launching execs
with a command that doesn't exist in the image, as a single goroutine
will get leaked on every invocation which will slowly grow containerd's
memory usage.

Signed-off-by: Danny Canter <danny@dcantah.dev>
2022-11-29 06:31:38 -08:00
Maksym Pavlenko
9f4ba48839 [sandbox] Fix panic when waiting for sandbox controller
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-11-28 13:29:21 -08:00
Maksym Pavlenko
dbc6d33ac5 [sandbox] Specify sandbox ID when using sandboxed shims
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-11-28 13:27:54 -08:00
Maksym Pavlenko
9a53a6c34a [sandbox] Don't access pause container when creating pod container
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-11-28 12:48:57 -08:00
Maksym Pavlenko
cc111eef61 [sandbox] Move sandbox info to podsandbox controller
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-11-28 12:06:41 -08:00
Krisztian Litkey
02f0a8b50e pkg/cri/server: nuke old v0.1.0 NRI hooks.
Remove direct invocation of old v0.1.0 NRI plugins. They
can be enabled using the revised NRI API and the v0.1.0
adapter plugin.

Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
2022-11-28 21:51:42 +02:00
Krisztian Litkey
b27ef6f169 pkg/cri/server: experimental NRI integration for CRI.
Implement the adaptation interface required by the NRI
service plugin to handle CRI sandboxes and containers.
Hook the NRI service plugin into CRI request processing.

Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
2022-11-28 21:51:08 +02:00
Krisztian Litkey
43704ca888 nri: add experimental NRI plugin.
Add a common NRI 'service' plugin. It takes care of relaying
requests and respones to and from NRI (external NRI plugins)
and the high-level containerd namespace-independent logic of
applying NRI container adjustments and updates to actual CRI
and other containers.

The namespace-dependent details of the necessary container
manipulation operations are to be implemented by namespace-
specific adaptations. This NRI plugin defines the API which
such adaptations need to implement.

Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
2022-11-28 21:51:06 +02:00
Maksym Pavlenko
a6d1d53cc2 [sandbox] Update Controller.Status protos
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-11-28 10:13:37 -08:00
Fu Wei
6bfe6e38b2
Merge pull request #7684 from mxpv/sb-runtime-fix
Fix sandbox API when calling sandboxed shims
2022-11-28 22:32:08 +08:00
Maksym Pavlenko
6d830d30ad
Merge pull request #7470 from lengrongfu/feat/sandbox_api_status
Sandbox API: implement Controller.Status for SandboxAPI
2022-11-22 18:11:57 -08:00
Maksym Pavlenko
ae0da7dc58 Use sandbox store to retrieve runtime info for sandboxed containers
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-11-22 17:43:40 -08:00
Maksym Pavlenko
3ddaa34445 Retrieve sandbox creation time from store.
All pause container object references must be removed
from sbserver. This is an implementation detail of
podsandbox package.

Added TODOs for remaining work.

Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-11-22 17:43:36 -08:00
Maksym Pavlenko
4b32819823 Remove duplicated helpers
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-11-22 17:40:36 -08:00
Maksym Pavlenko
3f331e7d13 Specify runtime configuration for sandbox shims
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-11-22 17:39:27 -08:00
Samuel Karp
a74f7e902b
sbserver: save netns in sandbox metadata on create
Port of b41d6f40bb to sbserver

Signed-off-by: Samuel Karp <samuelkarp@google.com>
2022-11-21 16:45:37 -08:00
Samuel Karp
1deaedd38a
sbserver: persist sandbox during partial teardown
Port of 4f4aad057d to sbserver

Signed-off-by: Samuel Karp <samuelkarp@google.com>
2022-11-21 16:45:36 -08:00
Phil Estes
99acefaad9
Merge pull request #7697 from inspektor-gadget/qasim/add-sandbox-uid-annotation
cri: add pod uid annotation
2022-11-21 10:54:20 -05:00
Sebastiaan van Stijn
3e5b444ac4
pkg/cri/util/: remove deprecated NormalizeImageRef alias
Has been deprecated in containerd v1.3.0, so we can remove this.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-11-20 22:27:20 +01:00
yanggang
579c7f43de
Change fsnotify event status condition.
Signed-off-by: yanggang <gang.yang@daocloud.io>
2022-11-20 09:43:54 +08:00
Fu Wei
8e787543de
Merge pull request #7685 from sofat1989/mainrunserially
can set up the network serially by CNI plugins
2022-11-19 12:33:40 +08:00
Qasim Sarfraz
0c4d32c131 cri: add pod uid annotation
Signed-off-by: Qasim Sarfraz <qasimsarfraz@microsoft.com>
2022-11-19 01:12:02 +01:00
ruiwen-zhao
792294ce06 Update to cri-api v0.26.0-beta.0
Signed-off-by: ruiwen-zhao <ruiwen@google.com>
2022-11-18 21:13:34 +00:00
ruiwen-zhao
234bf990dc Copy cri-api v1alpha2 from v0.25.4 to containerd internal directory
Signed-off-by: ruiwen-zhao <ruiwen@google.com>
2022-11-18 21:09:43 +00:00
Fei Su
f6232793b4 can set up the network serially by CNI plugins
Signed-off-by: Fei Su <sofat1989@126.com>
2022-11-18 15:19:00 +08:00
Derek McGowan
223f67ccdb
Merge pull request #7601 from kzys/cgroups-upgrade
Upgrade github.com/containerd/cgroups from v1 to v3
2022-11-17 21:55:03 -08:00
Maksym Pavlenko
aaf59efd20 Expose Done and Err in Shutdown service
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-11-16 22:03:44 -08:00
Kazuyoshi Kato
6596a70861 Use github.com/containerd/cgroups/v3 to remove gogo
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-11-14 21:07:48 +00:00
bin liu
0c63c42f81 Fix slice append error
In golang when copy a slice, if the slice is initialized with a
desired length, then appending to it will cause the size double.

Signed-off-by: bin liu <liubin0329@gmail.com>
2022-11-12 00:22:04 +08:00
Fu Wei
669230cbd6
Merge pull request #7655 from swagatbora90/tracing-refactor
Add a thin wrapper around otel Span object
2022-11-11 11:45:28 +08:00
Swagat Bora
7def13dde3 Add a thin wrapper around otel Span object
Signed-off-by: Swagat Bora <sbora@amazon.com>
2022-11-11 01:28:27 +00:00
Kazuyoshi Kato
02484f5e05
Merge pull request #7631 from thaJeztah/strings_cut
replace strings.Split(N) for strings.Cut() or alternatives
2022-11-10 15:28:22 -08:00
rongfu.leng
0f54c47401 feat add sandbox api status func
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2022-11-09 14:36:46 +08:00
Zhang Tianyang
c953eecb79 Sandbox API: Add a new mode config for sandbox controller impls
Add a new config as sandbox controller mod, which can be either
"podsandbox" or "shim". If empty, set it to default "podsandbox"
when CRI plugin inits.

Signed-off-by: Zhang Tianyang <burning9699@gmail.com>
2022-11-09 12:12:39 +08:00
Sebastiaan van Stijn
eaedadbed0
replace strings.Split(N) for strings.Cut() or alternatives
Go 1.18 and up now provides a strings.Cut() which is better suited for
splitting key/value pairs (and similar constructs), and performs better:

```go
func BenchmarkSplit(b *testing.B) {
        b.ReportAllocs()
        data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
        for i := 0; i < b.N; i++ {
                for _, s := range data {
                        _ = strings.SplitN(s, "=", 2)[0]
                }
        }
}

func BenchmarkCut(b *testing.B) {
        b.ReportAllocs()
        data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
        for i := 0; i < b.N; i++ {
                for _, s := range data {
                        _, _, _ = strings.Cut(s, "=")
                }
        }
}
```

    BenchmarkSplit
    BenchmarkSplit-10            8244206               128.0 ns/op           128 B/op          4 allocs/op
    BenchmarkCut
    BenchmarkCut-10             54411998                21.80 ns/op            0 B/op          0 allocs/op

While looking at occurrences of `strings.Split()`, I also updated some for alternatives,
or added some constraints; for cases where an specific number of items is expected, I used `strings.SplitN()`
with a suitable limit. This prevents (theoretical) unlimited splits.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-11-07 10:02:25 +01:00
Swagat Bora
ee64926a72 add SpanAttribute
Signed-off-by: Swagat Bora <sbora@amazon.com>
2022-11-03 18:34:06 +00:00
Swagat Bora
3b87d46ce2 Add tracing spans in CRI image service and pull.go
Signed-off-by: Swagat Bora <sbora@amazon.com>

Add spans around image unpack operations
Use image.ref to denote image name and image.id for the image config digest
Add top-level spand and record errors in the CRI instrumentation service
2022-11-03 17:03:43 +00:00
Phil Estes
fc89d49531
Merge pull request #7576 from containerd/sb
Cleanup sandbox interfaces
2022-10-25 14:57:23 -04:00
Maksym Pavlenko
b7d0d12715 Cleanup sandbox interfaces
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-10-25 12:31:32 -04:00
Maksym Pavlenko
908be16858
Merge pull request #7577 from dcantah/maintenance-cri-winns 2022-10-23 14:32:02 -07:00
Danny Canter
9a0331c477 maintenance: Remove WithWindowsNetworkNamespace from pkg/cri
Old TODO stating that pkg/cri/opts's `WithWindowsNetworkNamespace`
should be moved to the main containerd pkg was out of date as thats
already been done (well, to the /oci package). This just removes it
and swaps all uses of `WithWindowsNetworkNamespace` to the oci
packages impl.

Signed-off-by: Danny Canter <danny@dcantah.dev>
2022-10-23 06:45:32 -07:00
Fu Wei
52025b5d67
Merge pull request #7457 from wllenyj/sandbox_delete
CRI: implement Controller.Delete for SandboxAPI
2022-10-23 12:24:48 +08:00
Fu Wei
9b54eee718
Merge pull request #7419 from bart0sh/PR005-configure-CDI-registry-on-start 2022-10-22 08:17:33 +08:00
Sophie Liu
3e4449862b Add logging volume metrics to Containerd CRI plugin
Signed-off-by: Sophie Liu <sophieliu@google.com>
2022-10-19 10:47:49 -04:00
Mike Brown
3ce301ddee
Merge pull request #7349 from thaJeztah/gofmt_119
clean-up "nolint" comments, remove unused ones, update golangci-lint
2022-10-17 10:50:24 -05:00
Samuel Karp
890398677e
cri: PodSandboxStatus should tolerate missing task
Signed-off-by: Samuel Karp <samuelkarp@google.com>
2022-10-14 14:40:13 -07:00
Sebastiaan van Stijn
8b5df7d347
update golangci-lint to v1.49.0
Also remove "nolint" comments for deadcode, which is deprecated, and removed
from the defaults.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-12 14:41:01 +02:00
Sebastiaan van Stijn
f9c80be1bb
remove unneeded nolint-comments (nolintlint), disable deprecated linters
Remove nolint-comments that weren't hit by linters, and remove the "structcheck"
and "varcheck" linters, as they have been deprecated:

    WARN [runner] The linter 'structcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter.  Replaced by unused.
    WARN [runner] The linter 'varcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter.  Replaced by unused.
    WARN [linters context] structcheck is disabled because of generics. You can track the evolution of the generics support by following the https://github.com/golangci/golangci-lint/issues/2649.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-12 14:41:01 +02:00
Sebastiaan van Stijn
29c7fc9520
clean-up "nolint" comments, remove unused ones
- fix "nolint" comments to be in the correct format (`//nolint:<linters>[,<linter>`
  no leading space, required colon (`:`) and linters.
- remove "nolint" comments for errcheck, which is disabled in our config.
- remove "nolint" comments that were no longer needed (nolintlint).
- where known, add a comment describing why a "nolint" was applied.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-12 14:40:59 +02:00
Sebastiaan van Stijn
d215725136
pkg/cri/(server|sbserver): criService.getTLSConfig() add TODO to verify nolint
This `//nolint`  was added in f5c7ac9272
to suppress warnings about the `NameToCertificate` function being deprecated:

    // Deprecated: NameToCertificate only allows associating a single certificate
    // with a given name. Leave that field nil to let the library select the first
    // compatible chain from Certificates.

Looking at that, it was deprecated in Go 1.14 through
eb93c684d4
(https://go-review.googlesource.com/c/go/+/205059), which describes:

    crypto/tls: select only compatible chains from Certificates

    Now that we have a full implementation of the logic to check certificate
    compatibility, we can let applications just list multiple chains in
    Certificates (for example, an RSA and an ECDSA one) and choose the most
    appropriate automatically.

    NameToCertificate only maps each name to one chain, so simply deprecate
    it, and while at it simplify its implementation by not stripping
    trailing dots from the SNI (which is specified not to have any, see RFC
    6066, Section 3) and by not supporting multi-level wildcards, which are
    not a thing in the WebPKI (and in crypto/x509).

We should at least have a comment describing why we are ignoring this, but preferably
review whether we should still use it.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-12 14:40:11 +02:00
Ed Bartosh
643dc16565 improve CDI logging
Added logging of found CDI devices.
Fixed test failures caused by the change.

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-10-12 13:45:20 +03:00
Ed Bartosh
8ed910c46a CDI: configure registry on start
Currently CDI registry is reconfigured on every
WithCDI call, which is a relatively heavy operation.

This happens because cdi.GetRegistry(cdi.WithSpecDirs(cdiSpecDirs...))
unconditionally reconfigures the registry (clears fs notify watch,
sets up new watch, rescans directories).

Moving configuration to the criService.initPlatform should result
in performing registry configuration only once on the service start.

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-10-12 13:45:20 +03:00
Ed Bartosh
eec7a76ecd move WithCDI to pkg/cri/opts
As WithCDI is CRI-only API it makes sense to move it
out of oci module.

This move can also fix possible issues with this API when
CRI plugin is disabled.

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-10-12 13:45:20 +03:00
Phil Estes
9b0ab84449
Merge pull request #7481 from qiutongs/netns-fix
Update container with sandbox metadata after NetNS is created
2022-10-10 15:48:54 -07:00
Qiutong Song
b41d6f40bb Update container with sandbox metadata after NetNS is created
Signed-off-by: Qiutong Song <songqt01@gmail.com>
2022-10-09 01:14:08 +00:00
Akihiro Suda
70fbedc217
archive: add WithSourceDateEpoch() for whiteouts
This makes diff archives to be reproducible.

The value is expected to be passed from CLI applications via the $SOUCE_DATE_EPOCH env var.

See https://reproducible-builds.org/docs/source-date-epoch/
for the $SOURCE_DATE_EPOCH specification.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2022-10-08 08:45:03 +09:00
wanglei01
a59ecc50e3 CRI: implement Controller.Delete for SandboxAPI
Signed-off-by: WangLei <wllenyj@linux.alibaba.com>
2022-09-30 16:55:54 +08:00
Derek McGowan
1cc38f8df7
Merge pull request #5904 from qiutongs/ip-leakage-fix 2022-09-29 18:14:35 -07:00
Qiutong Song
4f4aad057d Persist container and sandbox if resource cleanup fails, like teardownPodNetwork
Signed-off-by: Qiutong Song <songqt01@gmail.com>
2022-09-27 14:38:41 +00:00
Maksym Pavlenko
39f7cd73e7
Merge pull request #7405 from kzys/cri-fuzz
Refactor CRI fuzzers
2022-09-22 16:55:27 -07:00
Maksym Pavlenko
23b545232c
Merge pull request #7417 from ruiwen-zhao/grpc_code
Set grpc code for unimplemented cri-api methods
2022-09-22 12:12:34 -07:00
Phil Estes
8f95bac049
Merge pull request #7401 from wllenyj/sandbox_stop
Sandbox API: implement Controller.Wait and Controller.Stop
2022-09-22 14:33:52 -04:00
ruiwen-zhao
c6f571fc7d Set grpc code for unimplemented cri-api methods
Signed-off-by: ruiwen-zhao <ruiwen@google.com>
2022-09-22 07:24:48 +00:00
wanglei01
82890dd290 CRI: implement Controller.Stop for SandboxAPI
Signed-off-by: WangLei <wllenyj@linux.alibaba.com>
2022-09-22 14:38:52 +08:00
wanglei01
927906992f CRI: implement Controller.Wait for SandboxAPI
Rework sandbox monitoring, we should rely on Controller.Wait instead of
CRIService.StartSandboxExitMonitor

Signed-off-by: WangLei <wllenyj@linux.alibaba.com>
2022-09-22 14:38:45 +08:00
Ed Bartosh
e22a7a3833 reference CDI configuration details
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-09-21 11:25:28 +03:00
Samuel Karp
c8010b9cbe
sbserver: return resources in ContainerStatus
Port of b7b1200dd3 to sbserver

Signed-off-by: Samuel Karp <samuelkarp@google.com>
2022-09-20 18:38:09 -07:00
Kazuyoshi Kato
a37c64b20c Refactor CRI fuzzers
pkg/cri/sbserver/cri_fuzzer.go and pkg/cri/server/cri_fuzzer.go were
mostly the same.

This commit merges them together and move the unified fuzzer to
contrib/fuzz again to sort out dependencies. pkg/cri/ shouldn't consume
cmd/.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-09-19 22:14:11 +00:00
Phil Estes
a1e4a94694
Merge pull request #7393 from Iceber/skip_verify
remotes/docker/config: Skipping TLS verification for localhost
2022-09-19 10:53:56 -04:00
Iceber Gu
3cfde732e1 remotes/docker/config: Skipping TLS verification for localhost
Signed-off-by: Iceber Gu <wei.cai-nat@daocloud.io>
2022-09-13 17:40:23 +08:00
Kevin Parsons
de509c0682
Merge pull request #6901 from dcantah/add-wcowhyp-runtime
windows: Add runhcs-wcow-hypervisor runtimeclass to the default config
2022-09-08 10:53:12 -07:00
lengrongfu
3c0e6c40ad feat: upgrade registry.k8s.io/pause version
Signed-off-by: rongfu.leng <1275177125@qq.com>
2022-09-06 15:59:20 +08:00
Abirdcfly
dcfaa30ba2 chore: remove duplicate word in comments
Signed-off-by: Abirdcfly Fu <fp544037857@gmail.com>
2022-08-29 13:05:32 +08:00
Samuel Karp
36d0cfd0fd
Merge pull request #6517 from ruiwen-zhao/return-resource 2022-08-24 14:01:30 -07:00
ruiwen-zhao
b7b1200dd3 ContainerStatus to return container resources
Signed-off-by: ruiwen-zhao <ruiwen@google.com>
2022-08-24 19:08:06 +00:00
Paco Xu
9525b3148a migrate from k8s.gcr.io to registry.k8s.io
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2022-08-24 13:46:46 +08:00
Daniel Canter
f0036cb9dc windows: Add runhcs-wcow-hypervisor runtimeclass to the default config
As part of the effort of getting hypervisor isolated windows container
support working for the CRI entrypoint here, add the runhcs-wcow-hypervisor
handler for the default config. This sets the correct SandboxIsolation
value that the Windows shim uses to differentiate process vs. hypervisor
isolation. This change additionally sets the wcow-process runtime to
passthrough io.microsoft.container* annotations and the hypervisor runtime
to accept io.microsoft.virtualmachine* annotations.

Note that for K8s users this runtime handler will need to be configured by
creating the corresponding RuntimeClass resources on the cluster as it's
not the default runtime.

Signed-off-by: Daniel Canter <dcanter@microsoft.com>
2022-08-19 07:56:43 -07:00
Wei Fu
460b0533b2 pkg/cri/streaming: increase ReadHeaderTimeout
It is follow-up of #7254. This commit will increase ReadHeaderTimeout
from 3s to 30m, which prevent from unexpected timeout when the node is
running with high-load. 30 Minutes is longer enough to get close to
before what #7254 changes.

And ideally, we should allow user to configure the streaming server if
the users want this feature.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2022-08-18 07:42:12 +08:00
ruiwen-zhao
6e4b6830f1 Update CRI-API
Signed-off-by: ruiwen-zhao <ruiwen@google.com>
2022-08-10 03:55:51 +00:00
Maksym Pavlenko
ca3b9b50fe Run gofmt 1.19
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-08-04 18:18:33 -07:00
Maksym Pavlenko
5cf77fc43d Add TODOs for the remaining work
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-08-04 10:29:15 -07:00
Maksym Pavlenko
aa3303b697 Update sandbox protobuf to match CRI
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-07-29 16:08:07 -07:00
Maksym Pavlenko
8823224174 Update controller's start response to incldue pid and labels
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-07-29 16:08:07 -07:00
Maksym Pavlenko
3d028308ef Cleanup CRI files
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-07-29 16:08:07 -07:00
Maksym Pavlenko
c085fac1e5 Move sandbox start behind controller
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-07-29 16:08:07 -07:00
Brian Goff
f5fb2c32d2 Regenerate protos with updated protoc-gen-go
This fixes CI issues

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2022-07-28 16:59:30 +00:00
Fu Wei
116af9d1cc
Merge pull request #7207 from zouyee/relabel
replace with selinux label
2022-07-28 00:05:31 +08:00
Derek McGowan
6acde90772
Merge pull request #7069 from fuweid/failpoint-in-runc-shimv2
test: introduce failpoint control to runc-shimv2 and cni
2022-07-26 23:12:20 -07:00
zounengren
d121efc6d8 replace with selinux label
Signed-off-by: zounengren <zouyee1989@gmail.com>
2022-07-24 20:11:16 +08:00
zounengren
20e7b399f9 prevent Server reuse after a Shutdown
Signed-off-by: zounengren <zouyee1989@gmail.com>
2022-07-24 15:55:16 +08:00
Jeff Widman
050cd58ce6 Drop deprecated ioutil
`ioutil` has been deprecated by golang. All the code in `ioutil` just
forwards functionality to code in either the `io` or `os` packages.

See https://github.com/golang/go/pull/51961 for more info.

Signed-off-by: Jeff Widman <jeff@jeffwidman.com>
2022-07-23 08:36:20 -07:00
Maksym Pavlenko
500ff95f02 Make getServicesOpts a helper
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-07-22 19:38:45 -07:00
Wei Fu
cbebeb9440 pkg/failpoint: add FreeBSD link and update pkg doc
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2022-07-22 23:25:40 +08:00
Wei Fu
1ae6e8b076 pkg/failpoint: add DelegatedEval API
Signed-off-by: Wei Fu <fuweid89@gmail.com>
2022-07-22 23:25:40 +08:00
Wei Fu
ffd59ba600 pkg/failpoint: init failpoint package
Failpoint is used to control the fail during API call when testing, especially
the API is complicated like CRI-RunPodSandbox. It can help us to test
the unexpected behavior without mock. The control design is based on freebsd
fail(9), but simpler.

REF: https://www.freebsd.org/cgi/man.cgi?query=fail&sektion=9&apropos=0&manpath=FreeBSD%2B10.0-RELEASE

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2022-07-22 23:25:40 +08:00
Danielle Lancashire
3125f7e1a0 cri_stats: handle missing cpu stats
Signed-off-by: Danielle Lancashire <dani@builds.terrible.systems>
2022-07-22 12:10:24 +00:00
Fu Wei
badb66113c
Merge pull request #7189 from zouyee/ctx 2022-07-22 11:09:02 +08:00
Derek McGowan
24aad6dd46
Merge pull request #7182 from HeavenTonight/main
code cleanup
2022-07-20 13:09:10 -07:00
zounengren
7dc66eee64 using ContextDialer instead
Signed-off-by: zounengren <zouyee1989@gmail.com>
2022-07-20 22:53:42 +08:00
James Sturtevant
0d6881898e Refactor usageNanoCores be to used for all OSes
Signed-off-by: James Sturtevant <jstur@microsoft.com>
2022-07-19 16:49:08 -07:00
guiyong.ou
628f6ac681 code cleanup
Signed-off-by: guiyong.ou <guiyong.ou@daocloud.io>
2022-07-19 22:46:32 +08:00
Maksym Pavlenko
e69a83f356
Merge pull request #7168 from mxpv/linter
Update and align golangci-lint version
2022-07-18 12:23:06 -07:00
Mike Brown
88bcbb0361 adds a comment explaining how to disable experimental sbserver
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2022-07-15 17:00:56 -05:00
Maksym Pavlenko
3a3f43f72f Fix linter warnings
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-07-15 13:29:04 -07:00
Maksym Pavlenko
98a1b7ff1b Add log messages when choosing CRI server
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-07-14 09:12:35 -07:00
Maksym Pavlenko
2ba6353316 Change metrics namespace for sandboxed CRI to prevent panic
panic: duplicate metrics collector registration attempted

Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-07-13 12:47:13 -07:00
Maksym Pavlenko
b8e93774c1 Enable integration tests against sandboxed CRI
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-07-13 12:02:06 -07:00
Maksym Pavlenko
cf5df7e4ac Fork CRI server package
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-07-13 10:54:59 -07:00
Daniel Canter
bcdc8468f8 Fix out of date comments for CRI store packages
All of the CRI store related packages all use the standard errdefs
errors now for if a key doesn't or already exists (ErrAlreadyExists,
ErrNotFound), but the comments for the methods still referenced
some unused package specific error definitions. This change just
updates the comments to reflect what errors are actually returned
and adds comments for some previously undocumented exported functions.

Signed-off-by: Daniel Canter <dcanter@microsoft.com>
2022-07-11 13:57:39 -07:00
Derek McGowan
aee50aeac2
Merge pull request #7108 from fuweid/refactor-cri-api
pkg/cri: use marshal wrapper for version convertor
2022-06-29 13:58:15 -07:00
Wei Fu
c2703c08c9 pkg/cri: use marshal wrapper for version convertor
Use wrapper for ReopenContainerLog v1alpha proto.

Ref: #5619

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2022-06-29 22:20:47 +08:00
Kazuyoshi Kato
66cc0fc879 Copy FuzzCRI from cncf/cncf-fuzzing
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-06-27 22:54:25 +00:00
Kazuyoshi Kato
57200edf25 Use testing.F on FuzzParseProcPIDStatus
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-06-15 14:56:20 +00:00
wllenyj
42a386c816 CRI: change the /dev/shm mount options in Sandbox.
All containers except the pause container, mount `/dev/shm" with flags
`nosuid,nodev,noexec`. So change mount options for pause container to
keep consistence.
This also helps to solve issues of failing to mount `/dev/shm` when
pod/container level user namespace is enabled.

Fixes: #6911

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
Signed-off-by: Lei Wang <wllenyj@linux.alibaba.com>
2022-06-14 10:45:06 +08:00
wllenyj
a62a95789c CRI: remove default /dev/shm mount in Sandbox.
This's an optimization to get rid of redundant `/dev/shm" mounts for pause container.
In `oci.defaultMounts`, there is a default `/dev/shm` mount which is redundant for
pause container.

Fixes: #6911

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
Signed-off-by: Lei Wang  <wllenyj@linux.alibaba.com>
2022-06-14 10:45:06 +08:00
Maksym Pavlenko
e71ffddb6b
Merge pull request #7042 from samuelkarp/freebsd-unit-tests
Port (some) unit tests to FreeBSD
2022-06-10 15:05:52 -07:00
Kazuyoshi Kato
4ec6a379c0
Merge pull request #6918 from dcantah/windows-snapshotter-cleanup
Windows snapshotter touch ups and new functionality
2022-06-10 11:08:18 -07:00
Samuel Karp
42e019e634
cri/server: Disable tests on FreeBSD
The TestPodAnnotationPassthroughContainerSpec test and the
TestContainerAnnotationPassthroughContainerSpec test both depend on a
platform-specific implementation of criService.containerSpec, which is
unimplemented on FreeBSD.

The TestSandboxContainerSpec depends on a platform-specific
implementation oc criService.sandboxContainerSpec, which is
unimplemented on FreeBSD.

Signed-off-by: Samuel Karp <me@samuelkarp.com>
2022-06-09 18:54:10 -07:00
Shane Jennings
6190b0f04b
Correct spelling mistake ("sanbdox" to "sandbox")
Signed-off-by: Shane Jennings <superzinbo@gmail.com>
2022-06-07 10:55:15 +01:00
Daniel Canter
44e12dc5d8 Windows snapshotter touch ups and new functionality
This change does a couple things to remove some cruft/unused functionality
in the Windows snapshotter, as well as add a way to specify the rootfs
size in bytes for a Windows container via a new field added in the CRI api in
k8s 1.24. Setting the rootfs/scratch volume size was assumed to be working
prior to this but turns out not to be the case.

Previously I'd added a change to pass any annotations in the containerd
snapshot form (containerd.io/snapshot/*) as labels for the containers
rootfs snapshot. This was added as a means for a client to be able to provide
containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb as an
annotation and have that be translated to a label and ultimately set the
size for the scratch volume in Windows. However, this actually only worked if
interfacing with the CRI api directly (crictl) as Kubernetes itself will
fail to validate annotations that if split by "/" end up with > 2 parts,
which the snapshot labels will (containerd.io / snapshot / foobarbaz).

With this in mind, passing the annotations and filtering to
containerd.io/snapshot/* is moot, so I've removed this code in favor of
a new `snapshotterOpts()` function that will return platform specific
snapshotter options if ones exist. Now on Windows we can just check if
RootfsSizeInBytes is set on the WindowsContainerResources struct and
then return a snapshotter option that sets the right label.

So all in all this change:
- Gets rid of code to pass CRI annotations as labels down to snapshotters.

- Gets rid of the functionality to create a 1GB sized scratch disk if
the client provided a size < 20GB. This code is not used currently and
has a few logical shortcomings as it won't be able to create the disk
if a container is already running and using the same base layer. WCIFS
(driver that handles the unioning of windows container layers together)
holds open handles to some files that we need to delete to create the
1GB scratch disk is the underlying problem.

- Deprecates the containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb
label in favor of a new containerd.io/snapshot/windows/rootfs.sizebytes label.
The previous label/annotation wasn't being used by us, and from a cursory
github search wasn't being used by anyone else either. Now that there is a CRI
field to specify the size, this should just be a field that users can set
on their pod specs and don't need to concern themselves with what it eventually
gets translated to, but non-CRI clients can still use the new label/deprecated
label as usual.

- Add test to cri integration suite to validate expanding the rootfs size.

Signed-off-by: Daniel Canter <dcanter@microsoft.com>
2022-06-06 14:57:07 -07:00
Derek McGowan
c1bcabb454
Merge pull request from GHSA-5ffw-gxpp-mxpf
Limit the response size of ExecSync
2022-06-06 10:19:23 -07:00
Kazuyoshi Kato
40aa4f3f1b
Implicitly discard the input to drain the reader
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-06-06 09:57:13 -07:00
Phil Estes
2b661b890f
Merge pull request #6899 from shuaichang/ISSUE6657-support-runtime-snapshotter
Support runtime level snapshotter for issue 6657
2022-06-03 10:04:53 +02:00
shuaichang
7b9f1d4058 Added support for runtime level snapshotter, issue 6657
Signed-off-by: shuaichang <shuai.chang@databricks.com>

Updated annotation name
2022-06-02 16:29:59 -07:00
Fu Wei
aa0aaa4947
Merge pull request #7009 from mikebrow/update-gocni 2022-06-02 11:09:46 +08:00
Mike Brown
e3b4d750db update go-cni/for cni update fixing plugins that don't respond with version
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2022-06-01 17:20:18 -05:00
Kazuyoshi Kato
c149e6c2ea
Merge pull request #6996 from dcantah/hpc-validations
Add validations for Windows HostProcess CRI configs
2022-06-01 11:37:12 -07:00
Kazuyoshi Kato
fcd0c86c70
Merge pull request #7007 from dmcgowan/move-docker-sort
Move docker reference logic to reference/docker package
2022-06-01 11:33:52 -07:00
Phil Estes
5bc2d2e429
Merge pull request #7003 from pacoxu/pause-3.7
promote pause image to 3.7 (sync with kube v1.24)
2022-06-01 05:59:14 -04:00
Derek McGowan
8ed54849a6
Move docker reference logic to reference/docker package
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-05-31 22:40:49 -07:00
Mike Brown
8c27ce4193
Merge pull request #6993 from mxpv/images
CRI: cleanup cri/store package
2022-05-31 20:38:43 -05:00
Kazuyoshi Kato
49ca87d727 Limit the response size of ExecSync
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-05-31 22:21:35 +00:00
Paco Xu
1cf6f20320 promote pause image to 3.7
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2022-05-30 15:08:28 +08:00
Daniel Canter
b5e1b8f619 Use t.Run for /pkg/cri tests
A majority of the tests in /pkg/cri are testing/validating multiple
things per test (generally spec or options validations). This flow
lends itself well to using *testing.T's Run method to run each thing
as a subtest so `go test` output can actually display which subtest
failed/passed.

Some of the tests in the packages in pkg/cri already did this, but
a bunch simply logged what sub-testcase was currently running without
invoking t.Run.

Signed-off-by: Daniel Canter <dcanter@microsoft.com>
2022-05-29 18:32:09 -07:00
Maksym Pavlenko
b572a82ad8 CRI: Remove deprecated error types and update error msg
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-05-28 13:53:28 -07:00
Daniel Canter
978ff393d2 Add validations for Windows HostProcess CRI configs
HostProcess containers require every container in the pod to be a
host process container and have the corresponding field set. The Kubelet
usually enforces this so we'd error before even getting here but we recently
found a bug in this logic so better to be safe than sorry.

Signed-off-by: Daniel Canter <dcanter@microsoft.com>
2022-05-27 21:17:07 -07:00
Maksym Pavlenko
688b30cf52 CRI: Move truncindex to pkg
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-05-26 13:02:45 -07:00
Maksym Pavlenko
e44335800e CRI: Move reference sorting to reference package
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-05-26 12:52:36 -07:00
Maksym Pavlenko
b5366f8d7e CRI: Retrieve image spec on client
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-05-26 12:38:55 -07:00
AllenZMC
eaec6530d7 fix some confusing typos
Signed-off-by: AllenZMC <zhongming.chang@daocloud.io>
2022-05-17 23:53:36 +08:00
Akihiro Suda
42584167b7
Officially deprecate Schema 1
Schema 1 has been substantially deprecated since circa. 2017 in favor of Schema 2 introduced in Docker 1.10 (Feb 2016)
and its successor OCI Image Spec v1, but we have not officially deprecated Schema 1.

One of the reasons was that Quay did not support Schema 2 so far, but it is reported that Quay has been
supporting Schema 2 since Feb 2020 (moby/buildkit issue 409).

This PR deprecates pulling Schema 1 images but the feature will not be removed before containerd 2.0.
Pushing Schema 1 images was never implemented in containerd (and its consumers such as BuildKit).

Docker/Moby already disabled pushing Schema 1 images in Docker 20.10 (moby/moby PR 41295),
but Docker/Moby has not yet disabled pulling Schema 1 as containerd has not yet deprecated Schema 1.
(See the comments in moby/moby PR 42300.)
Docker/Moby is expected to disable pulling Schema 1 images in future after this deprecation.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2022-05-02 19:08:38 +09:00
Mike Brown
6b35307594
Merge pull request #5490 from askervin/5Bu_blockio
Support for cgroups blockio
2022-04-29 10:07:56 -05:00
Antti Kervinen
10576c298e cri: support blockio class in pod and container annotations
This patch adds support for a container annotation and two separate
pod annotations for controlling the blockio class of containers.

The container annotation can be used by a CRI client:
  "io.kubernetes.cri.blockio-class"

Pod annotations specify the blockio class in the K8s pod spec level:
  "blockio.resources.beta.kubernetes.io/pod"
  (pod-wide default for all containers within)

  "blockio.resources.beta.kubernetes.io/container.<container_name>"
  (container-specific overrides)

Correspondingly, this patch adds support for --blockio-class and
--blockio-config-file to ctr, too.

This implementation follows the resource class annotation pattern
introduced in RDT and merged in commit 893701220.

Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
2022-04-29 11:44:09 +03:00
Kazuyoshi Kato
29b9379560 make protos
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-04-27 21:31:16 +00:00
Kazuyoshi Kato
fcba486366 Remove gogo from .proto files
While gogo isn't actually used, it is still referenced from .proto files
and its corresponding Go package is imported from the auto-generated
files.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-04-27 20:27:55 +00:00
Kazuyoshi Kato
7bd42d226a
Merge pull request #6856 from kangclzjc/container-remove-dup-20220426
remove duplicate
2022-04-27 09:32:08 -07:00
Derek McGowan
6e0231f992
Merge pull request #6150 from fuweid/support-4984
feature: support image pull progress timeout
2022-04-26 12:15:09 -07:00
Wei Fu
00d102da9f feature: support image pull progress timeout
Kubelet sends the PullImage request without timeout, because the image size
is unknown and timeout is hard to defined. The pulling request might run
into 0B/s speed, if containerd can't receive any packet in that connection.
For this case, the containerd should cancel the PullImage request.

Although containerd provides ingester manager to track the progress of pulling
request, for example `ctr image pull` shows the console progress bar, it needs
more CPU resources to open/read the ingested files to get status.

In order to support progress timeout feature with lower overhead, this
patch uses http.RoundTripper wrapper to track active progress. That
wrapper will increase active-request number and return the
countingReadCloser wrapper for http.Response.Body. Each bytes-read
can be count and the active-request number will be descreased when the
countingReadCloser wrapper has been closed. For the progress tracker,
it can check the active-request number and bytes-read at intervals. If
there is no any progress, the progress tracker should cancel the
request.

NOTE: For each blob data, the containerd will make sure that the content
writer is opened before sending http request to the registry. Therefore, the
progress reporter can rely on the active-request number.

fixed: #4984

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2022-04-27 00:02:27 +08:00
Fu Wei
0d696d2f4b
Merge pull request #6749 from dmcgowan/unpacker-interface 2022-04-26 20:54:51 +08:00
Kang.Zhang
fceab7f4c4 remove duplicate
Signed-off-by: Kang.Zhang <Kang.zhang@intel.com>
2022-04-26 10:44:45 +08:00
Derek McGowan
0e6c7bf931
Fix undefined error in use of errors package
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-04-25 15:21:21 -07:00
Derek McGowan
3dbd6a2498
Merge pull request #6841 from kzys/proto-upgrade-6
Migrate off from github.com/gogo/protobuf
2022-04-25 15:12:51 -07:00
Kazuyoshi Kato
f140400c0e
Merge pull request #5686 from dtnyn/issue-5679
Add flag to allow oci.WithAllDevicesAllowed on PrivilegedWithoutHostDevices
2022-04-25 11:44:01 -07:00
Kazuyoshi Kato
7a4f81d8ba Fix tests
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-04-22 15:41:05 +00:00
Kazuyoshi Kato
9dbe000a38 make protos
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-04-22 15:31:53 +00:00
Kazuyoshi Kato
e3db7de8f5 Remove gogo/protobuf and adjust types
This commit migrates containerd/protobuf from github.com/gogo/protobuf
to google.golang.org/protobuf and adjust types. Proto-generated structs
cannot be passed as values.

Fixes #6564.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-04-22 15:31:53 +00:00
Henry Wang
8710d4d014 cri: close fifos when container is deleted
Signed-off-by: Henry Wang <henwang@amazon.com>
2022-04-21 21:46:50 +00:00
Kazuyoshi Kato
237ef0de9b Remove all gogoproto extensions
This commit removes the following gogoproto extensions;

- gogoproto.nullable
- gogoproto.customename
- gogoproto.unmarshaller_all
- gogoproto.stringer_all
- gogoproto.sizer_all
- gogoproto.marshaler_all
- gogoproto.goproto_unregonized_all
- gogoproto.goproto_stringer_all
- gogoproto.goproto_getters_all

None of them are supported by Google's toolchain (see #6564).

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-04-20 07:23:28 +00:00
Derek McGowan
39692e7672
unpack: return error when no platforms defined
Require platforms to be non-empty to avoid no-op unpack

Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-04-19 18:21:57 -07:00
Derek McGowan
8017daa12d
Add unpack interface to be used by client
Move client unpacker to pkg/unpack

Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-04-19 18:21:57 -07:00
Kazuyoshi Kato
88c0c7201e Consolidate gogo/protobuf dependencies under our own protobuf package
This would make gogo/protobuf migration easier.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-04-19 15:53:36 +00:00
Kazuyoshi Kato
80b825ca2c Remove gogoproto.stdtime
This commit removes gogoproto.stdtime, since it is not supported by
Google's official toolchain
(see https://github.com/containerd/containerd/issues/6564).

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-04-19 13:39:30 +00:00
Derek McGowan
fe8da6dcaf
Move lease manager plugin to separate package
Create lease plugin type to separate lease manager from services plugin.
This allows other service plugins to depend on the lease manager.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-04-15 11:08:47 -07:00
Derek McGowan
98260e1b18
Merge pull request #6806 from mikebrow/netns-hardening
check for duplicate nspath possibilities
2022-04-14 15:02:44 -07:00
Mike Brown
147f0a7e02 check for duplicate nspath possibilities
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2022-04-14 18:33:19 +00:00
Andrey Klimentyev
5f3ce9512b Do not append []string{""} to command to preserve Docker compatibility
Signed-off-by: Andrey Klimentyev <andrey.klimentyev@flant.com>
2022-04-13 13:29:49 +03:00
Eric Lin
a5dfbfcf5a cri: load sandboxes/containers/images in parallel
Parallelizing them decreases loading duration.

Time to complete recover():
* Without competing IOs + without opt: 21s
* Without competing IOs + with opt: 14s
* Competing IOs + without opt: 3m44s
* Competing IOs + with opt: 33s

Signed-off-by: Eric Lin <linxiulei@gmail.com>
2022-04-09 13:01:14 +00:00
Ed Bartosh
ff5c55847a move CDI calls to the linux-only code
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-04-06 13:10:59 +03:00
Ed Bartosh
c9b4ccf83e add configuration for CDI
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-04-06 13:10:54 +03:00
Ed Bartosh
aed0538dac cri: implement CDI device injection
Extract the names of requested CDI devices and update the OCI
Spec according to the corresponding CDI device specifications.

CDI devices are requested using container annotations in the
cdi.k8s.io namespace. Once CRI gains dedicated fields for CDI
injection the snippet for extracting CDI names will need an
update.

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2022-04-06 13:07:54 +03:00
Wei Fu
8113758568 CRI: improve image pulling performance
Background:

With current design, the content backend uses key-lock for long-lived
write transaction. If the content reference has been marked for write
transaction, the other requestes on the same reference will fail fast with
unavailable error. Since the metadata plugin is based on boltbd which
only supports single-writer, the content backend can't block or handle
the request too long. It requires the client to handle retry by itself,
like OpenWriter - backoff retry helper. But the maximum retry interval
can be up to 2 seconds. If there are several concurrent requestes fo the
same image, the waiters maybe wakeup at the same time and there is only
one waiter can continue. A lot of waiters will get into sleep and we will
take long time to finish all the pulling jobs and be worse if the image
has many more layers, which mentioned in issue #4937.

After fetching, containerd.Pull API allows several hanlers to commit
same ChainID snapshotter but only one can be done successfully. Since
unpack tar.gz is time-consuming job, it can impact the performance on
unpacking for same ChainID snapshotter in parallel.

For instance, the Request 2 doesn't need to prepare and commit, it
should just wait for Request 1 finish, which mentioned in pull
request #6318.

```text
	Request 1	Request 2

	Prepare
	   |
	   |
	   |
	   |		Prepare
	Commit		   |
			   |
			   |
			   |
			Commit(failed on exist)
```

Both content backoff retry and unnecessary unpack impacts the performance.

Solution:

Introduced the duplicate suppression in fetch and unpack context. The
deplicate suppression uses key-mutex and single-waiter-notify to support
singleflight. The caller can use the duplicate suppression in different
PullImage handlers so that we can avoid unnecessary unpack and spin-lock
in OpenWriter.

Test Result:

Before enhancement:

```bash
➜  /tmp sudo bash testing.sh "localhost:5000/redis:latest" 20
crictl pull localhost:5000/redis:latest (x20) takes ...

real	1m6.172s
user	0m0.268s
sys	0m0.193s

docker pull localhost:5000/redis:latest (x20) takes ...

real	0m1.324s
user	0m0.441s
sys	0m0.316s

➜  /tmp sudo bash testing.sh "localhost:5000/golang:latest" 20
crictl pull localhost:5000/golang:latest (x20) takes ...

real	1m47.657s
user	0m0.284s
sys	0m0.224s

docker pull localhost:5000/golang:latest (x20) takes ...

real	0m6.381s
user	0m0.488s
sys	0m0.358s
```

With this enhancement:

```bash
➜  /tmp sudo bash testing.sh "localhost:5000/redis:latest" 20
crictl pull localhost:5000/redis:latest (x20) takes ...

real	0m1.140s
user	0m0.243s
sys	0m0.178s

docker pull localhost:5000/redis:latest (x20) takes ...

real	0m1.239s
user	0m0.463s
sys	0m0.275s

➜  /tmp sudo bash testing.sh "localhost:5000/golang:latest" 20
crictl pull localhost:5000/golang:latest (x20) takes ...

real	0m5.546s
user	0m0.217s
sys	0m0.219s

docker pull localhost:5000/golang:latest (x20) takes ...

real	0m6.090s
user	0m0.501s
sys	0m0.331s
```

Test Script:

localhost:5000/{redis|golang}:latest is equal to
docker.io/library/{redis|golang}:latest. The image is hold in local registry
service by `docker run -d -p 5000:5000 --name registry registry:2`.

```bash

image_name="${1}"
pull_times="${2:-10}"

cleanup() {
  ctr image rmi "${image_name}"
  ctr -n k8s.io image rmi "${image_name}"
  crictl rmi "${image_name}"
  docker rmi "${image_name}"
  sleep 2
}

crictl_testing() {
  for idx in $(seq 1 ${pull_times}); do
    crictl pull "${image_name}" > /dev/null 2>&1 &
  done
  wait
}

docker_testing() {
  for idx in $(seq 1 ${pull_times}); do
    docker pull "${image_name}" > /dev/null 2>&1 &
  done
  wait
}

cleanup > /dev/null 2>&1

echo 3 > /proc/sys/vm/drop_caches
sleep 3
echo "crictl pull $image_name (x${pull_times}) takes ..."
time crictl_testing
echo

echo 3 > /proc/sys/vm/drop_caches
sleep 3
echo "docker pull $image_name (x${pull_times}) takes ..."
time docker_testing
```

Fixes: #4937
Close: #4985
Close: #6318

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2022-04-06 07:14:18 +08:00
Maksym Pavlenko
871b6b6a9f Use testify
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-04-01 18:17:58 -07:00
Fu Wei
d394e00c7e
Merge pull request #6738 from zhsj/fix-test-msg
Fix error message in TestNewBinaryIO
2022-03-25 23:40:06 +08:00
Phil Estes
3633cae64b
Merge pull request #6706 from kzys/typeurl-upgrade
Use typeurl.Any instead of github.com/gogo/protobuf/types.Any
2022-03-25 10:38:46 -04:00
Shengjing Zhu
2689432bfa Fix error message in TestNewBinaryIO
Signed-off-by: Shengjing Zhu <zhsj@debian.org>
2022-03-25 11:56:21 +08:00
Kazuyoshi Kato
96b16b447d Use typeurl.Any instead of github.com/gogo/protobuf/types.Any
This commit upgrades github.com/containerd/typeurl to use typeurl.Any.
The interface hides gogo/protobuf/types.Any from containerd's Go client.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2022-03-24 20:50:07 +00:00
Fu Wei
e7cba85e1c
Merge pull request #6323 from jepio/jepio/fix-cgroupv2-oom-event 2022-03-24 15:51:28 +08:00
Derek McGowan
551516a18d
Merge pull request from GHSA-c9cp-9c75-9v8c
Fix the Inheritable capability defaults.
2022-03-23 10:50:56 -07:00
Phil Estes
36dcc76fa9
Merge pull request #6496 from thaJeztah/deprecate_criu_opt
runtime: deprecate runc --criu / -criu-path option
2022-03-23 11:17:58 -04:00
Sebastiaan van Stijn
d2013d2c99
runtime: deprecate runc --criu / -criu-path option
runc option --criu is now ignored (with a warning), and the option will be
removed entirely in a future release. Users who need a non- standard criu
binary should rely on the standard way of looking up binaries in $PATH.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-03-23 14:42:43 +01:00
Amit Barve
bfde58e3cd Bug fix for mount path handling
Currently when handling 'container_path' elements in container mounts we simply call
filepath.Clean on those paths. However, filepath.Clean adds an extra '.' if the path is a
simple drive letter ('E:' or 'Z:' etc.). These type of paths cause failures (with incorrect
parameter error) when creating containers via hcsshim. This commit checks for such paths
and doesn't call filepath.Clean on them.
It also adds a new check to error out if the destination path is a C drive and moves the
dst path checks out of the named pipe condition.

Signed-off-by: Amit Barve <ambarve@microsoft.com>
2022-03-21 09:40:19 -07:00
Phil Estes
ee49c4d557
Add nolint:staticcheck to platform-specific calls
The linter on platforms that have a hardcoded response complains about
"if xyz == nil" checks; ignore those.

Signed-off-by: Phil Estes <estesp@amazon.com>
2022-03-17 18:24:00 -04:00
Fu Wei
d9797673b0
Merge pull request #6593 from qiutongs/improve-container-mount
Make the temp mount as ready only in container WithVolumes
2022-03-18 00:03:28 +08:00
Eng Zer Jun
18ec2761c0
test: use T.TempDir to create temporary test directory
The directory created by `T.TempDir` is automatically removed when the
test and all its subtests complete.

Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2022-03-15 14:03:50 +08:00
Paul "TBBle" Hampson
39d52118f5 Plumb CRI Devices through to OCI WindowsDevices
There's two mappings of hostpath to IDType and ID in the wild:
- dockershim and dockerd-cri (implicitly via docker) use class/ID
-- The only supported IDType in Docker is 'class'.
-- https://github.com/aarnaud/k8s-directx-device-plugin generates this form
- https://github.com/jterry75/cri (windows_port branch) uses IDType://ID
-- hcsshim's CRI test suite generates this form

`://` is much more easily distinguishable, so I've gone with that one as
the generic separator, with `class/` as a special-case.

Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
2022-03-12 08:16:43 +11:00
Derek McGowan
8acbb27647
Merge pull request from GHSA-crp2-qrr5-8pq7
Clean image volume path
2022-03-02 10:03:17 -08:00
Shengjing Zhu
352a8f49f7 cri: relax test for system without hugetlb
These unit tests don't check hugetlb. However by setting
TolerateMissingHugetlbController to false, these tests can't
be run on system without hugetlb (e.g. Debian buildd).

Signed-off-by: Shengjing Zhu <zhsj@debian.org>
2022-02-28 01:38:58 +08:00
Qiutong Song
ec90efbe99 Make the temp mount as ready only in container WithVolumes
Signed-off-by: Qiutong Song <songqt01@gmail.com>
2022-02-25 17:53:30 -08:00
Shengjing Zhu
ea3d2e6433 go.mod: update to github.com/tchap/go-patricia/v2 v2.3.1
Signed-off-by: Shengjing Zhu <zhsj@debian.org>
2022-02-26 05:04:55 +08:00
Phil Estes
2b2372d43e
Merge pull request #6337 from thaJeztah/bump_go_restful
go.mod: update to github.com/emicklei/go-restful/v3 v3.7.3
2022-02-22 17:33:37 -05:00
Shengjing Zhu
f4f41296c2 Replace golang.org/x/net/context with std library
Signed-off-by: Shengjing Zhu <zhsj@debian.org>
2022-02-22 02:27:05 +08:00
Sebastiaan van Stijn
481fb923c5
go.mod: update to github.com/emicklei/go-restful/v3 v3.7.3
full diff: https://github.com/emicklei/go-restful/compare/v2.9.5...v3.7.3

- Switch to using go modules
- Add check for wildcard to fix CORS filter
- Add check on writer to prevent compression of response twice
- Add OPTIONS shortcut WebService receiver
- Add Route metadata to request attributes or allow adding attributes to routes
- Add wroteHeader set
- Enable content encoding on Handle and ServeHTTP
- Feat: support google custom verb
- Feature: override list of method allowed without content-type
- Fix Allow header not set on '405: Method Not Allowed' responses
- Fix Go 1.15: conversion from int to string yields a string of one rune
- Fix WriteError return value
- Fix: use request/response resulting from filter chain
- handle path params with prefixes and suffixes
- HTTP response body was broken, if struct to be converted to JSON has boolean value
- List available representations in 406 body
- Support describing response headers
- Unwrap function in filter chain + remove unused dispatchWithFilters

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-02-18 21:54:27 +01:00
ruiwen-zhao
fb0b8d6177 Use fs.RootPath when mounting volumes
Signed-off-by: Ruiwen Zhao <ruiwen@google.com>
2022-02-17 19:20:00 +00:00
Derek McGowan
c0f8188469
Update go-cni to v1.1.2
Fixes panic when exec is nil

Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-02-10 12:40:51 -08:00
Fu Wei
6a628b64ac
Merge pull request #6514 from marquiz/fixes/rdt 2022-02-08 09:31:49 +08:00
Markus Lehtonen
9b1fb82584 cri: fix handling of ignore_rdt_not_enabled_errors config option
We were not properly ignoring errors from
gorestrl.rdt.ContainerClassFromAnnotations() causing the config option
to be ineffective, in practice.

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2022-02-04 13:54:03 +02:00
Jeremi Piotrowski
7275411ec8 cgroup2: monitor OOMKill instead of OOM to prevent missing container OOM events
With the cgroupv2 configuration employed by Kubernetes, the pod cgroup (slice)
and container cgroup (scope) will both have the same memory limit applied. In
that situation, the kernel will consider an OOM event to be triggered by the
parent cgroup (slice), and increment 'oom' there. The child cgroup (scope) only
sees an oom_kill increment. Since we monitor child cgroups for oom events,
check the OOMKill field so that we don't miss events.

This is not visible when running containers through docker or ctr, because they
set the limits differently (only container level). An alternative would be to
not configure limits at the pod level - that way the container limit will be
hit and the OOM will be correctly generated. An interesting consequence is that
when spawning a pod with multiple containers, the oom events also work
correctly, because:

a) if one of the containers has no limit, the pod has no limit so OOM events in
   another container report correctly.
b) if all of the containers have limits then the pod limit will be a sum of
   container events, so a container will be able to hit its limit first.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2022-02-03 13:39:16 +01:00
Jeremi Piotrowski
821c961c86 pkg/oom/v2: handle EventChan routine shutdown quietly
When the cgroup is removed, EventChan is closed (this was pulled in by
8d69c041c5). This results in a nil error
being received. Don't log an error in that case but instead return.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2022-02-03 13:20:46 +01:00
Andrew G. Morgan
6906b57c72
Fix the Inheritable capability defaults.
The Linux kernel never sets the Inheritable capability flag to
anything other than empty. Non-empty values are always exclusively
set by userspace code.

[The kernel stopped defaulting this set of capability values to the
 full set in 2000 after a privilege escalation with Capabilities
 affecting Sendmail and others.]

Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
2022-02-01 13:55:46 -08:00
Derek McGowan
4e9e14c2b6
Fix rdt build tags for go 1.16
Signed-off-by: Derek McGowan <derek@mcg.dev>
2022-01-19 11:09:29 -08:00
Takumasa Sakao
18592b2f5a Fix wrong log message
Signed-off-by: Takumasa Sakao <tsakao@zlab.co.jp>
2022-01-09 16:01:23 +09:00
haoyun
bbe46b8c43 feat: replace github.com/pkg/errors to errors
Signed-off-by: haoyun <yun.hao@daocloud.io>
Co-authored-by: zounengren <zouyee1989@gmail.com>
2022-01-07 10:27:03 +08:00
Derek McGowan
644a01e13b
Merge pull request from GHSA-mvff-h3cj-wj9c
only relabel cri managed host mounts
2022-01-05 09:30:58 -08:00
Markus Lehtonen
9c2e3835fa cri: add ignore_rdt_not_enabled_errors config option
Enabling this option effectively causes RDT class of a container to be a
soft requirement. If RDT support has not been enabled the RDT class
setting will not have any effect.

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2022-01-04 09:27:54 +02:00
Markus Lehtonen
f4a191917b cri: annotations for controlling RDT class
Use goresctrl for parsing container and pod annotations related to RDT.

In practice, from the users' point of view, this patchs adds support for
a container annotation and two separate pod annotations for controlling
the RDT class of containers.

Container annotation can be used by a CRI client:
  "io.kubernetes.cri.rdt-class"

Pod annotations for specifying the RDT class in the K8s pod spec level:
  "rdt.resources.beta.kubernetes.io/pod"
  (pod-wide default for all containers within)

  "rdt.resources.beta.kubernetes.io/container.<container_name>"
  (container-specific overrides)

Annotations are intended as an intermediate step before the CRI API
supports RDT.

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2022-01-04 09:27:54 +02:00
Derek McGowan
2c9d80aba5
Merge pull request #6372 from fidencio/wip/seutil-fix-container_kvm_t-type-detection
seutil: Fix setting the "container_kvm_t" label
2021-12-15 10:35:04 -08:00
Phil Estes
949db57213
Merge pull request #6320 from endocrimes/dani/cri-swap
cri: add support for configuring swap
2021-12-14 15:02:28 -05:00
Phil Estes
330961c2d5
Merge pull request #6358 from jonyhy96/feat-error
refactor: functions for error log and error return
2021-12-14 10:16:54 -05:00
Fu Wei
d47fa40d1b
Merge pull request #6021 from dmcgowan/runc-shim-plugin 2021-12-14 10:19:23 +08:00
Derek McGowan
ac531108ab
Merge pull request #6155 from egernst/cri-update-for-sandbox-sizing
CRI update for sandbox sizing
2021-12-13 16:21:30 -08:00
Fabiano Fidêncio
f1c7993311 seutil: Fix setting the "container_kvm_t" label
The ability to handle KVM based runtimes with SELinux has been added as
part of d715d00906.

However, that commit introduced some logic to check whether the
"container_kvm_t" label would or not be present in the system, and while
the intentions were good, there's two major issues with the approach:
1. Inspecting "/etc/selinux/targeted/contexts/customizable_types" is not
   the way to go, as it doesn't list the "container_kvm_t" at all.
2. There's no need to check for the label, as if the label is invalid an
   "Invalid Label" error will be returned and that's it.

With those two in mind, let's simplify the logic behind setting the
"container_kvm_t" label, removing all the unnecessary code.

Here's an output of VMM process running, considering:
* The state before this patch:
  ```
  $ containerd --version
  containerd github.com/containerd/containerd v1.6.0-beta.3-88-g7fa44fc98 7fa44fc98f
  $ kubectl apply -f ~/simple-pod.yaml
  pod/nginx created
  $ ps -auxZ | grep cloud-hypervisor
  system_u:system_r:container_runtime_t:s0 root 609717 4.0  0.5 2987512 83588 ?    Sl   08:32   0:00 /usr/bin/cloud-hypervisor --api-socket /run/vc/vm/be9d5cbabf440510d58d89fc8a8e77c27e96ddc99709ecaf5ab94c6b6b0d4c89/clh-api.sock
  ```

* The state after this patch:
  ```
  $ containerd --version
  containerd github.com/containerd/containerd v1.6.0-beta.3-89-ga5f2113c9 a5f2113c9fc15b19b2c364caaedb99c22de4eb32
  $ kubectl apply -f ~/simple-pod.yaml
  pod/nginx created
  $ ps -auxZ | grep cloud-hypervisor
  system_u:system_r:container_kvm_t:s0:c638,c999 root 614842 14.0  0.5 2987512 83228 ? Sl 08:40   0:00 /usr/bin/cloud-hypervisor --api-socket /run/vc/vm/f8ff838afdbe0a546f6995fe9b08e0956d0d0cdfe749705d7ce4618695baa68c/clh-api.sock
  ```

Note, the tests were performed using the following configuration snippet:
```
[plugins]
  [plugins.cri]
    enable_selinux = true
    [plugins.cri.containerd]
      [plugins.cri.containerd.runtimes]
        [plugins.cri.containerd.runtimes.kata]
           runtime_type = "io.containerd.kata.v2"
           privileged_without_host_devices = true
```

And using the following pod yaml:
```
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  runtimeClassName: kata
  containers:
  - name: nginx
    image: nginx:1.14.2
    ports:
    - containerPort: 80
```

Fixes: #6371

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2021-12-14 00:09:17 +01:00
Alexander Minbaev
c8a009d18c add-list-stat: return container list if filter is nil
Signed-off-by: Alexander Minbaev <alexander.minbaev@ibm.com>
2021-12-13 15:09:18 -06:00
Eric Ernst
20419feaac cri, sandbox: pass sandbox resource details if available, applicable
CRI API has been updated to include a an optional `resources` field in the
LinuxPodSandboxConfig field, as part of the RunPodSandbox request.

Having sandbox level resource details at sandbox creation time will have
large benefits for sandboxed runtimes. In the case of Kata Containers,
for example, this'll allow for better support of SW/HW architectures
which don't allow for CPU/memory hotplug, and it'll allow for better
queue sizing for virtio devices associated with the sandbox (in the VM
case).

If this sandbox resource information is provided as part of the run
sandbox request, let's introduce a pattern where we will update the
pause container's runtiem spec to include this information in the
annotations field.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-12-13 08:41:41 -08:00
haoyun
c0d07094be feat: Errorf usage
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-12-13 14:31:53 +08:00
Michael Crosby
9b0303913f
only relabel cri managed host mounts
Co-authored-by: Samuel Karp <skarp@amazon.com>
Signed-off-by: Michael Crosby <michael@thepasture.io>
Signed-off-by: Samuel Karp <skarp@amazon.com>
2021-12-09 09:53:47 -08:00
Sebastiaan van Stijn
2d3009038c
cri/server: use consistent alias for pkg/ioutil
Consistently use cioutil to prevent it being confused for Golang's ioutil.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-12-09 17:47:22 +01:00
Danielle Lancashire
2fa4e9c0e2 cri: add support for configuring swap
Signed-off-by: Danielle Lancashire <dani@builds.terrible.systems>
2021-12-02 21:25:33 +01:00
Fu Wei
69822aa936
Merge pull request #6258 from wllenyj/fix-registry-panic 2021-11-19 13:35:46 +08:00
wanglei01
5f293d9ac4 [CRI] Fix panic when registry.mirrors use localhost
When containerd use this config:

```
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."localhost:5000"]
      endpoint = ["http://localhost:5000"]
```

Due to the `newTransport` function does not initialize the `TLSClientConfig` field.
Then use `TLSClientConfig` to cause nil pointer dereference

Signed-off-by: wanglei <wllenyj@linux.alibaba.com>
2021-11-19 10:56:46 +08:00
Michael Crosby
aa2733c202
Merge pull request #6170 from olljanat/default-sysctls
CRI: Support enable_unprivileged_icmp and enable_unprivileged_ports options
2021-11-18 11:37:23 -05:00
Derek McGowan
9afc778b73
Merge pull request #6111 from crosbymichael/latency-metrics
[cri] add sandbox and container latency metrics
2021-11-16 16:59:33 -08:00
Phil Estes
7758cdc09a
Merge pull request #6253 from jonyhy96/feat-rwmutex
feat: use rwmutex instead
2021-11-16 11:39:29 -05:00
Derek McGowan
6835a94707
Split runc shim into plugin components
Signed-off-by: Derek McGowan <derek@mcg.dev>
2021-11-15 20:16:45 -08:00
Derek McGowan
6eea8f3f62
Add shutdown package
Allows shutdown to handle callbacks with similar behavior as context
cancel

Signed-off-by: Derek McGowan <derek@mcg.dev>
2021-11-15 20:16:45 -08:00
haoyun
bef792b962 feat: use rwmutex instead
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-11-16 11:06:40 +08:00
Derek McGowan
d055487b00
Merge pull request #6206 from mxpv/path
Allow absolute path to shim binaries
2021-11-15 18:05:48 -08:00
Olli Janatuinen
2a81c9f677 CRI: Support enable_unprivileged_icmp and enable_unprivileged_ports options
Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
2021-11-15 18:30:09 +02:00
Michael Crosby
6765524b73 use write lock when updating container stats
Signed-off-by: Michael Crosby <michael@thepasture.io>
2021-11-11 15:17:48 +00:00
Maksym Pavlenko
6870f3b1b8 Support custom runtime path when launching tasks
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2021-11-09 13:31:46 -08:00
Michael Crosby
91bbaf6799 [cri] add sandbox and container latency metrics
These are simple metrics that allow users to view more fine grained metrics on
internal operations.

Signed-off-by: Michael Crosby <michael@thepasture.io>
2021-11-09 21:07:38 +00:00
Michael Crosby
4b7cc560b2
Merge pull request #6222 from jonyhy96/add-more-description
cleanup: add more description on comment
2021-11-09 15:55:32 -05:00
haoyun
5748006337 cleanup: add more description on comment
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-11-09 19:13:37 +08:00
David Porter
2e6d5709e3 Implement CRI container and pods stats
See https://kep.k8s.io/2371

* Implement new CRI RPCs - `ListPodSandboxStats` and `PodSandboxStats`
  * `ListPodSandboxStats` and `PodSandboxStats` which return stats about
    pod sandbox. To obtain pod sandbox stats, underlying metrics are
    read from the pod sandbox cgroup parent.
  * Process info is obtained by calling into the underlying task
  * Network stats are taken by looking up network metrics based on the
    pod sandbox network namespace path
* Return more detailed stats for cpu and memory for existing container
  stats. These metrics use the underlying task's metrics to obtain
  stats.

Signed-off-by: David Porter <porterdavid@google.com>
2021-11-03 17:52:05 -07:00
Dat Nguyen
afe39bebfe add oci.WithAllDevicesAllowed flag for privileged_without_host_devices
This commit adds a flag that enable all devices whitelisting when
privileged_without_host_devices is already enabled.

Fixes #5679

Signed-off-by: Dat Nguyen <dnguyen7@atlassian.com>
2021-11-04 10:24:19 +11:00
Mike Brown
ea89788105 adds additional debug out to timebox cni setup
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2021-11-01 09:34:29 -05:00
zounengren
a217b5ac8f bump CNI to spec v1.0.0
Signed-off-by: zounengren <zouyee1989@gmail.com>
2021-10-22 10:58:40 +08:00
Sambhav Kothari
2a8dac12a7 Output a warning for label image labels instead of erroring
This change ignore errors during container runtime due to large
image labels and instead outputs warning. This is necessary as certain
image building tools like buildpacks may have large labels in the images
which need not be passed to the container.

Signed-off-by: Sambhav Kothari <sambhavs.email@gmail.com>
2021-10-14 19:25:48 +01:00
Claudiu Belu
2bc77b8a28 Adds Windows resource limits support
This will allow running Windows Containers to have their resource
limits updated through containerd. The CPU resource limits support
has been added for Windows Server 20H2 and newer, on older versions
hcsshim will raise an Unimplemented error.

Signed-off-by: Claudiu Belu <cbelu@cloudbasesolutions.com>
2021-09-25 13:20:55 -07:00
Derek McGowan
cb6fb93af5
Merge pull request #6011 from crosbymichael/schedcore
add runc shim support for sched core
2021-10-08 10:42:16 -07:00
Derek McGowan
26ee1b1ee5
Merge pull request #4695 from crosbymichael/cri-class
[cri] Add CNI conf based on runtime class
2021-10-08 09:27:49 -07:00
Michael Crosby
e48bbe8394 add runc shim support for sched core
In linux 5.14 and hopefully some backports, core scheduling allows processes to
be co scheduled within the same domain on SMT enabled systems.

The containerd impl sets the core sched domain when launching a shim. This
allows a clean way for each shim(container/pod) to be in its own domain and any
additional containers, (v2 pods) be be launched with the same domain as well as
any exec'd process added to the container.

kernel docs: https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/core-scheduling.html

Signed-off-by: Michael Crosby <michael@thepasture.io>
2021-10-08 16:18:09 +00:00
Michael Crosby
7b8a697f28
Merge pull request #6034 from claudiubelu/windows/fixes-image-volume
Fixes Windows containers with image volumes
2021-10-07 11:50:01 -04:00
Akihiro Suda
703b86533b
pkg/cap: remove an outdated comment
pkg/cap no longer depends on github.com/syndtr/gocapability

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-10-06 13:24:30 +09:00
Derek McGowan
63b7e5771e
Merge pull request #5973 from Juneezee/deprecate-ioutil
refactor: move from io/ioutil to io and os package
2021-10-01 10:52:06 -07:00
Claudiu Belu
791e175c79 Windows: Fixes Windows containers with image volumes
Currently, there are few issues that preventing containers
with image volumes to properly start on Windows.

- Unlike the Linux implementation, the Container volume mount paths
  were not created if they didn't exist. Those paths are now created.

- while copying the image volume contents to the container volume,
  the layers were not properly deactivated, which means that the
  container can't start since those layers are still open. The layers
  are now properly deactivated, allowing the container to start.

- even if the above issue didn't exist, the Windows implementation of
  mount/Mount.Mount deactivates the layers, which wouldn't allow us
  to copy files from them. The layers are now deactivated after we've
  copied the necessary files from them.

- the target argument of the Windows implementation of mount/Mount.Mount
  was unused, which means that folder was always empty. We're now
  symlinking the Layer Mount Path into the target folder.

- hcsshim needs its Container Mount Paths to be properly formated, to be
  prefixed by C:. This was an issue for Volumes defined with Linux-like
  paths (e.g.: /test_dir). filepath.Abs solves this issue.

Signed-off-by: Claudiu Belu <cbelu@cloudbasesolutions.com>
2021-10-01 09:02:18 +00:00
haoyun
5c2426a7b2 cleanup: import from k8s.io/utils/clock/testing instead
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-09-30 23:34:56 +08:00
haoyun
6484fab1e0 cleanup: import from k8s.io/utils/clock instead
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-09-30 23:27:20 +08:00
zounengren
fcffe0c83a switch usage directly to errdefs.(ErrAlreadyExists and ErrNotFound)
Signed-off-by: Zou Nengren <zouyee1989@gmail.com>
2021-09-24 18:26:58 +08:00
Eng Zer Jun
50da673592
refactor: move from io/ioutil to io and os package
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in io and os packages.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-09-21 09:50:38 +08:00
Michael Crosby
55893b9be7 Add CNI conf based on runtime class
Signed-off-by: Michael Crosby <michael@thepasture.io>
2021-09-17 19:05:06 +00:00
Phil Estes
f40df3d72b
Enable image config labels in ctr and CRI container creation
Signed-off-by: Phil Estes <estesp@amazon.com>
2021-09-15 15:31:19 -04:00
Phil Estes
d081457ba4
Merge pull request #5974 from claudiubelu/hanging-task-delete-fix
task delete: Closes task IO before waiting
2021-09-15 11:30:23 -04:00
Fu Wei
e1ad779107
Merge pull request #5817 from dmcgowan/shim-plugins
Add support for shim plugins
2021-09-12 18:18:20 +08:00
Fu Wei
d9f921e4f0
Merge pull request #5906 from thaJeztah/replace_os_exec 2021-09-11 10:38:53 +08:00
Phil Estes
6589876d20
Merge pull request #5964 from crosbymichael/cni-pref
add ip_pref CNI options for primary pod ip
2021-09-10 12:06:23 -04:00
Fu Wei
689a863efe
Merge pull request #5939 from scuzhanglei/privileged-device 2021-09-10 22:15:46 +08:00
Michael Crosby
1ddc54c00d
Merge pull request #5954 from claudiubelu/fix-sandbox-remove
sandbox: Allows the sandbox to be deleted in NotReady state
2021-09-10 10:12:34 -04:00
Michael Crosby
1efed43090
add ip_pref CNI options for primary pod ip
This fixes the TODO of this function and also expands on how the primary pod ip
is selected. This change allows the operator to prefer ipv4, ipv6, or retain the
ordering provided by the return results of the CNI plugins.

This makes it much more flexible for ops to configure containerd and how IPs are
set on the pod.

Signed-off-by: Michael Crosby <michael@thepasture.io>
2021-09-10 10:04:21 -04:00
scuzhanglei
756f4a3147 cri: add devices for privileged container
Signed-off-by: scuzhanglei <greatzhanglei@gmail.com>
2021-09-10 10:16:26 +08:00
Fu Wei
d58542a9d1
Merge pull request #5627 from payall4u/payall4u/cri-support-cgroup-v2 2021-09-09 23:10:33 +08:00
Claudiu Belu
55faa5e93d task delete: Closes task IO before waiting
After containerd restarts, it will try to recover its sandboxes,
containers, and images. If it detects a task in the Created or
Stopped state, it will be removed. This will cause the containerd
process it hang on Windows on the t.io.Wait() call.

Calling t.io.Close() beforehand will solve this issue.

Additionally, the same issue occurs when trying to stopp a sandbox
after containerd restarts. This will solve that case as well.

Signed-off-by: Claudiu Belu <cbelu@cloudbasesolutions.com>
2021-09-07 02:17:01 -07:00
Wei Fu
2bcd6a4e88 cri: patch update image labels
The CRI-plugin subscribes the image event on k8s.io namespace. By
default, the image event is created by CRI-API. However, the image can
be downloaded by containerd API on k8s.io with the customized labels.
The CRI-plugin should use patch update for `io.cri-containerd.image`
label in this case.

Fixes: #5900

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2021-09-05 18:48:26 +08:00
Claudiu Belu
24cec9be56 sandbox: Allows the sandbox to be deleted in NotReady state
The Pod Sandbox can enter in a NotReady state if the task associated
with it no longer exists (it died, or it was killed). In this state,
the Pod network namespace could still be open, which means we can't
remove the sandbox, even if --force was used.

Signed-off-by: Claudiu Belu <cbelu@cloudbasesolutions.com>
2021-09-02 03:40:56 -07:00
Mike Brown
e00f87f1dc
Merge pull request #5927 from adelina-t/ws_2022_image_update
Update Pause image in tests & config
2021-08-31 16:11:57 -05:00
Adelina Tuvenie
6d3d34b85d Update Pause image in tests & config
With the introduction of Windows Server 2022, some images have been updated
to support WS2022 in their manifest list. This commit updates the test images
accordingly.

Signed-off-by: Adelina Tuvenie <atuvenie@cloudbasesolutions.com>
2021-08-31 19:42:57 +03:00
Mikko Ylinen
e0f8c04dad cri: Devices ownership from SecurityContext
CRI container runtimes mount devices (set via kubernetes device plugins)
to containers by taking the host user/group IDs (uid/gid) to the
corresponding container device.

This triggers a problem when trying to run those containers with
non-zero (root uid/gid = 0) uid/gid set via runAsUser/runAsGroup:
the container process has no permission to use the device even when
its gid is permissive to non-root users because the container user
does not belong to that group.

It is possible to workaround the problem by manually adding the device
gid(s) to supplementalGroups. However, this is also problematic because
the device gid(s) may have different values depending on the workers'
distro/version in the cluster.

This patch suggests to take RunAsUser/RunAsGroup set via SecurityContext
as the device UID/GID, respectively. The feature must be enabled by
setting device_ownership_from_security_context runtime config value to
true (valid on Linux only).

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-08-30 09:30:00 +03:00
Phil Estes
af1a0908d0
Merge pull request #5865 from dcantah/windows-pod-runasusername
Add RunAsUserName functionality for the Windows pod sandbox container
2021-08-25 22:25:14 -04:00
Sebastiaan van Stijn
2ac9968401
replace uses of os/exec with golang.org/x/sys/execabs
Go 1.15.7 contained a security fix for CVE-2021-3115, which allowed arbitrary
code to be executed at build time when using cgo on Windows. This issue also
affects Unix users who have “.” listed explicitly in their PATH and are running
“go get” outside of a module or with module mode disabled.

This issue is not limited to the go command itself, and can also affect binaries
that use `os.Command`, `os.LookPath`, etc.

From the related blogpost (ttps://blog.golang.org/path-security):

> Are your own programs affected?
>
> If you use exec.LookPath or exec.Command in your own programs, you only need to
> be concerned if you (or your users) run your program in a directory with untrusted
> contents. If so, then a subprocess could be started using an executable from dot
> instead of from a system directory. (Again, using an executable from dot happens
> always on Windows and only with uncommon PATH settings on Unix.)
>
> If you are concerned, then we’ve published the more restricted variant of os/exec
> as golang.org/x/sys/execabs. You can use it in your program by simply replacing

This patch replaces all uses of `os/exec` with `golang.org/x/sys/execabs`. While
some uses of `os/exec` should not be problematic (e.g. part of tests), it is
probably good to be consistent, in case code gets moved around.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-25 18:11:09 +02:00
Fu Wei
6fa9588531
Merge pull request #5903 from AkihiroSuda/gofmt117
Run `go fmt` with Go 1.17
2021-08-24 23:01:41 +08:00
Daniel Canter
25644b4614 Add RunAsUserName functionality for the Windows Pod Sandbox Container
There was recent changes to cri to bring in a Windows section containing a
security context object to the pod config. Before this there was no way to specify
a user for the pod sandbox container to run as. In addition, the security context
is a field for field mirror of the Windows container version of it, so add the
ability to specify a GMSA credential spec for the pod sandbox container as well.

Signed-off-by: Daniel Canter <dcanter@microsoft.com>
2021-08-23 07:35:22 -07:00
payall4u
f8dfbee178 add cri test case
Signed-off-by: Zhiyu Li <payall4u@qq.com>
2021-08-23 10:59:19 +08:00
payall4u
9a8bf13158 feature: add field LinuxContainerResources.Unified on cri
Signed-off-by: Zhiyu Li <payall4u@qq.com>
2021-08-23 10:49:31 +08:00
Akihiro Suda
d3aa7ee9f0
Run go fmt with Go 1.17
The new `go fmt` adds `//go:build` lines (https://golang.org/doc/go1.17#tools).

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-08-22 09:31:50 +09:00
Jacob Blain Christen
c3609ff4ca cri: filter selinux xattr for image volumes
Exclude the `security.selinux` xattr when copying content from layer
storage for image volumes. This allows for the already correct label
at the target location to be applied to the copied content, thus
enabling containers to write to volumes that they implicitly expect to be
able to write to.

- Fixes containerd/containerd#5090
- See rancher/rke2#690

Signed-off-by: Jacob Blain Christen <jacob@rancher.com>
2021-08-20 23:47:24 -07:00
Phil Estes
ff2e58d114
Merge pull request #5131 from perithompson/windows-hostnetwork
Add Windows HostProcess Support
2021-08-20 14:29:37 -04:00
Kazuyoshi Kato
4dd5ca70fb script: update golangci-lint from v1.38.0 and v1.36.0 to v1.42.0
golint has been deprecated and replaced by revive since v1.41.0.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2021-08-19 16:27:16 -07:00
Derek McGowan
8d135d2842
Add support for shim plugins
Refactor shim v2 to load and register plugins.
Update init shim interface to not require task service implementation on
returned service, but register as plugin if it is.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2021-08-17 11:06:09 -07:00
Gunju Kim
1224060f89 Allow expanded DNS configuration
Signed-off-by: Gunju Kim <gjkim042@gmail.com>
2021-08-14 06:13:01 +09:00
Peri Thompson
79b369a0bb
Added windows hostProcess cni skip
Signed-off-by: Peri Thompson <perit@vmware.com>
2021-08-11 22:23:49 +01:00
Michael Crosby
218db0f9af
Merge pull request #5835 from dmcgowan/plugin-events-cleanup
Move plugin context events into separate plugin
2021-08-07 21:47:11 -04:00
Derek McGowan
0a0621bb47
Move plugin context events into separate plugin
Signed-off-by: Derek McGowan <derek@mcg.dev>
2021-08-05 22:59:20 -07:00
Derek McGowan
6f027e38a8
Remove redundant build tags
Remove build tags which are already implied by the name of the file.
Ensures build tags are used consistently

Signed-off-by: Derek McGowan <derek@mcg.dev>
2021-08-05 22:27:46 -07:00
Derek McGowan
caf9e256b7
Merge pull request #5693 from kzys/sigrtmin
Support SIGRTMIN+n signals
2021-07-27 11:58:57 -07:00
Davanum Srinivas
494b940f14
Introduce a new go module - containerd/api for use in standalone clients
In containerd 1.5.x, we introduced support for go modules by adding a
go.mod file in the root directory. This go.mod lists all the things
needed across the whole code base (with the exception of
integration/client which has its own go.mod). So when projects that
need to make calls to containerd API will pull in some code from
containerd/containerd, the `go mod` commands will add all the things
listed in the root go.mod to the projects go.mod file. This causes
some problems as the list of things needed to make a simple API call
is enormous. in effect, making a API call will pull everything that a
typical server needs as well as the root go.mod is all encompassing.
In general if we had smaller things folks could use, that will make it
easier by reducing the number of things that will end up in a consumers
go.mod file.

Now coming to a specific problem, the root containerd go.mod has various
k8s.io/* modules listed. Also kubernetes depends on containerd indirectly
via both moby/moby (working with docker maintainers seperately) and via
google/cadvisor. So when the kubernetes maintainers try to use latest
1.5.x containerd, they will see the kubernetes go.mod ending up depending
on the older version of kubernetes!

So if we can expose just the minimum things needed to make a client API
call then projects like cadvisor can adopt that instead of pulling in
the entire go.mod from containerd. Looking at the existing code in
cadvisor the minimum things needed would be the api/ directory from
containerd. Please see proof of concept here:
github.com/google/cadvisor/pull/2908

To enable that, in this PR, we add a go.mod file in api/ directory. we
split the Protobuild.yaml into two, one for just the things in api/
directory and the rest in the root directory. We adjust various targets
to build things correctly using `protobuild` and also ensure that we
end up with the same generated code as before as well. To ensure we
better take care of the various go.mod/go.sum files, we update the
existing `make vendor` and also add a new `make verify-vendor` that one
can run locally as well in the CI.

Ideally, we would have a `containerd/client` either as a standalone repo
or within `containerd/containerd` as a separate go module. but we will
start here to experiment with a standalone api go module first.

Also there are various follow ups we can do, for example @thaJeztah has
identified two tasks we could do after this PR lands:

github.com/containerd/containerd/pull/5716#discussion_r668821396

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-07-27 07:34:59 -04:00
Kazuyoshi Kato
1d3d08026d Support SIGRTMIN+n signals
systemd uses SIGRTMIN+n signals, but containerd didn't support the signals
since Go's sys/unix doesn't support them.

This change introduces SIGRTMIN+n handling by utilizing moby/sys/signal.

Fixes #5402.

https://www.freedesktop.org/software/systemd/man/systemd.html#Signals

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2021-07-26 09:36:43 -07:00
Wei Fu
ac75071b49 remove pkg/cri/platforms package
The package is a duplicate of platforms. No need to maintain
pkg/cri/platforms.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2021-07-10 10:14:27 +08:00
Brian Goff
0a8802df67 Allow WithServices to use custom implementations
Before this change, for several of the services that `WithServices`
handles, only the grpc client is supported.
Now, for instance, one can use an `images.Store` directly instead of
only an `imagesapi.StoreSlient`.

Some of the methods have been renamed to satisfy the difference between
using a grpc `<Foo>Client` vs the main interface.

I did not see a good candidate for TaskService so have left that mostly
unchanged.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2021-07-09 23:30:40 +00:00
Phil Estes
cf600abecc
Merge pull request #5619 from mikebrow/cri-add-v1-proxy-alpha
[CRI] move up to CRI v1 and support v1alpha in parallel
2021-07-09 14:07:24 -04:00
Mike Brown
d1c1051927 use fu wei's suggeted interface pick for marshaling
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2021-07-07 15:45:45 -05:00
Mike Brown
14962dcbd2 add alpha version
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2021-07-06 11:40:20 -05:00
Mike Brown
a5c417ac06 move up to CRI v1 and support v1alpha in parallel
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2021-06-28 09:34:12 -05:00
Dan Williams
dac2543a07 sandbox: send pod UID to CNI plugins as K8S_POD_UID
CNI plugins that need to wait for network state to converge
may want to cancel waiting when a short lived pod is deleted.
However, there is a race between when kubelet asks the runtime
to create the sandbox for the pod, and when the plugin is able
request the pod object from the apiserver. It may be the case
that the plugin receives the new pod, rather than the pod
the sandbox request was initiated for.

Passing the pod UID to the plugin allows the plugin to check
whether the pod it gets from the apiserver is actually the
pod its sandbox request was started for.

Signed-off-by: Dan Williams <dcbw@redhat.com>
2021-06-22 22:53:30 -05:00
Mike Brown
560e7d4799 fixing some doc links
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2021-06-21 18:24:47 -05:00
Kazuyoshi Kato
1bbee573af github.com/golang/protobuf/proto is deprecated
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2021-06-17 10:28:48 -04:00
Quan Tian
728743eb28 Fix cleanup context of teardownPodNetwork
Similar to other deferred cleanup operations, teardownPodNetwork should
use a different context as the original context may have expired,
otherwise CNI wouldn't been invoked, leading to leak of network
resources, e.g. IP addresses.

Signed-off-by: Quan Tian <qtian@vmware.com>
2021-06-04 19:17:05 +08:00
zounengren
498bb36f67 scrub the stale TODO
Signed-off-by: zounengren <zouyee1989@gmail.com>
2021-06-01 11:22:15 +08:00
Shiming Zhang
79e3452213 update the link
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
2021-05-20 11:56:29 +08:00
Shiming Zhang
1acca8bba3 Don't check for apparmor_parser to be present
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
2021-05-20 11:56:29 +08:00
Phil Estes
e47400cbd2
Merge pull request #5100 from adisky/skip-tls-localHost
Skip TLS verification for localhost
2021-05-12 14:56:53 -04:00
Kevin Parsons
b0d3b35b28 windows: Use GetFinalPathNameByHandle for ResolveSymbolicLink
This change splits the definition of pkg/cri/os.ResolveSymbolicLink by
platform (windows/!windows), and switches to an alternate implementation
for Windows. This aims to fix the issue described in containerd/containerd#5405.

The previous implementation which just called filepath.EvalSymlinks has
historically had issues on Windows. One of these issues we were able to
fix in Go, but EvalSymlinks's behavior is not well specified on
Windows, and there could easily be more issues in the future, so it
seems prudent to move to a separate implementation for Windows.

The new implementation uses the Windows GetFinalPathNameByHandle API,
which takes a handle to an open file or directory and some flags, and
returns the "real" name for the object. See comments in the code for
details on the implementation.

I have tested this change with a variety of mounts and everything seems
to work as expected. Functions that make incorrect assumptions on what a
Windows path can look like may have some trouble with the \\?\ path
syntax. For instance EvalSymlinks fails when given a \\?\UNC\ path. For
this reason, the resolvePath implementation modifies the returned path
to translate to the more common form (\\?\UNC\server\share ->
\\server\share).

Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
2021-05-04 11:55:11 -07:00
Mike Brown
c1a35232d8
Merge pull request #5446 from Random-Liu/fix-auth-config
Fix different registry hosts referencing the same auth config.
2021-05-04 06:21:02 -05:00
Lantao Liu
81402e4758 Fix different registry hosts referencing the same auth config.
Signed-off-by: Lantao Liu <lantaol@google.com>
2021-05-03 17:42:57 -07:00
Aditi Sharma
8014d9fee0 Skip TLS verification for localhost
Signed-off-by: Aditi Sharma <adi.sky17@gmail.com>
2021-05-03 10:21:54 +05:30
Mike Brown
3dad67eedb
Merge pull request #5203 from tghartland/5008-target-namespace
Support PID NamespaceMode_TARGET
2021-04-21 19:41:38 -05:00
Maksym Pavlenko
cb64dc8250
Merge pull request #5401 from Iceber/use-unbuffered-channel
process: use the unbuffered channel as the done signal
2021-04-21 16:50:32 -07:00
Thomas Hartland
efcb187429 Add unit tests for PID NamespaceMode_TARGET validation
Signed-off-by: Thomas Hartland <thomas.george.hartland@cern.ch>
2021-04-21 19:59:10 +02:00
Thomas Hartland
b48f27df6b Support PID NamespaceMode_TARGET
This commit adds support for the PID namespace mode TARGET
when generating a container spec.

The container that is created will be sharing its PID namespace
with the target container that was specified by ID in the namespace
options.

Signed-off-by: Thomas Hartland <thomas.george.hartland@cern.ch>
2021-04-21 17:54:17 +02:00
Iceber Gu
909660ea92 process: use the unbuffered channel as the done signal
Signed-off-by: Iceber Gu <wei.cai-nat@daocloud.io>
2021-04-21 18:24:18 +08:00
Michael Crosby
a3fe5c84c0
Merge pull request #5383 from wzshiming/clean/process-io
move common code to pkg/process from runtime
2021-04-20 14:40:12 -04:00
Phil Estes
81c4ac202f
Merge pull request #5256 from kolyshkin/seccomp-enabled
pkg/seccomp: simplify and speed up isEnabled
2021-04-19 15:59:28 -04:00
Shiming Zhang
7966a6652a Cleanup code
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
2021-04-19 16:59:45 +08:00
Sebastiaan van Stijn
1c03c377e5
go.mod: github.com/containerd/fifo v1.0.0
full diff: https://github.com/containerd/fifo/compare/115abcc95a1d...v1.0.0

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-04-19 09:27:45 +02:00
Kir Kolyshkin
3292ea5862 pkg/seccomp: use sync.Once to speed up IsEnabled
It does not make sense to check if seccomp is supported by the kernel
more than once per runtime, so let's use sync.Once to speed it up.

A quick benchmark (old implementation, before this commit, after):

BenchmarkIsEnabledOld-4           37183            27971 ns/op
BenchmarkIsEnabled-4            1252161              947 ns/op
BenchmarkIsEnabledOnce-4      666274008             2.14 ns/op

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-04-16 15:52:35 -07:00
Kir Kolyshkin
00b5c99b1a pkg/seccomp: simplify IsEnabled, update doc
Current implementation of seccomp.IsEnabled (rooted in runc) is not
too good.

First, it parses the whole /proc/self/status, adding each key: value
pair into the map (lots of allocations and future work for garbage
collector), when using a single key from that map.

Second, the presence of "Seccomp" key in /proc/self/status merely means
that kernel option CONFIG_SECCOMP is set, but there is a need to _also_
check for CONFIG_SECCOMP_FILTER (the code for which exists but never
executed in case /proc/self/status has Seccomp key).

Replace all this with a single call to prctl; see the long comment in
the code for details.

While at it, improve the IsEnabled documentation.

NOTE historically, parsing /proc/self/status was added after a concern
was raised in https://github.com/opencontainers/runc/pull/471 that
prctl(PR_GET_SECCOMP, ...) can result in the calling process being
killed with SIGKILL. This is a valid concern, so the new code here
does not use PR_GET_SECCOMP at all.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-04-16 15:52:35 -07:00
Phil Estes
4f18131239
Merge pull request #5286 from payall4u/optimize-cri-redirect-logs
cri: Reduce the cpu usage of  the function redirectLogs in cri
2021-04-14 21:33:05 -04:00
Sebastiaan van Stijn
864a3322b3
go.mod: github.com/containerd/go-cni v1.0.2
full diff: https://github.com/containerd/go-cni/compare/v1.0.1...v1.0.2

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-04-14 09:09:18 +02:00
Mike Brown
8a04bd0521 address recent runtimes config confusion
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2021-04-12 15:33:38 -05:00
Mike Brown
e96d2a5d90 Revert "remove two very old no longer used runtime options"
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2021-04-12 10:16:01 -05:00
Fu, Wei
7e3fd8da24
Merge pull request #5298 from jsturtevant/issue-5297
Support multi-arch images for Windows via ctr
2021-04-12 13:52:14 +08:00
payall4u
4bc8f692fc optimize cri redirect logs
Signed-off-by: Zhiyu Li <payall4u@qq.com>
2021-04-09 11:45:53 +08:00
Fu, Wei
d064140369
Merge pull request #5302 from mikebrow/toml-cri-defaults
shows our runc.v2 default options
2021-04-09 11:11:25 +08:00
Sebastiaan van Stijn
9bc8d63c9f
cri/server: use containerd/oci instead of libcontainer/devices
Looks like we had our own copy of the "getDevices" code already, so use
that code (which also matches the code that's used to _generate_ the spec,
so a better match).

Moving the code to a separate file, I also noticed that the _unix and _linux
code was _exactly_ the same (baring some `//nolint:` comments), so also
removing the duplicated code.

With this patch applied, we removed the dependency on the libcontainer/devices
package (leaving only libcontainer/user).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-04-08 23:25:21 +02:00
Mike Brown
dd16b006e5 merge in the move to the new options type
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2021-04-08 14:09:59 -05:00
Mike Brown
9144ce9677 shows our runc.v2 default options in the containerd default config
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2021-04-08 14:09:59 -05:00
Aditi Sharma
4d4117415e Change CRI config runtime options type
Changing Runtime.Options type to map[string]interface{}
to correctly marshal it from go to JSON.
See issue: https://github.com/kubernetes-sigs/cri-tools/issues/728

Signed-off-by: Aditi Sharma <adi.sky17@gmail.com>
2021-04-08 15:11:33 +05:30
Mike Brown
88880f0f2c
Merge pull request #5304 from mikebrow/cri-registry-doc-updates
remove mirrors from default; document the deprecation of registry.configs and registry.mirrors
merging based on LGTMs from https://github.com/containerd/containerd/pull/5304#pullrequestreview-628234110 and https://github.com/containerd/containerd/pull/5304#pullrequestreview-630478887 thanks!
2021-04-07 14:49:36 -05:00
Mike Brown
d4be6aa8fa rm mirror defaults; doc registry deprecations
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2021-04-07 12:29:43 -05:00
Akihiro Suda
8ba8533bde
pkg/cri/opts.WithoutRunMount -> oci.WithoutRunMount
Move `pkg/cri/opts.WithoutRunMount` function to `oci.WithoutRunMount`
so that it can be used without dependency on CRI.

Also add `oci.WithoutMounts(dests ...string)` for generality.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-04-07 21:25:36 +09:00
Mike Brown
0186a329e9 remove two very old no longer used runtime options
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2021-04-06 20:41:09 -05:00
Derek McGowan
261c107ffc
Merge pull request #5278 from mxpv/toml
Migrate TOML to github.com/pelletier/go-toml
2021-04-01 21:24:52 -07:00
Maksym Pavlenko
5ada2f74a7 Keep host order as defined in TOML file
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2021-04-01 09:29:16 -07:00
James Sturtevant
d9ff8ebef5 support multi-arch images for windows via ctr
Signed-off-by: James Sturtevant <jstur@microsoft.com>
2021-03-31 15:50:01 -07:00
Mike Brown
1b05b605c8
Merge pull request #5145 from aojea/happyeyeballs
use (sort of) happy-eyeballs for port-forwarding
2021-03-26 09:51:29 -05:00
Maksym Pavlenko
ddd4298a10 Migrate current TOML code to github.com/pelletier/go-toml
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2021-03-25 13:13:33 -07:00
Derek McGowan
75a0c2b7d3
Merge pull request #5264 from mxpv/tests
Run unit tests on CI for MacOS
2021-03-25 09:46:25 -07:00
Fu, Wei
80fa9fe32a
Merge pull request #5135 from AkihiroSuda/default-config-crypt
add imgcrypt stream processors to the default config
2021-03-25 14:31:38 +08:00
Maksym Pavlenko
4674ad7beb Ignore some tests on darwin
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2021-03-24 22:40:22 -07:00
Maksym Pavlenko
181e2d4216
Merge pull request #5250 from dmcgowan/cri-fix-reference-ordering
Fix reference ordering in CRI image store
2021-03-23 14:45:16 -07:00
Sebastiaan van Stijn
708299ca40
Move RunningInUserNS() to its own package
This allows using the utility without bringing whole of "sys" with it.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-03-23 11:29:53 +01:00
Derek McGowan
0886ceaea2
Fix reference ordering in CRI image store
Currently image references end up being stored in a
random order due to the way maps are iterated through
in Go. This leads to inconsistent identifiers being
resolved when a single reference is needed to identify
an image and the ordering of the references is used for
the selection.

Sort references in a consistent and ranked manner,
from higher information formats to lower.

Note: A `name + tag` reference is considered higher
information than a `name + digest` reference since a
registry may be used to resolve the digest from a
`name + tag` reference.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2021-03-22 22:29:57 -07:00
Antonio Ojea
305b425830 use happy-eyeballs for port-forwarding
golang has enabled RFC 6555 Fast Fallback (aka HappyEyeballs)
by default in 1.12.
It means that if a host resolves to both IPv6 and IPv4,
it will try to connect to any of those addresses and use the
working connection.
However, the implementation uses go routines to start both connections in parallel,
and this has limitations when running inside a namespace, so we try to the connections
serially, trying IPv4 first for keeping the same behaviour.
xref https://github.com/golang/go/issues/44922

Signed-off-by: Antonio Ojea <aojea@redhat.com>
2021-03-22 20:15:24 +01:00
Michael Crosby
e0c94bb269
Merge pull request #4708 from kzys/enable-criu
Re-enable CRIU tests by not using overlayfs snapshotter
2021-03-19 14:23:05 -04:00
Shiming Zhang
1410220d8f Fix error log when copy file
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
2021-03-20 00:13:00 +08:00
Michael Crosby
3f98a6d2d3
Merge pull request #5211 from pacoxu/pause/3.5
upgrade pause image to 3.5 for non-root
2021-03-18 11:43:59 -04:00
Phil Estes
32a08f1a6a
Merge pull request #4847 from cpuguy83/devices_by_dir
Support adding devices by dir
2021-03-17 09:41:02 -04:00
Kazuyoshi Kato
b520428b5a Fix CRIU
- process.Init#io could be nil
- Make sure CreateTaskRequest#Options is not empty before unmarshaling

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
2021-03-16 16:46:45 -07:00
pacoxu
ffff688663 upgrade pause image to 3.5 for non-root
Signed-off-by: pacoxu <paco.xu@daocloud.io>
2021-03-16 23:20:35 +08:00