Provides a couple helper functions that provide a background context for
running cleanup jobs while preserving the original context values.
The new contexts will not inherit the errors or cancellations.
Signed-off-by: Derek McGowan <derek@mcg.dev>
Comments in initPlatform for Windows states that the options were
Linux specific. Additionally properly wrap an error after trying
to setup CDI on Linux.
Signed-off-by: Danny Canter <danny@dcantah.dev>
The sandbox and container both have the userns config. Lets make sure
they are the same, therefore consistent.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Currently we require that c.containerSpec() does not return an error
if test.err is not set.
However, if the require fails (i.e. it indeed returned an error) the
rest of the code is executed anyways. The rest of the code assumes it
did not return an error (so code assumes spec is not nil). This fails
miserably if it indeed returned an error, as spec is nil and go crashes
while running the unit tests.
Let's require it is not an error, so code does not continue to execute
if that fails and go doesn't crash.
In the test.err case is not harmful the bug of using assert, but let's
switch it to require too as that is what we really want.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
A domainname field was recently added to the OCI spec. Prior to this
folks would need to set this with a sysctl, but now runtimes should be
able to setdomainname(2). There's an open change to runc at the moment
to add support for this so I've just left testing as a couple spec
validations in CRI until that's in and usable.
Signed-off-by: Danny Canter <danny@dcantah.dev>
This patch requests the OCI runtime to create a userns when the CRI
message includes such request.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
This allows user namespace support to progress, either by allowing
snapshotters to deal with ownership, or falling back to containerd doing
a recursive chown.
In the future, when snapshotters implement idmap mounts, they should
report the "remap-ids" capability.
Co-authored-by: Rodrigo Campos <rodrigoca@microsoft.com>
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Signed-off-by: David Leadbeater <dgl@dgl.cx>
As explained in the comments, this patch lets the OCI runtime create the
netns when userns are in use. This is needed because the netns needs to
be owned by the userns (otherwise can't modify the IP, etc.).
Before this patch, we are creating the netns and then starting the pod
sandbox asking to join this netns. This can't never work with userns, as
the userns needs to be created first for the netns ownership to be
correct.
One option would be to also create the userns in containerd, then create
the netns. But this is painful (needs tricks with the go runtime,
special care to write the mapping, etc.).
So, we just let the OCI runtime create the userns and netns, that
creates them with the proper ownership.
As requested by Mike Brown, the current code when userns is not used is
left unchanged. We can unify the cases (with and without userns) in a
future release.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Due to when we were updating the pod sandboxes underlying container
object, the pointer to the sandbox would have the right info, but
the on-disk representation of the data was behind. This would cause
the data returned from loading any sandboxes after a restart to have
no CNI result or IP information for the pod.
This change does an additional update to the on-disk container info
right after we invoke the CNI plugin so the metadata for the CNI result
and other networking information is properly flushed to disk.
Signed-off-by: Danny Canter <danny@dcantah.dev>
Skip automatic `if swapLimit == 0 { s.Linux.Resources.Memory.Swap = &limit }` when the swap controller is missing.
(default on Ubuntu 20.04)
Fix issue 7828 (regression in PR 7783 "cri: make swapping disabled with memory limit")
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
While we need to support CRI v1alpha2, the implementation doesn't have
to be tied to gogo/protobuf.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
We do a ton of host networking checks around the CRI plugin, all mainly
doing the same thing of checking the different quirks on various platforms
(for windows are we a HostProcess pod, for linux is namespace mode the
right thing, darwin doesn't have CNI support etc.) which could all be
bundled up into a small helper that can be re-used.
Signed-off-by: Danny Canter <danny@dcantah.dev>
OCI runtime spec defines memory.swap as 'limit of memory+Swap usage'
so setting them to equal should disable the swap. Also, this change
should make containerd behaviour same as other runtimes e.g
'cri-dockerd/dockershim' and won't be impacted when user turn on
'NodeSwap' (https://github.com/kubernetes/enhancements/issues/2400) feature.
Signed-off-by: Qasim Sarfraz <qasimsarfraz@microsoft.com>
Adds a service capable of streaming Any objects bi-directionally.
This can be used by services to send data, received data, or to
initiate requests from server to client.
Signed-off-by: Derek McGowan <derek@mcg.dev>
In the CRI streaming server, a goroutine (`handleResizeEvents`) is launched
to handle terminal resize events if a TTY is asked for with an exec; this
is the sender of terminal resize events. Another goroutine is launched
shortly after successful process startup to actually do something with
these events, however the issue arises if the exec process fails to start
for any reason that would have `process.Start` return non-nil. The receiver
goroutine never gets launched so the sender is stuck blocked on a channel send
infinitely.
This could be used in a malicious manner by repeatedly launching execs
with a command that doesn't exist in the image, as a single goroutine
will get leaked on every invocation which will slowly grow containerd's
memory usage.
Signed-off-by: Danny Canter <danny@dcantah.dev>
Remove direct invocation of old v0.1.0 NRI plugins. They
can be enabled using the revised NRI API and the v0.1.0
adapter plugin.
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Implement the adaptation interface required by the NRI
service plugin to handle CRI sandboxes and containers.
Hook the NRI service plugin into CRI request processing.
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Add a common NRI 'service' plugin. It takes care of relaying
requests and respones to and from NRI (external NRI plugins)
and the high-level containerd namespace-independent logic of
applying NRI container adjustments and updates to actual CRI
and other containers.
The namespace-dependent details of the necessary container
manipulation operations are to be implemented by namespace-
specific adaptations. This NRI plugin defines the API which
such adaptations need to implement.
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
All pause container object references must be removed
from sbserver. This is an implementation detail of
podsandbox package.
Added TODOs for remaining work.
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
In golang when copy a slice, if the slice is initialized with a
desired length, then appending to it will cause the size double.
Signed-off-by: bin liu <liubin0329@gmail.com>
Add a new config as sandbox controller mod, which can be either
"podsandbox" or "shim". If empty, set it to default "podsandbox"
when CRI plugin inits.
Signed-off-by: Zhang Tianyang <burning9699@gmail.com>
Go 1.18 and up now provides a strings.Cut() which is better suited for
splitting key/value pairs (and similar constructs), and performs better:
```go
func BenchmarkSplit(b *testing.B) {
b.ReportAllocs()
data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
for i := 0; i < b.N; i++ {
for _, s := range data {
_ = strings.SplitN(s, "=", 2)[0]
}
}
}
func BenchmarkCut(b *testing.B) {
b.ReportAllocs()
data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
for i := 0; i < b.N; i++ {
for _, s := range data {
_, _, _ = strings.Cut(s, "=")
}
}
}
```
BenchmarkSplit
BenchmarkSplit-10 8244206 128.0 ns/op 128 B/op 4 allocs/op
BenchmarkCut
BenchmarkCut-10 54411998 21.80 ns/op 0 B/op 0 allocs/op
While looking at occurrences of `strings.Split()`, I also updated some for alternatives,
or added some constraints; for cases where an specific number of items is expected, I used `strings.SplitN()`
with a suitable limit. This prevents (theoretical) unlimited splits.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Swagat Bora <sbora@amazon.com>
Add spans around image unpack operations
Use image.ref to denote image name and image.id for the image config digest
Add top-level spand and record errors in the CRI instrumentation service
Old TODO stating that pkg/cri/opts's `WithWindowsNetworkNamespace`
should be moved to the main containerd pkg was out of date as thats
already been done (well, to the /oci package). This just removes it
and swaps all uses of `WithWindowsNetworkNamespace` to the oci
packages impl.
Signed-off-by: Danny Canter <danny@dcantah.dev>
Remove nolint-comments that weren't hit by linters, and remove the "structcheck"
and "varcheck" linters, as they have been deprecated:
WARN [runner] The linter 'structcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [runner] The linter 'varcheck' is deprecated (since v1.49.0) due to: The owner seems to have abandoned the linter. Replaced by unused.
WARN [linters context] structcheck is disabled because of generics. You can track the evolution of the generics support by following the https://github.com/golangci/golangci-lint/issues/2649.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- fix "nolint" comments to be in the correct format (`//nolint:<linters>[,<linter>`
no leading space, required colon (`:`) and linters.
- remove "nolint" comments for errcheck, which is disabled in our config.
- remove "nolint" comments that were no longer needed (nolintlint).
- where known, add a comment describing why a "nolint" was applied.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This `//nolint` was added in f5c7ac9272
to suppress warnings about the `NameToCertificate` function being deprecated:
// Deprecated: NameToCertificate only allows associating a single certificate
// with a given name. Leave that field nil to let the library select the first
// compatible chain from Certificates.
Looking at that, it was deprecated in Go 1.14 through
eb93c684d4
(https://go-review.googlesource.com/c/go/+/205059), which describes:
crypto/tls: select only compatible chains from Certificates
Now that we have a full implementation of the logic to check certificate
compatibility, we can let applications just list multiple chains in
Certificates (for example, an RSA and an ECDSA one) and choose the most
appropriate automatically.
NameToCertificate only maps each name to one chain, so simply deprecate
it, and while at it simplify its implementation by not stripping
trailing dots from the SNI (which is specified not to have any, see RFC
6066, Section 3) and by not supporting multi-level wildcards, which are
not a thing in the WebPKI (and in crypto/x509).
We should at least have a comment describing why we are ignoring this, but preferably
review whether we should still use it.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Currently CDI registry is reconfigured on every
WithCDI call, which is a relatively heavy operation.
This happens because cdi.GetRegistry(cdi.WithSpecDirs(cdiSpecDirs...))
unconditionally reconfigures the registry (clears fs notify watch,
sets up new watch, rescans directories).
Moving configuration to the criService.initPlatform should result
in performing registry configuration only once on the service start.
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
As WithCDI is CRI-only API it makes sense to move it
out of oci module.
This move can also fix possible issues with this API when
CRI plugin is disabled.
Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
This makes diff archives to be reproducible.
The value is expected to be passed from CLI applications via the $SOUCE_DATE_EPOCH env var.
See https://reproducible-builds.org/docs/source-date-epoch/
for the $SOURCE_DATE_EPOCH specification.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Rework sandbox monitoring, we should rely on Controller.Wait instead of
CRIService.StartSandboxExitMonitor
Signed-off-by: WangLei <wllenyj@linux.alibaba.com>
pkg/cri/sbserver/cri_fuzzer.go and pkg/cri/server/cri_fuzzer.go were
mostly the same.
This commit merges them together and move the unified fuzzer to
contrib/fuzz again to sort out dependencies. pkg/cri/ shouldn't consume
cmd/.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
As part of the effort of getting hypervisor isolated windows container
support working for the CRI entrypoint here, add the runhcs-wcow-hypervisor
handler for the default config. This sets the correct SandboxIsolation
value that the Windows shim uses to differentiate process vs. hypervisor
isolation. This change additionally sets the wcow-process runtime to
passthrough io.microsoft.container* annotations and the hypervisor runtime
to accept io.microsoft.virtualmachine* annotations.
Note that for K8s users this runtime handler will need to be configured by
creating the corresponding RuntimeClass resources on the cluster as it's
not the default runtime.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
It is follow-up of #7254. This commit will increase ReadHeaderTimeout
from 3s to 30m, which prevent from unexpected timeout when the node is
running with high-load. 30 Minutes is longer enough to get close to
before what #7254 changes.
And ideally, we should allow user to configure the streaming server if
the users want this feature.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
`ioutil` has been deprecated by golang. All the code in `ioutil` just
forwards functionality to code in either the `io` or `os` packages.
See https://github.com/golang/go/pull/51961 for more info.
Signed-off-by: Jeff Widman <jeff@jeffwidman.com>
Failpoint is used to control the fail during API call when testing, especially
the API is complicated like CRI-RunPodSandbox. It can help us to test
the unexpected behavior without mock. The control design is based on freebsd
fail(9), but simpler.
REF: https://www.freebsd.org/cgi/man.cgi?query=fail&sektion=9&apropos=0&manpath=FreeBSD%2B10.0-RELEASE
Signed-off-by: Wei Fu <fuweid89@gmail.com>
All of the CRI store related packages all use the standard errdefs
errors now for if a key doesn't or already exists (ErrAlreadyExists,
ErrNotFound), but the comments for the methods still referenced
some unused package specific error definitions. This change just
updates the comments to reflect what errors are actually returned
and adds comments for some previously undocumented exported functions.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>
All containers except the pause container, mount `/dev/shm" with flags
`nosuid,nodev,noexec`. So change mount options for pause container to
keep consistence.
This also helps to solve issues of failing to mount `/dev/shm` when
pod/container level user namespace is enabled.
Fixes: #6911
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
Signed-off-by: Lei Wang <wllenyj@linux.alibaba.com>
This's an optimization to get rid of redundant `/dev/shm" mounts for pause container.
In `oci.defaultMounts`, there is a default `/dev/shm` mount which is redundant for
pause container.
Fixes: #6911
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
Signed-off-by: Lei Wang <wllenyj@linux.alibaba.com>
The TestPodAnnotationPassthroughContainerSpec test and the
TestContainerAnnotationPassthroughContainerSpec test both depend on a
platform-specific implementation of criService.containerSpec, which is
unimplemented on FreeBSD.
The TestSandboxContainerSpec depends on a platform-specific
implementation oc criService.sandboxContainerSpec, which is
unimplemented on FreeBSD.
Signed-off-by: Samuel Karp <me@samuelkarp.com>