If kubelet passes the swap limit (default memory limit = swap limit ),
it is configured for container irrespective if the node supports swap.
Signed-off-by: Qasim Sarfraz <qasimsarfraz@microsoft.com>
These are not actually being pulled, just removing the deprecated k8s.gcr.io
from the code-base. While at it, also renamed / removed vars that shadowed
with package-level definitions
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
When the pods are transitioning there are several
cases where containers might not be in valid state.
There were several cases where the stats where
failing hard but we should just continue on as
they are transient and will be picked up again
when kubelet queries for the stats again.
Signed-off-by: James Sturtevant <jstur@microsoft.com>
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
This commit just updates the sbserver with the same fix we did on main:
9bf5aeca77 ("cri: Fix net.ipv4.ping_group_range with userns ")
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
This is a port of 31a6449734 ("Add capability for snapshotters to
declare support for UID remapping") to sbserver.
This patch remaps the rootfs in the platform-specific if user namespaces
are in use, so the pod can read/write to the rootfs.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
This patch requests the OCI runtime to create a userns when the CRI
message includes such request.
This is an adaptation of a7adeb6976 ("cri: Support pods with user
namespaces") to sbserver, although the container_create.go parts were
already ported as part of 40be96efa9 ("Have separate spec builder for
each platform"),
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
This commit just ports 36f520dc04 ("Let OCI runtime create netns when
userns is used") to sbserver.
The CNI network setup is done after OCI start, as it didn't seem simple
to get the sandbox PID we need for the netns otherwise.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Currently there is a big c&p of the helpers between these two folders
and a TODO in the platform agnostic file to organize them in the future,
when some other things settle.
So, let's just copy them for now.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Commit c085fac1e5 ("Move sandbox start behind controller") moved the
runtimeStart to only account for time _after_ the netns has been
created.
To match what we currently do in cri/server, let's move it to just after
the get the sandbox runtime.
This come up when porting userns to sbserver, as the CNI network setup
needs to be done at a later stage and runtimeStart was accounting for
the CNI network setup time only when userns is enabled.
To avoid that discrepancy, let's just move it earlier, that also matches
what we do in cri/server.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Beside the "in future the when" typo, we take the chance to reflect that
user namespaces are already merged.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
These two errors can occur in the following scenarios:
ECONNRESET: the target process reset connection between CRI and itself.
see: #111825 for detail
EPIPE: the target process did not read the received data, causing the
buffer in the kernel to be full, resulting in the occurrence of Zero Window,
then closing the connection (FIN, RESET)
see: #74551 for detail
In both cases, we should RESET the httpStream.
Signed-off-by: wangxiang <scottwangsxll@gmail.com>
userns.RunningInUserNS() checks if the code calling that function is
running inside a user namespace. But we need to check if the container
we will create will use a user namespace, in that case we need to
disable the sysctl too (or we would need to take the userns mapping into
account to set the IDs).
This was added in PR:
https://github.com/containerd/containerd/pull/6170/
And the param documentation says it is not enabled when user namespaces
are in use:
https://github.com/containerd/containerd/pull/6170/files#diff-91d0a4c61f6d3523b5a19717d1b40b5fffd7e392d8fe22aed7c905fe195b8902R118
I'm not sure if the intention was to disable this if containerd is
running inside a userns (rootless, if that is even supported) or just
when the pod has user namespaces.
Out of an abundance of caution, I'm keeping the userns.RunningInUserNS()
so it is still not used if containerd runs inside a user namespace.
With this patch and "enable_unprivileged_icmp = true" in the config,
running containerd as root on the host, pods with user namespaces start
just fine. Without this patch they fail with:
... failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: w
/proc/sys/net/ipv4/ping_group_range: invalid argument: unknown
Thanks a lot to Andy on the k8s slack for reporting the issue. He also
mentions he hits this with k3s on a default installation (the param
is off by default on containerd, but k3s turns that on by default it
seems). He also debugged which part of the stack was setting that
sysctl, found the PR that added this code in containerd and a workaround
(to turn the bool off).
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Helpers to convert from a slice of platforms to our protobuf representation
and vice-versa appear a couple times. It seems sane to just expose this facility
in the platforms pkg.
Signed-off-by: Danny Canter <danny@dcantah.dev>
This introduces a ParseSourceDateEpoch function, which can be used
to parse "SOURCE_DATE_EPOCH" values for situations where those
values are not passed through an env-var (or the env-var has been
read through other means).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These tests were failing on my macOS; could be the precision issue (like on
Windows), or just because they're "too fast".
=== RUN TestSourceDateEpoch/WithoutSourceDateEpoch
epoch_test.go:51:
Error Trace: /Users/thajeztah/go/src/github.com/containerd/containerd/pkg/epoch/epoch_test.go:51
Error: Should be true
Test: TestSourceDateEpoch/WithoutSourceDateEpoch
Messages: now: 2023-06-23 11:47:09.93118 +0000 UTC, v: 2023-06-23 11:47:09.93118 +0000 UTC
This patch:
- updates the rightAfter utility to allow the timestamps to be "equal"
- updates the asserts to provide some details about the timestamps
- uses UTC for the value we're comparing to, to match the timestamps
that are generated.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
As a follow up change to adding a SandboxMetrics rpc to the core
sandbox service, the controller needed a corresponding rpc for CRI
and others to eventually implement.
This leaves the CRI (non-shim mode) controller unimplemented just to
have a change with the API addition to start.
Signed-off-by: Danny Canter <danny@dcantah.dev>
When a container is just created, exited state the container will not have stats. A common case for this in k8s is the init containers for a pod. The will be present in the listed containers but will not have a running task and there for no stats.
Signed-off-by: James Sturtevant <jstur@microsoft.com>
The 10-containerd-net.conflist file generated from the conf_template
should be written atomically so that partial writes are not visible to
CNI plugins. Use the new consistentfile package to ensure this on
Unix-like platforms such as Linux, FreeBSD, and Darwin.
Fixes https://github.com/containerd/containerd/issues/8607
Signed-off-by: Samuel Karp <samuelkarp@google.com>
Certain files may need to be written atomically so that partial writes
are not visible to other processes. On Unix-like platforms such as
Linux, FreeBSD, and Darwin, this is accomplished by writing a temporary
file, syncing, and renaming over the destination file name. On Windows,
the same operations are performed, but Windows does not guarantee that a
rename operation is atomic.
Partial/inconsistent reads can occur due to:
1. A process attempting to read the file while containerd is writing it
(both in the case of a new file with a short/incomplete write or in
the case of an existing, updated file where new bytes may be written
at the beginning but old bytes may still be present after).
2. Concurrent goroutines in containerd leading to multiple active
writers of the same file.
The above mechanism explicitly protects against (1) as all writes are to
a file with a temporary name.
There is no explicit protection against multiple, concurrent goroutines
attempting to write the same file. However, atomically writing the file
should mean only one writer will "win" and a consistent file will be
visible.
Signed-off-by: Samuel Karp <samuelkarp@google.com>
The initial PR had a check for nil metrics but after some refactoring in the PR the test case that was suppose cover HPC was missing a scenario where the metric was not nil but didn't contain any metrics. This fixes that case and adds a testcase to cover it.
Signed-off-by: James Sturtevant <jstur@microsoft.com>
This change adds support for CDI devices to the ctr --device flag.
If a fully-qualified CDI device name is specified, this is injected
into the OCI specification before creating the container.
Note that the CDI specifications and the devices that they represent
are local and mirror the behaviour of linux devices in the ctr command.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Several bits of code unmarshal image config JSON into an `ocispec.Image`, and then immediately create an `ocispec.Platform` out of it, but then discard the original image *and* miss several potential platform fields (most notably, `variant`).
Because `ocispec.Platform` is a strict subset of `ocispec.Image`, most of these can be updated to simply unmarshal the image config directly to `ocispec.Platform` instead, which allows these additional fields to be picked up appropriately.
We can use `tianon/raspbian` as a concrete reproducer to demonstrate.
Before:
```console
$ ctr content fetch docker.io/tianon/raspbian:bullseye-slim
...
$ ctr image ls
REF TYPE DIGEST SIZE PLATFORMS LABELS
docker.io/tianon/raspbian:bullseye-slim application/vnd.docker.distribution.manifest.v2+json sha256:66e96f8af40691b335acc54e5f69711584ef7f926597b339e7d12ab90cc394ce 28.6 MiB linux/arm/v7 -
```
(Note that the `PLATFORMS` column lists `linux/arm/v7` -- the image itself is actually `linux/arm/v6`, but one of these bits of code leads to only `linux/arm` being extracted from the image config, which `platforms.Normalize` then updates to an explicit `v7`.)
After:
```console
$ ctr image ls
REF TYPE DIGEST SIZE PLATFORMS LABELS
docker.io/tianon/raspbian:bullseye-slim application/vnd.docker.distribution.manifest.v2+json sha256:66e96f8af40691b335acc54e5f69711584ef7f926597b339e7d12ab90cc394ce 28.6 MiB linux/arm/v6 -
```
Signed-off-by: Tianon Gravi <admwiggin@gmail.com>
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Windows systems are capable of running both Windows Containers and Linux
containers. For windows containers we need to sanitize the volume path
and skip non-C volumes from the copy existing contents code path. Linux
containers running on Windows and Linux must not have the path sanitized
in any way.
Supplying the targetOS of the container allows us to proprely decide
when to activate that code path.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Images may be created with a VOLUME stanza pointed to drive letters that
are not C:. Currently, an image that has such VOLUMEs defined, will
cause containerd to error out when starting a container.
This change skips copying existing contents to volumes that are not C:.
as an image can only hold files that are destined for the C: drive of a
container.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
To further some ongoing work in containerd to make as much code as possible
able to be used on any platform (to handle runtimes that can virtualize/emulate
a variety of different OSes), this change makes stats able to be handled on
any of the supported stat types (just linux and windows). To accomplish this,
we use the platform the sandbox returns from its `Platform` rpc to decide
what format the containers in a given sandbox are returning metrics in, then
we can typecast/marshal accordingly.
Signed-off-by: Danny Canter <danny@dcantah.dev>
The oci.WithUser option was being applied in container_create_linux.go
instead of the cross plat buildLinuxSpec method. There's been recent
work to try and make every spec option that can be applied on any platform
able to do so, and this falls under that. However, WithUser on linux platforms
relies on the containers SnapshotKey being filled out, which means the spec
option needs to be applied during container creation.
To make this a little more generic, I've created a new platformSpecOpts
method that handles any spec opts that rely on runtime state (rootfs mounted
for example) for some platforms, or just platform options that we still don't
have workarounds for to be able to specify them for other platforms
(apparmor, seccomp etc.) by internally calling the already existing
containerSpecOpts method.
Signed-off-by: Danny Canter <danny@dcantah.dev>
There was a couple uses of Readdir/ReadDir here where the only thing the return
value was used for was the Name of the entry. This is exactly what Readdirnames
returns, so we can avoid the overhead of making/returning a bunch of interfaces
and calling lstat everytime in the case of Readdir(-1).
https://cs.opensource.google/go/go/+/refs/tags/go1.20.4:src/os/dir_unix.go;l=114-137
Signed-off-by: Danny Canter <danny@dcantah.dev>
This pointer to an issue never got updated after the CRI plugin was
absorbed into the main containerd repo as an in-tree plugin.
Signed-off-by: Samuel Karp <samuelkarp@google.com>
If containerd does not see a container but criservice's
container store does, then we should try to recover from
this error state by removing the container from criservice's
container store as well.
Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com>
Using array to build sub-tests is to avoid random pick. The shuffle
thing should be handled by go-test framework. And we should capture
range var before runing sub-test.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Using array to build sub-tests is to avoid random pick. The shuffle
thing should be handled by go-test framework. And we should capture
range var before runing sub-test.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
We need support in containerd and the OCI runtime to use idmap mounts.
Let's just throw an error for now if the kubelet requests some mounts
with mappings.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Currently if you're using the shim-mode sandbox server support, if your
shim that's hosting the Sandbox API dies for any reason that wasn't
intentional (segfault, oom etc.) PodSandboxStatus is kind of wedged.
We can use the fact that if we didn't go through the usual k8s flow
of Stop->Remove and we still have an entry in our sandbox store,
us not having a shim mapping anymore means this was likely unintentional.
Signed-off-by: Danny Canter <danny@dcantah.dev>
Using symlinks for bind mounts means we are not protecting an RO-mounted
layer against modification. Windows doesn't currently appear to offer a
better approach though, as we cannot create arbitrary empty WCOW scratch
layers at this time.
For windows-layer mounts, Unmount does not have access to the mounts
used to create it. So we store the relevant data in an Alternate Data
Stream on the mountpoint in order to be able to Unmount later.
Based on approach in https://github.com/containerd/containerd/pull/2366,
with sign-offs recorded as 'Based-on-work-by' trailers below.
This also partially-reverts some changes made in #6034 as they are not
needed with this mounting implmentation, which no longer needs to be
handled specially by the caller compared to non-Windows mounts.
Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
Based-on-work-by: Michael Crosby <crosbymichael@gmail.com>
Based-on-work-by: Darren Stahl <darst@microsoft.com>
This error message currently does not provide useful information, because the `src` value that is interleaved will have been overridden by the call to `osi.ResolveSymbolicLink`. This stores the original `src` before the `osi.ResolveSymbolicLink` call so the error message can be useful.
Signed-off-by: June Rhodes <504826+hach-que@users.noreply.github.com>
In the CRI server initialization a syncgroup is setup that adds to the
counter for every cni config found/registered. This functions on platforms
where CNI is supported/theres an assumption that there will always be
the loopback config. However, on platforms like Darwin where there's generally
nothing registered the Wait() on the syncgroup returns immediately and the
channel used to return any Network config sync errors is closed. This channel
is one of three that's used to monitor if we should Close the CRI service in
containerd, so it's not great if this happens.
Signed-off-by: Danny Canter <danny@dcantah.dev>
These deprecations were mentioned in `pkg/cri/config/config.go`
but not mentioned in `RELEASES.md`.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
This commit adds supports for the ArgsEscaped
value for the image got from the dockerfile.
It is used to evaluate and process the image
entrypoint/cmd and container entrypoint/cmd
options got from the podspec.
Signed-off-by: Kirtana Ashok <Kirtana.Ashok@microsoft.com>
1. it's easy to check wrong input if using drain_exec_sync_io_timeout in error
2. avoid to use full error message, as part of error generated by go
stdlib would be changed in the future
3. delete the extra empty line
Signed-off-by: Wei Fu <fuweid89@gmail.com>
By default, the child processes spawned by exec process will inherit standard
io file descriptors. The shim server creates a pipe as data channel. Both exec
process and its children write data into the write end of the pipe. And the
shim server will read data from the pipe. If the write end is still open, the
shim server will continue to wait for data from pipe.
So, if the exec command is like `bash -c "sleep 365d &"`, the exec process is
bash and quit after create `sleep 365d`. But the `sleep 365d` will hold the
write end of the pipe for a year! It doesn't make senses that CRI plugin
should wait for it.
For this case, we should use timeout to drain exec process's io instead of
waiting for it.
Fixes: #7802
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Update NRI plugin configuration to match that of NRI. Remove
option for the eliminated NRI configuration file. Add option
to disable connections from externally launched plugins. Add
options to override default plugin registration and request
timeouts.
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
The snapshotter annotation definitions and related functions have been
public in the new packge snapshotter
Also remove a test for container image layer's annotation.
Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
From golangci-lint:
> SA1019: rand.Read has been deprecated since Go 1.20 because it
>shouldn't be used: For almost all use cases, crypto/rand.Read is more
>appropriate. (staticcheck)
> SA1019: rand.Seed has been deprecated since Go 1.20 and an alternative
>has been available since Go 1.0: Programs that call Seed and then expect
>a specific sequence of results from the global random source (using
>functions such as Int) can be broken when a dependency changes how
>much it consumes from the global random source. To avoid such breakages,
>programs that need a specific result sequence should use
>NewRand(NewSource(seed)) to obtain a random generator that other
>packages cannot access. (staticcheck)
See also:
- https://pkg.go.dev/math/rand@go1.20#Read
- https://pkg.go.dev/math/rand@go1.20#Seed
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
There is a new CNI capability argument, cgroupPath, where runtimes can
pass cgroup paths to CNI plugins.
Implement that.
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
All of the CRI sandbox and container specs all get assigned
almost the exact same default annotations (sandboxID, name, metadata,
container type etc.) so lets make a helper to return the right set for
a sandbox or regular workload container.
Signed-off-by: Danny Canter <danny@dcantah.dev>
Split out the criService-agnostic bits of nri-api* from
pkg/cri/server to pkg/cri/nri to allow sharing a single
implementation betwen the server and sbserver versions.
Rework the interfaces to not require access to package
internals.
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
In https://github.com/containerd/containerd/pull/7764 it was made
so that generic runtime options in the containerd toml config file
would get passed to shims regardless of if containerd knew of the
type beforehand and could supply the struct. However, this was only
added for the sandbox server fork here and not the regular ol' CRI
server. This change just mirrors the parts that need to be plopped in
pkg/cri/server
Signed-off-by: Danny Canter <danny@dcantah.dev>
Before this patch, both the RdtEnabled and BlockIOEnabled are provided
by services/tasks pkg. Since the services/tasks can be pkg plugin which
can be initialized multiple times or concurrently. It will fire data-race
issue as there is no mutex to protect `enable`.
This patch is aimed to provide wrapper pkgs to use intel/{blockio,rdt}
safely.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
For determinism of human-readable string representation.
e.g., "2023-01-10T12:34:56Z" vs "2023-01-10T21:34:56+09:00"
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
/etc/cni has to be readable for non-root users (0755), because /etc/cni/tuning/allowlist.conf is used for rootless mode too.
This file was introduced in CNI plugins 1.2.0 (containernetworking/plugins PR 693), and its path is hard-coded.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
- Add Target to mount.Mount.
- Add UnmountMounts to unmount a list of mounts in reverse order.
- Add UnmountRecursive to unmount deepest mount first for a given target, using
moby/sys/mountinfo.
Signed-off-by: Edgar Lee <edgarhinshunlee@gmail.com>
Some of this code was originally added in b7b1200dd3,
which likely meant to initialize the slice with a length to reduce allocations,
however, instead of initializing with a zero-length and a capacity, it
initialized the slice with a fixed length, which was corrected in commit
0c63c42f81.
This patch initializes the slice with a zero-length and expected capacity.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Provides a couple helper functions that provide a background context for
running cleanup jobs while preserving the original context values.
The new contexts will not inherit the errors or cancellations.
Signed-off-by: Derek McGowan <derek@mcg.dev>
Comments in initPlatform for Windows states that the options were
Linux specific. Additionally properly wrap an error after trying
to setup CDI on Linux.
Signed-off-by: Danny Canter <danny@dcantah.dev>
The sandbox and container both have the userns config. Lets make sure
they are the same, therefore consistent.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Currently we require that c.containerSpec() does not return an error
if test.err is not set.
However, if the require fails (i.e. it indeed returned an error) the
rest of the code is executed anyways. The rest of the code assumes it
did not return an error (so code assumes spec is not nil). This fails
miserably if it indeed returned an error, as spec is nil and go crashes
while running the unit tests.
Let's require it is not an error, so code does not continue to execute
if that fails and go doesn't crash.
In the test.err case is not harmful the bug of using assert, but let's
switch it to require too as that is what we really want.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
A domainname field was recently added to the OCI spec. Prior to this
folks would need to set this with a sysctl, but now runtimes should be
able to setdomainname(2). There's an open change to runc at the moment
to add support for this so I've just left testing as a couple spec
validations in CRI until that's in and usable.
Signed-off-by: Danny Canter <danny@dcantah.dev>
This patch requests the OCI runtime to create a userns when the CRI
message includes such request.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
This allows user namespace support to progress, either by allowing
snapshotters to deal with ownership, or falling back to containerd doing
a recursive chown.
In the future, when snapshotters implement idmap mounts, they should
report the "remap-ids" capability.
Co-authored-by: Rodrigo Campos <rodrigoca@microsoft.com>
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Signed-off-by: David Leadbeater <dgl@dgl.cx>
As explained in the comments, this patch lets the OCI runtime create the
netns when userns are in use. This is needed because the netns needs to
be owned by the userns (otherwise can't modify the IP, etc.).
Before this patch, we are creating the netns and then starting the pod
sandbox asking to join this netns. This can't never work with userns, as
the userns needs to be created first for the netns ownership to be
correct.
One option would be to also create the userns in containerd, then create
the netns. But this is painful (needs tricks with the go runtime,
special care to write the mapping, etc.).
So, we just let the OCI runtime create the userns and netns, that
creates them with the proper ownership.
As requested by Mike Brown, the current code when userns is not used is
left unchanged. We can unify the cases (with and without userns) in a
future release.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Due to when we were updating the pod sandboxes underlying container
object, the pointer to the sandbox would have the right info, but
the on-disk representation of the data was behind. This would cause
the data returned from loading any sandboxes after a restart to have
no CNI result or IP information for the pod.
This change does an additional update to the on-disk container info
right after we invoke the CNI plugin so the metadata for the CNI result
and other networking information is properly flushed to disk.
Signed-off-by: Danny Canter <danny@dcantah.dev>
Skip automatic `if swapLimit == 0 { s.Linux.Resources.Memory.Swap = &limit }` when the swap controller is missing.
(default on Ubuntu 20.04)
Fix issue 7828 (regression in PR 7783 "cri: make swapping disabled with memory limit")
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
While we need to support CRI v1alpha2, the implementation doesn't have
to be tied to gogo/protobuf.
Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
We do a ton of host networking checks around the CRI plugin, all mainly
doing the same thing of checking the different quirks on various platforms
(for windows are we a HostProcess pod, for linux is namespace mode the
right thing, darwin doesn't have CNI support etc.) which could all be
bundled up into a small helper that can be re-used.
Signed-off-by: Danny Canter <danny@dcantah.dev>
OCI runtime spec defines memory.swap as 'limit of memory+Swap usage'
so setting them to equal should disable the swap. Also, this change
should make containerd behaviour same as other runtimes e.g
'cri-dockerd/dockershim' and won't be impacted when user turn on
'NodeSwap' (https://github.com/kubernetes/enhancements/issues/2400) feature.
Signed-off-by: Qasim Sarfraz <qasimsarfraz@microsoft.com>
Adds a service capable of streaming Any objects bi-directionally.
This can be used by services to send data, received data, or to
initiate requests from server to client.
Signed-off-by: Derek McGowan <derek@mcg.dev>
In the CRI streaming server, a goroutine (`handleResizeEvents`) is launched
to handle terminal resize events if a TTY is asked for with an exec; this
is the sender of terminal resize events. Another goroutine is launched
shortly after successful process startup to actually do something with
these events, however the issue arises if the exec process fails to start
for any reason that would have `process.Start` return non-nil. The receiver
goroutine never gets launched so the sender is stuck blocked on a channel send
infinitely.
This could be used in a malicious manner by repeatedly launching execs
with a command that doesn't exist in the image, as a single goroutine
will get leaked on every invocation which will slowly grow containerd's
memory usage.
Signed-off-by: Danny Canter <danny@dcantah.dev>
Remove direct invocation of old v0.1.0 NRI plugins. They
can be enabled using the revised NRI API and the v0.1.0
adapter plugin.
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Implement the adaptation interface required by the NRI
service plugin to handle CRI sandboxes and containers.
Hook the NRI service plugin into CRI request processing.
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Add a common NRI 'service' plugin. It takes care of relaying
requests and respones to and from NRI (external NRI plugins)
and the high-level containerd namespace-independent logic of
applying NRI container adjustments and updates to actual CRI
and other containers.
The namespace-dependent details of the necessary container
manipulation operations are to be implemented by namespace-
specific adaptations. This NRI plugin defines the API which
such adaptations need to implement.
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
All pause container object references must be removed
from sbserver. This is an implementation detail of
podsandbox package.
Added TODOs for remaining work.
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
In golang when copy a slice, if the slice is initialized with a
desired length, then appending to it will cause the size double.
Signed-off-by: bin liu <liubin0329@gmail.com>