Adds support to mount named pipes into Windows containers. This support
already exists in hcsshim, so this change just passes them through
correctly in cri. Named pipe mounts must start with "\\.\pipe\".
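The prefix rule can be illustrated with a small Go sketch. `isNamedPipe` is a hypothetical helper, not the actual cri code. It only shows how a mount source might be classified so that a pipe path is passed through to hcsshim unchanged instead of being resolved like a regular host path.

```go
package main

import "strings"

// namedPipePrefix is the prefix every Windows named pipe path starts with.
const namedPipePrefix = `\\.\pipe\`

// isNamedPipe is a hypothetical helper: it reports whether a mount source
// refers to a named pipe, in which case it should be passed through as-is
// instead of being resolved like a regular host path.
func isNamedPipe(source string) bool {
	return strings.HasPrefix(source, namedPipePrefix)
}
```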
Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
Reason: it was originally introduced to prevent the SCTP kernel module from being loaded on the nodes. However, creating iptables chains alone does not load the kernel module. The module would be loaded if an SCTP socket were created, but neither cri nor the portmap CNI plugin creates or manages SCTP sockets when hostPort / portmappings are defined.
Signed-off-by: Laszlo Janosi <laszlo.janosi@ibm.com>
This adds a configuration knob for adding request headers to all
registry requests. It is not namespaced to a registry.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
How to test (from https://github.com/opencontainers/runc/pull/2352#issuecomment-620834524):
(host)$ sudo swapoff -a
(host)$ sudo ctr run -t --rm --memory-limit $((1024*1024*32)) docker.io/library/alpine:latest foo
(container)$ sh -c 'VAR=$(seq 1 100000000)'
An event `/tasks/oom {"container_id":"foo"}` will be displayed in `ctr events`.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
This moves most of the API calls off of the `labels` package onto the root
selinux package. This is the newer API for most selinux operations.
Signed-off-by: Michael Crosby <michael@thepasture.io>
We encountered two failing end-to-end tests after the adoption of
https://github.com/containerd/cri/pull/1470 in
https://github.com/cri-o/cri-o/pull/3749:
```
Summarizing 2 Failures:
[Fail] [sig-cli] Kubectl Port forwarding With a server listening on 0.0.0.0 that expects a client request [It] should support a client that connects, sends DATA, and disconnects
test/e2e/kubectl/portforward.go:343
[Fail] [sig-cli] Kubectl Port forwarding With a server listening on localhost that expects a client request [It] should support a client that connects, sends DATA, and disconnects
test/e2e/kubectl/portforward.go:343
```
Increasing the timeout to 1s fixes the issue.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
This swaps the RunningInUserNS() function that we're using
from libcontainer/system with the one in containerd/sys.
This removes the dependency on libcontainer/system, given
these were the only functions we're using from that package.
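For context, a user-namespace check of this kind usually inspects `/proc/self/uid_map`: in the initial namespace the file holds a single identity mapping of the full 32-bit UID range (`0 0 4294967295`). The Go sketch below takes the file contents as a string so it can run anywhere. `inUserNSFromUIDMap` is a hypothetical name, and the sketch is modeled on that check rather than copied from containerd/sys.

```go
package main

import (
	"fmt"
	"strings"
)

// inUserNSFromUIDMap reports whether the given /proc/self/uid_map contents
// indicate a user namespace. Each line has the form
// "<inside-id> <outside-id> <length>".
func inUserNSFromUIDMap(uidMap string) bool {
	firstLine := strings.SplitN(uidMap, "\n", 2)[0]
	var inside, outside, length int64
	if _, err := fmt.Sscanf(firstLine, "%d %d %d", &inside, &outside, &length); err != nil {
		// Empty or unreadable map: assume the initial namespace.
		return false
	}
	// The initial namespace maps the whole UID range onto itself.
	return !(inside == 0 && outside == 0 && length == 4294967295)
}
```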
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The `DualStack` option was deprecated in Go 1.12, and is now enabled by default
(through commit github.com/golang/go@efc185029bf770894defe63cec2c72a4c84b2ee9).
> The Dialer.DualStack field is now meaningless and documented as deprecated.
>
> To disable fallback, set FallbackDelay to a negative value.
The default `FallbackDelay` is 300ms; to make this more explicit, this patch
sets `FallbackDelay` to the default value.
Note that Docker Hub currently does not support IPv6 (DNS for registry-1.docker.io
has no AAAA records, so we should not hit the 300ms delay).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Use goroutines to copy the data from the stream to the TCP
connection, and vice versa, removing the socat dependency.
Quoting Lantao Liu, the logic is as follows:
When one side (either pod side or user side) of portforward
is closed, we should stop port forwarding.
When one side is closed, the io.Copy use that side as source will close,
but the io.Copy use that side as dest won't.
Signed-off-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
This change adds a `default_seccomp_profile` config switch to apply a default seccomp profile when none is provided by k8s.
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
The pod network should be destroyed if setup fails or returns an invalid
network interface, especially when multiple CNI configurations are used.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Currently, the CRI plugin passes each layer digest to remote snapshotters
sequentially, which leads to sequential snapshot preparation. This costs
extra time, especially for remote snapshotters that need to connect to the
remote backend store (e.g. registries) to check whether the snapshot exists
on each preparation.
This commit solves this problem by introducing a new label
`containerd.io/snapshot/cri.chain` for passing all layer digests in an image to
snapshotters, allowing them to prepare these snapshots in parallel, which
speeds up preparation.
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
For privileged pods with PrivilegedWithoutHostDevices set to true, the host
devices specified in the config are not added (whereas they are for
non-privileged pods, and for privileged pods with PrivilegedWithoutHostDevices
set to false, since all host devices are included).
Add them in this case.
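The intended selection logic after this fix, as a hypothetical Go sketch; `devicesForSpec` is an illustrative name, and devices are modeled as plain path strings rather than OCI device entries.

```go
package main

// devicesForSpec is a hypothetical sketch of which devices end up in the
// container spec. hostDevices lists every device on the host; configDevices
// lists the devices explicitly requested in the container config.
func devicesForSpec(privileged, privilegedWithoutHostDevices bool, hostDevices, configDevices []string) []string {
	if privileged && !privilegedWithoutHostDevices {
		// All host devices are added, which already covers the requested ones.
		return hostDevices
	}
	// Non-privileged containers and, with this fix, privileged containers
	// with PrivilegedWithoutHostDevices set get exactly the requested devices.
	return configDevices
}
```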
Fixes: 3353ab76d9 ("Add flag to overload default privileged host device behaviour")
Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
containerd loads timeout values from config.toml and populates those
values into the `timeout` package at launch. So when the `timeout` package is
used from the shim, only the default values are available and the config file
is ignored.
Use a hardcoded value for binary IO instead.
Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
Throughout the container lifecycle, pulling images is one of the time-consuming
steps. Recently, the containerd community started to tackle this issue with
stargz-based remote snapshots, as a non-core
subproject (https://github.com/containerd/stargz-snapshotter).
This snapshotter is implemented as a standard proxy plugin, but it requires the
client to pass some additional information (image ref and layer digest) for each
pull operation to query layer contents on the registry. The stargz snapshotter
project provides an image handler to do this, and stargz snapshotter users need
to pass this handler to the containerd client.
This commit makes it possible to use stargz-based remote snapshots through CRI
by passing the handler to the containerd client on pull operations.
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
Along with the type (sandbox or container) and sandbox name annotations,
provide support for an additional annotation:
- Container name
This will help us perform per-container operations by comparing it
with pass-through annotations (e.g. pod metadata annotations from k8s).
Signed-off-by: Chethan Suresh <Chethan.Suresh@sony.com>
By Go's RWMutex design, if one goroutine holds a read lock and another
goroutine calls Lock, no goroutine can acquire a read lock until the
initial read lock has been released.
The original design reloads the CNI network config, which requires the write
lock, on every single Status CRI gRPC call. If one RunPodSandbox request holds
the read lock for too long while allocating an IP, the Status call blocks, and
all other RunPodSandbox/StopPodSandbox requests also wait for that
RunPodSandbox request to release the read lock. The Status CRI call then fails
and the kubelet becomes NotReady.
Reloading the CNI network config on every Status CRI call is unnecessary and
can also cause this NotReady situation. To lower the possibility of NotReady,
CRI now reloads the CNI network config only when there are valid fs change
events in the CNI network config dir.
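A minimal sketch of the event-driven approach: a single goroutine reloads the config for each change event it receives, and Status only reads the result of the last reload, so a Status call never needs the write lock. `cniConfSyncer` is a hypothetical type, and the events that would come from a filesystem watcher on the config dir are modeled as a plain channel.

```go
package main

import "sync"

// cniConfSyncer is a hypothetical sketch. A single goroutine reloads the
// CNI config whenever the config dir reports a change; Status only reads
// the outcome of the last reload and never triggers a reload itself.
type cniConfSyncer struct {
	mu      sync.RWMutex
	lastErr error
	reload  func() error // stand-in for re-reading the CNI network config
}

// run reloads once per change event until events is closed.
func (s *cniConfSyncer) run(events <-chan string) {
	for range events {
		err := s.reload()
		s.mu.Lock()
		s.lastErr = err
		s.mu.Unlock()
	}
}

// status reports the result of the most recent reload.
func (s *cniConfSyncer) status() error {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.lastErr
}
```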
Signed-off-by: Wei Fu <fuweid89@gmail.com>
The resize chan is never closed when doing exec/attach now. What's more,
`resize` is a receive-only chan, so it cannot be closed here. Use ctx to
exit the goroutine in `handleResizing` properly.
Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com>
Instead of having several dialer implementations, keep only the one in
`pkg/dialer` and call it from `pkg/ttrpcutil` and `runtime/v(1|2)/shim`,
which had their own.
Closes #3471.
Signed-off-by: Kiril Vladimiroff <kiril@vladimiroff.org>
The pkg/store errors duplicate the NotFound and AlreadyExists errors from
containerd's errdefs package and thus are not properly converted when
errdefs.ToGRPC is run on them. CRI runs this function on every return from a
CRI method, so the conversion fails whenever there is a cache miss in the store
caches for containers or sandboxes. This change verifies that the errors are
properly converted to their gRPC values.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
In cgroup v1 container implementations, cgroupns is not used by default because
it was not available until kernel 4.6 (May 2016), and the default
behavior will not change on cgroup v1 environments, because changing the
default will break compatibility and surprise users.
For cgroup v2, implementations are going to unshare cgroupns by default
so as to hide /sys/fs/cgroup from containers.
* Discussion: https://github.com/containers/libpod/issues/4363
* Podman PR (merged): https://github.com/containers/libpod/pull/4374
* Moby PR: https://github.com/moby/moby/pull/40174
This PR enables cgroupns for containers, but pod sandboxes are left untouched
because there is probably no need to do so.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
The former default runtime "io.containerd.runc.v1" won't support new features
such as cgroup v2: containerd/containerd#3726
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Resize the I/O buffers to align with the size of the kernel buffers used for
fifos, and make closing the console key off of stdin being closed.
Fixes #3738
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
1. For Windows, the Hostname property is not inherited from the sandbox and must
be passed for workload container activations as well.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Due to changes to the defaults in containerd, the CRI path to creating a
container OCI config needs to add back in the default UNIX $PATH (and
any other defaults) as that is the expected behavior from other
runtimes.
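Adding the default back can be as simple as prepending a `PATH` entry when none is present. `withDefaultPath` is a hypothetical helper, and the value shown is the conventional default Unix `$PATH`; treat both as assumptions rather than the exact cri code.

```go
package main

import "strings"

// defaultUnixPath is the conventional default $PATH for Linux containers.
const defaultUnixPath = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

// withDefaultPath returns env with a PATH entry added only when neither the
// image nor the container config already provides one.
func withDefaultPath(env []string) []string {
	for _, kv := range env {
		if strings.HasPrefix(kv, "PATH=") {
			return env
		}
	}
	return append([]string{"PATH=" + defaultUnixPath}, env...)
}
```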
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
The climan package has a command that can be registered with any urfave/cli
app to generate man pages.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
- an empty username means the caller wants to use no credentials, typically for an anonymous registry
- Fixes https://github.com/containerd/cri/issues/1249
Signed-off-by: Nishchay Kumar <mrawesomenix@gmail.com>
This adds a singleton `timeout` package that will allow services and users
to configure timeouts in the daemon. When a service wants to use a
timeout, it should declare a const and register its default value
inside an `init()` function for that package. When the default config
is generated, we can use the `timeout` package to provide the available
timeout keys so that users know which timeouts they can configure.
These show up in the config as follows:
```toml
[timeouts]
"io.containerd.timeout.shim.cleanup" = 5
"io.containerd.timeout.shim.load" = 5
"io.containerd.timeout.shim.shutdown" = 3
"io.containerd.timeout.task.state" = 2
```
Timeouts in the config are specified in seconds.
Timeouts are very hard to get right, and giving users the power to
configure them is a huge improvement. Machines can be faster or
slower, and depending on the CPU or load of the machine, a timeout may
need to be adjusted.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
We are separating out the encryption code and have designed a few new
interfaces and APIs for processing content streams. This keeps the core
clean of encryption code and enables not only encryption but also support for
multiple content types (custom media types).
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This commit adds a flag to the runtime config that allows overloading of the default
privileged behaviour. When the flag is enabled on a runtime, host devices won't
be appended to the runtime spec if the container is run as privileged.
By default the flag is false to maintain the current behaviour of privileged.
Fixes #1213
Signed-off-by: Alex Price <aprice@atlassian.com>
The SecurityContext for container activation can contain a username, a UID,
and a GID. The order of precedence is username, then UID, then GID. If none of
these options is specified, fall back to the ImageSpec username as a last resort.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Moves to the containerd/log package instead of using logrus directly. This
benefits tracing: if a log context such as OpenCensus is used on the entry gRPC
API, all traces for that gRPC method will now contain the appropriate TraceID
and SpanID for easy correlation.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Use a Pipe() rather than a file to pass the passphrase to the command
line tool. Pass the file descriptor to read the passphrase from as fd '3'.
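In Go, `exec.Cmd.ExtraFiles[0]` always becomes fd 3 in the child, since fds 0 to 2 are stdin, stdout and stderr. The sketch below shows the pipe technique with a hypothetical `runWithPassphraseFD` helper; the command to run and its fd flag are left to the caller.

```go
package main

import (
	"os"
	"os/exec"
)

// runWithPassphraseFD runs name with args, making passphrase readable by the
// child on fd 3. The passphrase never appears in the argument list or on disk.
func runWithPassphraseFD(passphrase []byte, name string, args ...string) ([]byte, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return nil, err
	}
	defer r.Close()
	// Writing before the child starts assumes the passphrase fits in the
	// pipe buffer, which holds for short secrets.
	if _, err := w.Write(passphrase); err != nil {
		w.Close()
		return nil, err
	}
	// Closing the write end lets the child see EOF after the passphrase.
	if err := w.Close(); err != nil {
		return nil, err
	}
	cmd := exec.Command(name, args...)
	cmd.ExtraFiles = []*os.File{r} // becomes fd 3 in the child
	return cmd.Output()
}
```

For example, `runWithPassphraseFD(secret, "sh", "-c", "cat <&3")` echoes the secret back on stdout.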
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Rather than passing the passphrase via the command line, write it into
a temporary file and pass the file name using the passphrase-file option.
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
A call to ExecSync should only return early if the client context was canceled
or its deadline was exceeded. The Timeout parameter of ExecSyncRequest is now
used to send SIGKILL if the exec'd process does not exit within Timeout, but
all paths wait for the exec to exit.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
A call to StopContainer should only return early if the client context is canceled or
its deadline was exceeded. The Timeout parameter on StopContainerRequest is now
used as the time AFTER sending the stop signal before the SIGKILL is delivered.
The call blocks until the container has exited or the client context has
finished.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
A call to RunPodSandbox should only return a timeout if the operation has timed
out because the client's context deadline was exceeded. On client cancelation,
it should return gRPC Canceled; otherwise, it should block until the sandbox has
exited.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Minor correctness. We should use the value of the const in the error message
instead of hard-coding it in the string, so that if maxDNSSearches ever changes,
the error message changes with it.
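A minimal sketch of the pattern: the limit appears once, as a const, and the message is formatted from it. The value 6 (the traditional resolv.conf search-list limit) and the message text are assumptions for illustration.

```go
package main

import "fmt"

// maxDNSSearches is assumed to be 6 here.
const maxDNSSearches = 6

// validateSearches formats the limit into the error instead of hard-coding
// it, so the message stays correct if the const changes.
func validateSearches(searches []string) error {
	if len(searches) > maxDNSSearches {
		return fmt.Errorf("DNS search list has more than %d domains", maxDNSSearches)
	}
	return nil
}
```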
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
Only compose the full container log path if neither of the paths is empty. Otherwise, the container won't start properly.
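As a hypothetical Go sketch (`containerLogPath` is an illustrative name), the guard returns an empty path, meaning no log file, unless both parts are set.

```go
package main

import "path/filepath"

// containerLogPath joins the sandbox log directory and the container's
// relative log path only when both are set; otherwise it returns "".
func containerLogPath(sandboxLogDir, logPath string) string {
	if sandboxLogDir == "" || logPath == "" {
		return ""
	}
	return filepath.Join(sandboxLogDir, logPath)
}
```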
Signed-off-by: Cong Liu <conliu@google.com>
To support cross-compilation, prefer the syscall package over the unix package
for SIG* signals.
Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
This bumps the containerd and sys packages in CRI
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Remove runtime-tools
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Update tests for oci opts package
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
containerd has its own ParseSignal and AtomicWriteFile implementations,
so there's no need to use these functions from github.com/docker/docker.
Signed-off-by: Shengjing Zhu <i@zhsj.me>
Using the utility caused other projects to have containerd/cri
as a dependency only for this utility. The new `reference.ParseDockerRef`
function does the same (it's a copy of this function).
Tests were kept for now, but could be removed in the future.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
`copy` accepts a `string` source with a `[]byte` destination without an
explicit conversion. I removed the explicit conversion to make the code simpler.
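A short illustration of that special case; `copyString` is a hypothetical name. `copy` copies at most `len(buf)` bytes and returns the count.

```go
package main

// copyString copies s into buf with no []byte(s) conversion; copy accepts
// a string source when the destination is a []byte.
func copyString(buf []byte, s string) int {
	return copy(buf, s)
}
```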
Signed-off-by: Iskander Sharipov <quasilyte@gmail.com>
megacheck, gosimple and unused have been deprecated and subsumed by
staticcheck, and staticcheck itself has been upgraded, so the code needs
to be updated to fix the new linter issues.
close: #2945
Signed-off-by: Wei Fu <fuweid89@gmail.com>
RunPodSandbox and CreateContainer access the metadata without checking it,
so a pod or container config without metadata will crash containerd.
This patch adds checks to handle the issue.
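The added check amounts to rejecting a missing metadata pointer up front. `PodSandboxConfig` and `PodSandboxMetadata` below are hypothetical stand-ins for the CRI request types, modeling only the field that matters here.

```go
package main

import "errors"

// Hypothetical stand-ins for the CRI request types.
type PodSandboxMetadata struct{ Name, Namespace, UID string }
type PodSandboxConfig struct{ Metadata *PodSandboxMetadata }

// validateSandboxConfig rejects configs without metadata instead of
// dereferencing a nil pointer later, which would panic and crash the daemon.
func validateSandboxConfig(c *PodSandboxConfig) error {
	if c == nil || c.Metadata == nil {
		return errors.New("sandbox config must include metadata")
	}
	return nil
}
```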
Fixes: #1009
Signed-off-by: Hui Zhu <teawater@hyper.sh>
To match expectations of users coming from Docker engine runtime, add
the HOSTNAME to the environment of new containers in a pod.
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>