If kubelet passes the swap limit (by default, the memory limit equals the swap limit),
it is configured for the container irrespective of whether the node supports swap.
Signed-off-by: Qasim Sarfraz <qasimsarfraz@microsoft.com>
These are not actually being pulled; this just removes the deprecated
k8s.gcr.io references from the code base. While at it, also rename or
remove variables that shadowed package-level definitions.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
When pods are transitioning, there are several
cases where containers might not be in a valid state.
There were several cases where the stats were
failing hard, but we should just continue on, as
these failures are transient and the stats will be
picked up again when kubelet next queries for them.
Signed-off-by: James Sturtevant <jstur@microsoft.com>
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
This commit just updates sbserver with the same fix we did on main:
9bf5aeca77 ("cri: Fix net.ipv4.ping_group_range with userns")
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
This is a port of 31a6449734 ("Add capability for snapshotters to
declare support for UID remapping") to sbserver.
This patch remaps the rootfs in the platform-specific code if user
namespaces are in use, so the pod can read/write to the rootfs.
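The remap boils down to translating IDs through the pod's user namespace mappings, so rootfs files end up owned by the host IDs that the container's users map to. A minimal sketch, assuming mappings shaped like the OCI runtime-spec `LinuxIDMapping` (mirrored locally as `idMapping`); `toHostID` is a hypothetical helper, not the sbserver code.

```go
package main

import "fmt"

// idMapping mirrors the shape of an OCI runtime-spec LinuxIDMapping,
// defined locally to keep the sketch self-contained.
type idMapping struct {
	ContainerID uint32
	HostID      uint32
	Size        uint32
}

// toHostID translates an ID as seen inside the user namespace to the host ID
// that rootfs files must be owned by for the pod to read and write them.
func toHostID(id uint32, maps []idMapping) (uint32, error) {
	for _, m := range maps {
		// id-m.ContainerID < m.Size avoids overflow on the upper bound.
		if id >= m.ContainerID && id-m.ContainerID < m.Size {
			return m.HostID + (id - m.ContainerID), nil
		}
	}
	return 0, fmt.Errorf("id %d is not mapped", id)
}
```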
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
This patch requests the OCI runtime to create a userns when the CRI
message includes such a request.
This is an adaptation of a7adeb6976 ("cri: Support pods with user
namespaces") to sbserver, although the container_create.go parts were
already ported as part of 40be96efa9 ("Have separate spec builder for
each platform").
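In OCI runtime-spec terms, asking the runtime to create a user namespace means adding a `user` namespace entry with an empty path and filling in the UID/GID mappings. A sketch using locally defined mirrors of those spec types; `withUserNamespace` is a hypothetical helper, and real code would use the `specs-go` package types instead.

```go
package main

// Minimal local mirrors of the OCI runtime-spec Linux types used below.
type linuxNamespace struct {
	Type string
	Path string // empty means "create a new namespace"
}

type linuxIDMapping struct {
	ContainerID, HostID, Size uint32
}

type linuxSpec struct {
	Namespaces  []linuxNamespace
	UIDMappings []linuxIDMapping
	GIDMappings []linuxIDMapping
}

// withUserNamespace replaces any existing user namespace entry with one that
// asks the runtime to create a new userns, and records the ID mappings taken
// from the CRI request.
func withUserNamespace(s *linuxSpec, uids, gids []linuxIDMapping) {
	kept := s.Namespaces[:0]
	for _, ns := range s.Namespaces {
		if ns.Type != "user" {
			kept = append(kept, ns)
		}
	}
	s.Namespaces = append(kept, linuxNamespace{Type: "user"})
	s.UIDMappings = uids
	s.GIDMappings = gids
}
```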
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
This commit just ports 36f520dc04 ("Let OCI runtime create netns when
userns is used") to sbserver.
The CNI network setup is done after OCI start, as it didn't seem simple
to get the sandbox PID we need for the netns otherwise.
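The ordering constraint can be sketched as follows, assuming the runtime reports the sandbox PID once started and that the netns is reached through `/proc/<pid>/ns/net`. `startSandboxWithUserNS` and its callbacks are hypothetical; the real code calls the OCI runtime and the CNI library.

```go
package main

import (
	"errors"
	"fmt"
)

// netnsPath returns the network namespace path of a running process. With
// userns the runtime creates the netns, so it is only reachable through the
// sandbox PID after the OCI start.
func netnsPath(pid uint32) string {
	return fmt.Sprintf("/proc/%d/ns/net", pid)
}

// startSandboxWithUserNS starts the sandbox through the OCI runtime first,
// then runs CNI setup against the netns of the resulting process.
func startSandboxWithUserNS(start func() (uint32, error), setupCNI func(netns string) error) error {
	pid, err := start()
	if err != nil {
		return fmt.Errorf("start sandbox: %w", err)
	}
	if pid == 0 {
		return errors.New("sandbox has no PID")
	}
	return setupCNI(netnsPath(pid))
}
```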
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Currently, many helpers are copy-pasted between these two folders, and
there is a TODO in the platform-agnostic file to organize them in the
future, once some other things settle.
So, let's just copy them for now.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Commit c085fac1e5 ("Move sandbox start behind controller") moved the
runtimeStart to only account for time _after_ the netns has been
created.
To match what we currently do in cri/server, let's move it to just after
we get the sandbox runtime.
This came up when porting userns to sbserver, as the CNI network setup
needs to be done at a later stage and runtimeStart was accounting for
the CNI network setup time only when userns is enabled.
To avoid that discrepancy, let's just move it earlier; that also matches
what we do in cri/server.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Besides fixing the "in future the when" typo, we take the chance to
reflect that user namespaces support is already merged.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
These two errors can occur in the following scenarios:
ECONNRESET: the target process reset the connection between the CRI and itself.
See #111825 for details.
EPIPE: the target process did not read the received data, so the kernel
buffer filled up, resulting in a TCP zero window; the connection was then
closed (FIN, RESET).
See #74551 for details.
In both cases, we should RESET the httpStream.
Signed-off-by: wangxiang <scottwangsxll@gmail.com>
userns.RunningInUserNS() checks if the code calling that function is
running inside a user namespace. But we need to check whether the container
we will create uses a user namespace; in that case we need to
disable the sysctl too (or we would need to take the userns mapping into
account to set the IDs).
This was added in PR:
https://github.com/containerd/containerd/pull/6170/
And the param documentation says it is not enabled when user namespaces
are in use:
https://github.com/containerd/containerd/pull/6170/files#diff-91d0a4c61f6d3523b5a19717d1b40b5fffd7e392d8fe22aed7c905fe195b8902R118
I'm not sure if the intention was to disable this if containerd is
running inside a userns (rootless, if that is even supported) or just
when the pod has user namespaces.
Out of an abundance of caution, I'm keeping the userns.RunningInUserNS()
so it is still not used if containerd runs inside a user namespace.
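The resulting condition, as a sketch: set the sysctl only when the option is enabled, containerd is not itself inside a user namespace, and the pod does not use one. The function name, the value `0 2147483647` (all GIDs), and leaving a user-supplied value untouched are assumptions for illustration.

```go
package main

// applyUnprivilegedICMP sets net.ipv4.ping_group_range only when the option
// is enabled and neither containerd nor the pod is in a user namespace.
// Hypothetical helper; the value and the "don't override" rule are assumed.
func applyUnprivilegedICMP(sysctls map[string]string, enabled, containerdInUserNS, podUsesUserNS bool) {
	if !enabled || containerdInUserNS || podUsesUserNS {
		return
	}
	if _, ok := sysctls["net.ipv4.ping_group_range"]; !ok {
		sysctls["net.ipv4.ping_group_range"] = "0 2147483647"
	}
}
```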
With this patch and "enable_unprivileged_icmp = true" in the config,
running containerd as root on the host, pods with user namespaces start
just fine. Without this patch they fail with:
... failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: w
/proc/sys/net/ipv4/ping_group_range: invalid argument: unknown
Thanks a lot to Andy on the k8s Slack for reporting the issue. He also
mentioned that he hits this with k3s on a default installation (the param
is off by default in containerd, but k3s seems to turn it on by
default). He also debugged which part of the stack was setting that
sysctl, and found both the PR that added this code in containerd and a
workaround (turning the bool off).
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
As a follow-up to adding a SandboxMetrics RPC to the core
sandbox service, the controller needs a corresponding RPC for CRI
and others to eventually implement.
This leaves the CRI (non-shim mode) controller unimplemented, so that
this change starts with just the API addition.
Signed-off-by: Danny Canter <danny@dcantah.dev>
When a container is just created or has exited, it will not have stats. A common case for this in k8s is a pod's init containers: they will be present in the listed containers but will not have a running task, and therefore no stats.
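A sketch of skipping containers that cannot have stats, assuming a simplified state enum loosely modelled on CRI container states; all names are hypothetical.

```go
package main

// containerState values are illustrative, loosely following CRI states.
type containerState int

const (
	stateCreated containerState = iota
	stateRunning
	stateExited
)

type container struct {
	ID    string
	State containerState
}

// withRunningTask filters listed containers down to those that can have
// stats: created or exited containers (such as finished init containers)
// have no running task, so they are skipped rather than treated as errors.
func withRunningTask(cs []container) []container {
	var out []container
	for _, c := range cs {
		if c.State == stateCreated || c.State == stateExited {
			continue
		}
		out = append(out, c)
	}
	return out
}
```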
Signed-off-by: James Sturtevant <jstur@microsoft.com>
The 10-containerd-net.conflist file generated from the conf_template
should be written atomically so that partial writes are not visible to
CNI plugins. Use the new consistentfile package to ensure this on
Unix-like platforms such as Linux, FreeBSD, and Darwin.
Fixes https://github.com/containerd/containerd/issues/8607
Signed-off-by: Samuel Karp <samuelkarp@google.com>
The initial PR had a check for nil metrics, but after some refactoring in the PR, the test case that was supposed to cover HPC was missing a scenario where the metric was not nil but didn't contain any metrics. This fixes that case and adds a test case to cover it.
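The fix amounts to treating "nil" and "non-nil but empty" the same way. A minimal sketch with a hypothetical `statsResponse` type:

```go
package main

// statsResponse is a hypothetical stand-in for a stats payload that may be
// nil, or non-nil but empty (the previously untested case).
type statsResponse struct {
	Metrics []string
}

// hasMetrics treats both a nil response and a non-nil response without any
// metrics as "no stats", so callers skip instead of reading missing data.
func hasMetrics(r *statsResponse) bool {
	return r != nil && len(r.Metrics) > 0
}
```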
Signed-off-by: James Sturtevant <jstur@microsoft.com>
This change adds support for CDI devices to the ctr --device flag.
If a fully-qualified CDI device name is specified, it is injected
into the OCI specification before the container is created.
Note that the CDI specifications and the devices that they represent
are local and mirror the behaviour of Linux devices in the ctr command.
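A sketch of how a `--device` value might be classified, assuming the `vendor/class=name` form of fully-qualified CDI names (for example `nvidia.com/gpu=0`). This is a simplified shape check with a hypothetical name; the CDI library itself validates each part more strictly.

```go
package main

import "strings"

// isQualifiedCDIName reports whether device looks like "vendor/class=name".
// Values that match would be injected via CDI; anything else is treated as
// a Linux device path. Simplified: only the overall shape is checked.
func isQualifiedCDIName(device string) bool {
	kind, name, ok := strings.Cut(device, "=")
	if !ok || name == "" {
		return false
	}
	vendor, class, ok := strings.Cut(kind, "/")
	if !ok || vendor == "" || class == "" || strings.Contains(class, "/") {
		return false
	}
	return true
}
```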
Signed-off-by: Evan Lezar <elezar@nvidia.com>