crun 1.4.3 as well as runc 1.1 both support to open bind-mounts before
dropping privileges, as they are inaccessible after switching to the
user namespace. So that is the minimum version to use with containerd
1.7.
Also, since containerd 2.0 we use idmap mounts for files mounted in the
container created by containerd (like etc/hostname, etc/hosts, etc.), so
in that case we require newer OCI runtimes too. However, as the kubelet
doesn't request idmap mounts for kube volumes, we can lower the kernel
version.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Userns requires idmap mounts or to opt-in for a slow and expensive
chown. As idmap mounts support for overlayfs was merged in 5.19, let's
add the slow_chown config for our CI.
The config is harmless to keep it in new kernels, as if idmap mounts is
supported, it will be just used. Whenever all our CI is run with kernels
>= 5.19, we can remove this setting.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
If we don't use idmap mounts, doing a chown per pod is very expensive:
it implies duplicating the container storage for the image for every pod
and the latency to start a new pod is affected too.
Let's make sure users are aware of this, by having them opt-in, for
snapshotters that we have a better solution (like overlayfs, that has
support for idmap mounts).
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Deprecate the pacakge, but suppress linting errors for now. This is to allow
backporting these changes to release branches, which may still need to transition.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This "soft" deprecates the package, but keeps the local uses of the package,
which can make backporting this to release-branches easier (we can
still move all uses in those branches as well though).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
When a shim process is unexpectedly killed in a way that was not initiated through containerd - containerd reports the pod as not ready but the containers as running. This results in kubelet repeatedly sending container kill requests that fail since containerd cannot connect to the shim.
Changes:
- In the container exit handler, treat `err: Unavailable` as if the container has already exited out
- When attempting to get a connection to the shim, if the controller isn't available assume that the shim has been killed (needs to be done since we have a separate exit handler that cleans up the reference to the shim controller - before kubelet has the chance to call StopPodSandbox)
Signed-off-by: Aditya Ramani <a_ramani@apple.com>
Remove `LimitNOFILE` from `containerd.service` to rely on the systemd v240 implicit default of `1024:524288`. On supported platforms with systemd prior to v240, packagers will patch the service with an explicit `LimitNOFILE=1024:524288`.
- `1024` soft limit is an implicit default, avoiding unexpected breakage. Software that needs a higher limit should request to raise the soft limit for its process.
- `524288` hard limit is an implicit default since systemd v240 and is adequate for most processes (_half of the historical limit from `fs.nr_open` of `1048576`_), while 4096 is the implicit default from the kernel (often too low).
- The hard limit may not exceed `fs.nr_open` (_which a value of `infinity` will resolve to_). On most systems with systemd v240 or newer, this will resolve to an excessive size of 2^30 (over 1 billion).
- When set to `infinity` (usually as the soft limit) software may experience significantly increased resource usage, resulting in a performance regression or runtime failures that are difficult to troubleshoot.
Signed-off-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>