Enabling this option effectively causes RDT class of a container to be a
soft requirement. If RDT support has not been enabled the RDT class
setting will not have any effect.
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
Use goresctrl for parsing container and pod annotations related to RDT.
In practice, from the users' point of view, this patchs adds support for
a container annotation and two separate pod annotations for controlling
the RDT class of containers.
Container annotation can be used by a CRI client:
"io.kubernetes.cri.rdt-class"
Pod annotations for specifying the RDT class in the K8s pod spec level:
"rdt.resources.beta.kubernetes.io/pod"
(pod-wide default for all containers within)
"rdt.resources.beta.kubernetes.io/container.<container_name>"
(container-specific overrides)
Annotations are intended as an intermediate step before the CRI API
supports RDT.
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
The ability to handle KVM based runtimes with SELinux has been added as
part of d715d00906.
However, that commit introduced some logic to check whether the
"container_kvm_t" label would or not be present in the system, and while
the intentions were good, there's two major issues with the approach:
1. Inspecting "/etc/selinux/targeted/contexts/customizable_types" is not
the way to go, as it doesn't list the "container_kvm_t" at all.
2. There's no need to check for the label, as if the label is invalid an
"Invalid Label" error will be returned and that's it.
With those two in mind, let's simplify the logic behind setting the
"container_kvm_t" label, removing all the unnecessary code.
Here's an output of VMM process running, considering:
* The state before this patch:
```
$ containerd --version
containerd github.com/containerd/containerd v1.6.0-beta.3-88-g7fa44fc98 7fa44fc98f
$ kubectl apply -f ~/simple-pod.yaml
pod/nginx created
$ ps -auxZ | grep cloud-hypervisor
system_u:system_r:container_runtime_t:s0 root 609717 4.0 0.5 2987512 83588 ? Sl 08:32 0:00 /usr/bin/cloud-hypervisor --api-socket /run/vc/vm/be9d5cbabf440510d58d89fc8a8e77c27e96ddc99709ecaf5ab94c6b6b0d4c89/clh-api.sock
```
* The state after this patch:
```
$ containerd --version
containerd github.com/containerd/containerd v1.6.0-beta.3-89-ga5f2113c9 a5f2113c9fc15b19b2c364caaedb99c22de4eb32
$ kubectl apply -f ~/simple-pod.yaml
pod/nginx created
$ ps -auxZ | grep cloud-hypervisor
system_u:system_r:container_kvm_t:s0:c638,c999 root 614842 14.0 0.5 2987512 83228 ? Sl 08:40 0:00 /usr/bin/cloud-hypervisor --api-socket /run/vc/vm/f8ff838afdbe0a546f6995fe9b08e0956d0d0cdfe749705d7ce4618695baa68c/clh-api.sock
```
Note, the tests were performed using the following configuration snippet:
```
[plugins]
[plugins.cri]
enable_selinux = true
[plugins.cri.containerd]
[plugins.cri.containerd.runtimes]
[plugins.cri.containerd.runtimes.kata]
runtime_type = "io.containerd.kata.v2"
privileged_without_host_devices = true
```
And using the following pod yaml:
```
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
runtimeClassName: kata
containers:
- name: nginx
image: nginx:1.14.2
ports:
- containerPort: 80
```
Fixes: #6371
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
CRI API has been updated to include a an optional `resources` field in the
LinuxPodSandboxConfig field, as part of the RunPodSandbox request.
Having sandbox level resource details at sandbox creation time will have
large benefits for sandboxed runtimes. In the case of Kata Containers,
for example, this'll allow for better support of SW/HW architectures
which don't allow for CPU/memory hotplug, and it'll allow for better
queue sizing for virtio devices associated with the sandbox (in the VM
case).
If this sandbox resource information is provided as part of the run
sandbox request, let's introduce a pattern where we will update the
pause container's runtiem spec to include this information in the
annotations field.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
When containerd use this config:
```
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."localhost:5000"]
endpoint = ["http://localhost:5000"]
```
Due to the `newTransport` function does not initialize the `TLSClientConfig` field.
Then use `TLSClientConfig` to cause nil pointer dereference
Signed-off-by: wanglei <wllenyj@linux.alibaba.com>
These are simple metrics that allow users to view more fine grained metrics on
internal operations.
Signed-off-by: Michael Crosby <michael@thepasture.io>
See https://kep.k8s.io/2371
* Implement new CRI RPCs - `ListPodSandboxStats` and `PodSandboxStats`
* `ListPodSandboxStats` and `PodSandboxStats` which return stats about
pod sandbox. To obtain pod sandbox stats, underlying metrics are
read from the pod sandbox cgroup parent.
* Process info is obtained by calling into the underlying task
* Network stats are taken by looking up network metrics based on the
pod sandbox network namespace path
* Return more detailed stats for cpu and memory for existing container
stats. These metrics use the underlying task's metrics to obtain
stats.
Signed-off-by: David Porter <porterdavid@google.com>
This change ignore errors during container runtime due to large
image labels and instead outputs warning. This is necessary as certain
image building tools like buildpacks may have large labels in the images
which need not be passed to the container.
Signed-off-by: Sambhav Kothari <sambhavs.email@gmail.com>
This will allow running Windows Containers to have their resource
limits updated through containerd. The CPU resource limits support
has been added for Windows Server 20H2 and newer, on older versions
hcsshim will raise an Unimplemented error.
Signed-off-by: Claudiu Belu <cbelu@cloudbasesolutions.com>
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
This fixes the TODO of this function and also expands on how the primary pod ip
is selected. This change allows the operator to prefer ipv4, ipv6, or retain the
ordering provided by the return results of the CNI plugins.
This makes it much more flexible for ops to configure containerd and how IPs are
set on the pod.
Signed-off-by: Michael Crosby <michael@thepasture.io>
The CRI-plugin subscribes the image event on k8s.io namespace. By
default, the image event is created by CRI-API. However, the image can
be downloaded by containerd API on k8s.io with the customized labels.
The CRI-plugin should use patch update for `io.cri-containerd.image`
label in this case.
Fixes: #5900
Signed-off-by: Wei Fu <fuweid89@gmail.com>
The Pod Sandbox can enter in a NotReady state if the task associated
with it no longer exists (it died, or it was killed). In this state,
the Pod network namespace could still be open, which means we can't
remove the sandbox, even if --force was used.
Signed-off-by: Claudiu Belu <cbelu@cloudbasesolutions.com>
CRI container runtimes mount devices (set via kubernetes device plugins)
to containers by taking the host user/group IDs (uid/gid) to the
corresponding container device.
This triggers a problem when trying to run those containers with
non-zero (root uid/gid = 0) uid/gid set via runAsUser/runAsGroup:
the container process has no permission to use the device even when
its gid is permissive to non-root users because the container user
does not belong to that group.
It is possible to workaround the problem by manually adding the device
gid(s) to supplementalGroups. However, this is also problematic because
the device gid(s) may have different values depending on the workers'
distro/version in the cluster.
This patch suggests to take RunAsUser/RunAsGroup set via SecurityContext
as the device UID/GID, respectively. The feature must be enabled by
setting device_ownership_from_security_context runtime config value to
true (valid on Linux only).
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
There was recent changes to cri to bring in a Windows section containing a
security context object to the pod config. Before this there was no way to specify
a user for the pod sandbox container to run as. In addition, the security context
is a field for field mirror of the Windows container version of it, so add the
ability to specify a GMSA credential spec for the pod sandbox container as well.
Signed-off-by: Daniel Canter <dcanter@microsoft.com>