Enabling this option effectively makes the RDT class of a container a
soft requirement. If RDT support has not been enabled, the RDT class
setting will not have any effect.
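A minimal sketch of the soft-requirement behaviour, for illustration only.
The function and parameter names are hypothetical, and returning an error
when the option is disabled is an assumption, not something stated above:

```go
package main

import (
	"errors"
	"fmt"
)

// errRdtNotEnabled is a hypothetical error for the hard-requirement case.
var errRdtNotEnabled = errors.New("RDT support is not enabled")

// effectiveRdtClass is a hypothetical helper. rdtEnabled stands in for the
// runtime state and ignoreNotEnabled for the option described above.
func effectiveRdtClass(requested string, rdtEnabled, ignoreNotEnabled bool) (string, error) {
	if requested == "" || rdtEnabled {
		return requested, nil
	}
	if ignoreNotEnabled {
		// Soft requirement: the class setting simply has no effect.
		return "", nil
	}
	// Assumed behaviour without the option: treat the class as required.
	return "", errRdtNotEnabled
}

func main() {
	cls, err := effectiveRdtClass("gold", false, true)
	fmt.Printf("class=%q err=%v\n", cls, err)
}
```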
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
Use goresctrl for parsing container and pod annotations related to RDT.
In practice, from the users' point of view, this patch adds support for
a container annotation and two separate pod annotations for controlling
the RDT class of containers.
Container annotation can be used by a CRI client:
"io.kubernetes.cri.rdt-class"
Pod annotations for specifying the RDT class in the K8s pod spec level:
"rdt.resources.beta.kubernetes.io/pod"
(pod-wide default for all containers within)
"rdt.resources.beta.kubernetes.io/container.<container_name>"
(container-specific overrides)
Annotations are intended as an intermediate step before the CRI API
supports RDT.
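The annotation lookup can be sketched roughly as follows. This is not the
goresctrl API: rdtClassFor is a hypothetical helper, the class names are
made up, and giving the CRI container annotation priority over the pod
annotations is an assumption (the text above only states that the
container-specific pod annotation overrides the pod-wide default):

```go
package main

import "fmt"

// Annotation keys as quoted in the commit message.
const (
	rdtContainerAnnotation          = "io.kubernetes.cri.rdt-class"
	rdtPodAnnotation                = "rdt.resources.beta.kubernetes.io/pod"
	rdtPodAnnotationContainerPrefix = "rdt.resources.beta.kubernetes.io/container."
)

// rdtClassFor is a hypothetical helper. It returns the RDT class for the
// named container and whether any annotation specified one.
// Assumed precedence: CRI container annotation, then the container-specific
// pod annotation, then the pod-wide default.
func rdtClassFor(name string, containerAnn, podAnn map[string]string) (string, bool) {
	if cls, ok := containerAnn[rdtContainerAnnotation]; ok {
		return cls, true
	}
	if cls, ok := podAnn[rdtPodAnnotationContainerPrefix+name]; ok {
		return cls, true
	}
	if cls, ok := podAnn[rdtPodAnnotation]; ok {
		return cls, true
	}
	return "", false
}

func main() {
	podAnn := map[string]string{
		rdtPodAnnotation:                          "bronze",
		rdtPodAnnotationContainerPrefix + "cache": "gold",
	}
	for _, name := range []string{"cache", "web"} {
		cls, _ := rdtClassFor(name, nil, podAnn)
		fmt.Printf("%s -> %s\n", name, cls)
	}
}
```

Here "cache" picks up its container-specific override and "web" falls back
to the pod-wide default.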
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
The ability to handle KVM based runtimes with SELinux has been added as
part of d715d00906.
However, that commit introduced some logic to check whether the
"container_kvm_t" label is present in the system. While the
intentions were good, there are two major issues with the approach:
1. Inspecting "/etc/selinux/targeted/contexts/customizable_types" is not
the way to go, as it doesn't list "container_kvm_t" at all.
2. There's no need to check for the label: if the label is invalid, an
"Invalid Label" error will be returned and that's it.
With those two in mind, let's simplify the logic behind setting the
"container_kvm_t" label, removing all the unnecessary code.
Here's the output for a running VMM process, considering:
* The state before this patch:
```
$ containerd --version
containerd github.com/containerd/containerd v1.6.0-beta.3-88-g7fa44fc98 7fa44fc98f
$ kubectl apply -f ~/simple-pod.yaml
pod/nginx created
$ ps -auxZ | grep cloud-hypervisor
system_u:system_r:container_runtime_t:s0 root 609717 4.0 0.5 2987512 83588 ? Sl 08:32 0:00 /usr/bin/cloud-hypervisor --api-socket /run/vc/vm/be9d5cbabf440510d58d89fc8a8e77c27e96ddc99709ecaf5ab94c6b6b0d4c89/clh-api.sock
```
* The state after this patch:
```
$ containerd --version
containerd github.com/containerd/containerd v1.6.0-beta.3-89-ga5f2113c9 a5f2113c9fc15b19b2c364caaedb99c22de4eb32
$ kubectl apply -f ~/simple-pod.yaml
pod/nginx created
$ ps -auxZ | grep cloud-hypervisor
system_u:system_r:container_kvm_t:s0:c638,c999 root 614842 14.0 0.5 2987512 83228 ? Sl 08:40 0:00 /usr/bin/cloud-hypervisor --api-socket /run/vc/vm/f8ff838afdbe0a546f6995fe9b08e0956d0d0cdfe749705d7ce4618695baa68c/clh-api.sock
```
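The label in the output above has the form user:role:type:level. A rough
sketch of deriving such a label without any up-front existence check is
shown below. kvmLabel is a hypothetical helper, not the code in this patch,
and the input label is only an example; an invalid result would be rejected
later, when the label is actually applied:

```go
package main

import (
	"fmt"
	"strings"
)

// kvmLabel is a hypothetical helper that swaps the type field of an SELinux
// label (user:role:type:level) for container_kvm_t. It does not check
// whether the type exists on the system.
func kvmLabel(processLabel string) (string, error) {
	// Split into at most 4 parts so the level (which may itself contain
	// colons, e.g. "s0:c638,c999") stays in one piece.
	parts := strings.SplitN(processLabel, ":", 4)
	if len(parts) < 3 {
		return "", fmt.Errorf("malformed SELinux label %q", processLabel)
	}
	parts[2] = "container_kvm_t"
	return strings.Join(parts, ":"), nil
}

func main() {
	label, err := kvmLabel("system_u:system_r:container_t:s0:c638,c999")
	if err != nil {
		panic(err)
	}
	fmt.Println(label)
}
```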
Note, the tests were performed using the following configuration snippet:
```
[plugins]
  [plugins.cri]
    enable_selinux = true
    [plugins.cri.containerd]
      [plugins.cri.containerd.runtimes]
        [plugins.cri.containerd.runtimes.kata]
          runtime_type = "io.containerd.kata.v2"
          privileged_without_host_devices = true
```
And using the following pod yaml:
```
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  runtimeClassName: kata
  containers:
  - name: nginx
    image: nginx:1.14.2
    ports:
    - containerPort: 80
Fixes: #6371
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The CRI API has been updated to include an optional `resources` field in the
LinuxPodSandboxConfig field, as part of the RunPodSandbox request.
Having sandbox level resource details at sandbox creation time will have
large benefits for sandboxed runtimes. In the case of Kata Containers,
for example, this'll allow for better support of SW/HW architectures
which don't allow for CPU/memory hotplug, and it'll allow for better
queue sizing for virtio devices associated with the sandbox (in the VM
case).
If this sandbox resource information is provided as part of the run
sandbox request, let's introduce a pattern where we will update the
pause container's runtime spec to include this information in the
annotations field.
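The pattern can be sketched as below. The sandboxResources type is a
trimmed-down, hypothetical stand-in for the CRI resources message, and the
annotation keys are assumptions used for illustration rather than names
confirmed by this message:

```go
package main

import (
	"fmt"
	"strconv"
)

// sandboxResources is a hypothetical stand-in for the sandbox-level
// resources carried in the RunPodSandbox request.
type sandboxResources struct {
	CPUPeriod          int64
	CPUQuota           int64
	CPUShares          int64
	MemoryLimitInBytes int64
}

// Illustrative annotation keys (assumed, not taken from the text above).
const (
	annCPUPeriod = "io.kubernetes.cri.sandbox-cpu-period"
	annCPUQuota  = "io.kubernetes.cri.sandbox-cpu-quota"
	annCPUShares = "io.kubernetes.cri.sandbox-cpu-shares"
	annMemory    = "io.kubernetes.cri.sandbox-memory"
)

// annotateSandboxResources copies the sandbox resources into the spec
// annotations. A nil res means the field was not provided, so the
// annotations are returned unchanged.
func annotateSandboxResources(annotations map[string]string, res *sandboxResources) map[string]string {
	if res == nil {
		return annotations
	}
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations[annCPUPeriod] = strconv.FormatInt(res.CPUPeriod, 10)
	annotations[annCPUQuota] = strconv.FormatInt(res.CPUQuota, 10)
	annotations[annCPUShares] = strconv.FormatInt(res.CPUShares, 10)
	annotations[annMemory] = strconv.FormatInt(res.MemoryLimitInBytes, 10)
	return annotations
}

func main() {
	ann := annotateSandboxResources(nil, &sandboxResources{
		CPUPeriod:          100000,
		CPUQuota:           200000,
		CPUShares:          2048,
		MemoryLimitInBytes: 1 << 30,
	})
	fmt.Println(ann[annCPUQuota], ann[annMemory])
}
```

A sandboxed runtime could then read these values from the pause
container's spec to size the VM before any workload container is created.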
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
When containerd uses this config:
```
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."localhost:5000"]
endpoint = ["http://localhost:5000"]
```
the `newTransport` function does not initialize the `TLSClientConfig` field,
so a later use of `TLSClientConfig` causes a nil pointer dereference.
Signed-off-by: wanglei <wllenyj@linux.alibaba.com>
These are simple metrics that give users a more fine-grained view of
internal operations.
Signed-off-by: Michael Crosby <michael@thepasture.io>
See https://kep.k8s.io/2371
* Implement new CRI RPCs - `ListPodSandboxStats` and `PodSandboxStats`
  * `ListPodSandboxStats` and `PodSandboxStats` return stats about a
    pod sandbox. To obtain pod sandbox stats, underlying metrics are
    read from the pod sandbox cgroup parent.
  * Process info is obtained by calling into the underlying task
  * Network stats are taken by looking up network metrics based on the
    pod sandbox network namespace path
* Return more detailed stats for cpu and memory for existing container
  stats. These metrics use the underlying task's metrics to obtain
  stats.
Signed-off-by: David Porter <porterdavid@google.com>
This change ignores errors during container runtime due to large
image labels and instead outputs a warning. This is necessary as certain
image building tools like buildpacks may have large labels in the images
which need not be passed to the container.
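A rough sketch of the warn-and-skip behaviour follows. filterImageLabels,
the 4096-byte limit on key plus value, and the label names are all
assumptions made for illustration; they are not taken from this patch:

```go
package main

import (
	"fmt"
	"strings"
)

// maxLabelSize is an assumed limit on len(key)+len(value).
const maxLabelSize = 4096

// filterImageLabels is a hypothetical helper. It keeps labels that fit
// within maxLabelSize and reports the keys of those it skipped, instead of
// failing on the first oversized label.
func filterImageLabels(imageLabels map[string]string) (kept map[string]string, skipped []string) {
	kept = make(map[string]string, len(imageLabels))
	for k, v := range imageLabels {
		if len(k)+len(v) > maxLabelSize {
			skipped = append(skipped, k)
			continue
		}
		kept[k] = v
	}
	return kept, skipped
}

func main() {
	labels := map[string]string{
		"maintainer":                 "someone@example.com",
		"example.com/build-metadata": strings.Repeat("x", 5000),
	}
	kept, skipped := filterImageLabels(labels)
	for _, k := range skipped {
		fmt.Printf("warning: ignoring oversized image label %q\n", k)
	}
	fmt.Println(len(kept), "label(s) passed to the container")
}
```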
Signed-off-by: Sambhav Kothari <sambhavs.email@gmail.com>
This will allow running Windows Containers to have their resource
limits updated through containerd. The CPU resource limits support
has been added for Windows Server 20H2 and newer; on older versions,
hcsshim will raise an Unimplemented error.
Signed-off-by: Claudiu Belu <cbelu@cloudbasesolutions.com>
In Linux 5.14, and hopefully some backports, core scheduling allows processes to
be co-scheduled within the same domain on SMT-enabled systems.
The containerd implementation sets the core sched domain when launching a shim.
This gives each shim (container/pod) its own domain in a clean way, and any
additional containers (v2 pods) are launched with the same domain, as is
any exec'd process added to the container.
kernel docs: https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/core-scheduling.html
Signed-off-by: Michael Crosby <michael@thepasture.io>
Currently, there are a few issues that prevent containers
with image volumes from starting properly on Windows.
- Unlike the Linux implementation, the Container volume mount paths
  were not created if they didn't exist. Those paths are now created.
- While copying the image volume contents to the container volume,
  the layers were not properly deactivated, which means that the
  container couldn't start since those layers were still open. The layers
  are now properly deactivated, allowing the container to start.
- Even if the above issue didn't exist, the Windows implementation of
  mount/Mount.Mount deactivates the layers, which wouldn't allow us
  to copy files from them. The layers are now deactivated after we've
  copied the necessary files from them.
- The target argument of the Windows implementation of mount/Mount.Mount
  was unused, which means that the folder was always empty. We're now
  symlinking the Layer Mount Path into the target folder.
- hcsshim needs its Container Mount Paths to be properly formatted, that
  is, prefixed by C:. This was an issue for Volumes defined with Linux-like
  paths (e.g.: /test_dir). filepath.Abs solves this issue.
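For illustration, the path normalization amounts to something like the
sketch below. containerMountPath is a hypothetical wrapper; on Windows,
filepath.Abs resolves a rooted path without a drive letter against the
drive of the current working directory, while on Linux the same path is
returned unchanged:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// containerMountPath is a hypothetical wrapper that makes a volume path
// absolute for the platform the code runs on.
func containerMountPath(p string) (string, error) {
	return filepath.Abs(p)
}

func main() {
	abs, err := containerMountPath("/test_dir")
	if err != nil {
		panic(err)
	}
	// The printed value is platform dependent.
	fmt.Println(abs)
}
```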
Signed-off-by: Claudiu Belu <cbelu@cloudbasesolutions.com>
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in the io and os packages.
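As an illustration of the mapping (not code from this commit), the sketch
below uses the Go 1.16 replacements for a few common io/ioutil functions.
roundTrip is a hypothetical helper written only to exercise them:

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// roundTrip is a hypothetical helper that writes content to a temporary
// file and reads it back using the non-deprecated functions.
func roundTrip(content string) (string, error) {
	dir, err := os.MkdirTemp("", "example-") // was ioutil.TempDir
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "data.txt")
	if err := os.WriteFile(path, []byte(content), 0o600); err != nil { // was ioutil.WriteFile
		return "", err
	}
	data, err := os.ReadFile(path) // was ioutil.ReadFile
	if err != nil {
		return "", err
	}
	all, err := io.ReadAll(strings.NewReader(string(data))) // was ioutil.ReadAll
	if err != nil {
		return "", err
	}
	return string(all), nil
}

func main() {
	got, err := roundTrip("hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```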
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>