The field in fact says that the container runtime should relabel a volume
when running a container with it, it does not say that the volume supports
SELinux. For example, NFS can support SELinux, but we don't want NFS
volumes relabeled, because they can be shared among several Pods.
this commit updates checkEphemeralStorage to be able to add container log stats, if applicable.
It also updates the old check when container log stats aren't found to be more accurate.
Specifically, this check previously worked because of a fluke programming accident:
according to this block in pkg/kubelet/stats/helper.go:113
```
if result.Rootfs != nil {
rootfsUsage := *cfs.BaseUsageBytes
result.Rootfs.UsedBytes = &rootfsUsage
}
```
BaseUsageBytes should be the value added, not TotalUsageBytes. However, since in this case
one also needs to account for the calculated log size, which is TotalUsageBytes - BaseUsageBytes
using TotalUsageBytes value accidentally worked.
Updating the case to use the correct value AND log offset fixes this accident and makes
the behavior more in line with what happens when calculating ephemeral storage.
Signed-off-by: Peter Hunt <pehunt@redhat.com>
in https://github.com/kubernetes/kubernetes/pull/74441,
the namespace and name were added to the pod log location.
However, cAdvisor stats provider wasn't correspondingly updated.
since CRI-O uses cAdvisor stats provider by default, despite being a CRI implementation,
eviction with ephemeral storage and container logs doesn't work as expected, until now!
Signed-off-by: Peter Hunt <pehunt@redhat.com>
The package says:
> the libcontainer SELinux package is only built for Linux, so it is
> necessary to have a NOP wrapper which is built for non-Linux platforms
This is not true, Kubernetes now imports
github.com/opencontainers/selinux/go-selinux and it has proper
multiplatform support (i.e. NOOP on non-Linux platforms).
Removing the whole package and calling go-selinux directly.
Before this fix, hint permutations such as:
permutation: [{11 true} {0101 true}]
Could result in merged hints of:
mergedHint: {01 true}
This was possible because both hints in the permutation container a "preferred"
allocation (i.e. the full set of NUMA nodes set in the affinity bitmask are
*required* to satisfy the allocation). With this in place, the simplified logic
we had simply kept the merged hint as preferred as well.
However, what we really want is to ensure that the merged hint is only
preferred if *true* alignment of all resources is possible (i.e. if all hints
in the permutation are preferred AND their affinities are exactly equal).
The only exception to this is if *no* topology information is provided by a
given hint provider. In this case, we assume alignment doesn't matter and only
consider the resources that actually have hints provided for them.
This changes the semantics of permutations of the form:
permutation: [{111 true} {011 true}]
To now result in the merged hint of:
mergedHint: {011 false}
Instead of:
mergedHint: {011 true}
This is arguably how it should always have been though (because a hint should
not be preferred if true alignment isn't possible), and two tests have had to
change to accomodate these new semantics.
This commit changes the merge function to implement the updated logic, adds a
test to verify it is functioning correctly, and updates the two tests mentioned
above to adjust to the new semantics.
Signed-off-by: Kevin Klues <kklues@nvidia.com>
The remote runtime implementation now supports the `verbose` fields,
which are required for consumers like cri-tools to enable multi CRI
version support.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
TestStart was previously flaky. In approx 100_000 local runs, it failed
about 70% of the time, and has been mentioned as a flaky unit test in
the past.
This flake was due to a race condition with the logic as written and the
go scheduler. UpdateThreshold calls `notifier.Start(events)` in a new Go
Routine, which is not guarunteed to be called immediately.
This meant that if `m.Start()` was called before `notifier.Start()`, the
test would fail, as the notifier would not have been started before the
4 events were processed and lock released.
Here, we update the test to more closely match the intended application
behaviour, and have events passed to the channel when `Start` is called
on the notifier.
This ensures that -Start gets called and additionally validates
that the correct channel is provided to the notifier.
Stop was never called previously, as it only gets called on a subsequent
call to UpdateThreshold. `AnyTimes()` hid that this did not occur.
cAdvisor has code to expose OOM metrics since 0.40.0, but this was not
included in Kubelet so far. This commit enables it.
Signed-off-by: Jorik Jonker <jorik.jonker@eu.equinix.com>
- Allow a podWorker to start if it is blocked by a pod that has been
terminated before starting
- When a pod can't start AND has already been terminated, exit cleanly
- Add a unit test that exercises race conditions in pod workers
If CRI returns a container that has been created but is not running,
it is not safe to assume it is terminal, as our connection to CRI
may have failed. Instead, created is treated as waiting, as in
"waiting for this container to start". Either syncPod or
syncTerminatingPod is responsible for handling this state.