When a volume is already mounted with an unexpected SELinux label,
kubelet must unmount it first and then mount it back with the expected one.
Report an error to user, just in case the unmount takes too long.
In therory, this error should not happen too often, because two Pods with
different SELinux label will not enter Desired State of World, see
dsw.AddPodToVolume. It can happen when DSW and ASW SELinux labels only when
a volume has been deleted from DSW (= Pod was deleted) or a volume was
reconstructed after kubelet restart. In both cases, volume manager should
unmount the volume quickly.
The Desired State of World can require a different SELinux mount context than
is in the Actual State of World and it's perfectly OK. For example when
user changes SELinux context of Pods or when the context is reconstructed
after kubelet restart.
Don't spam log and don't report errors to the user as event - reconciler
will do the right thing and unmount the old volume (with wrong context) and
mount a new one in the next reconciliation. It's not an error, it's
expected workflow.
In order to improve the observability of the cpumanager,
add and populate metrics to track if the combination of
the kubelet configuration and podspec would trigger
exclusive core allocation and pinning.
We should avoid leaking any node/machine specific information
(e.g. core ids, even though this is admittedly an extreme example);
tracking these metrics seems to be a good first step, because
it allows us to get feedback without exposing details.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Currently, there are some unit tests that are failing on Windows due to
various reasons:
- config options not supported on Windows.
- files not closed, which means that they cannot be removed / renamed.
- paths not properly joined (filepath.Join should be used).
- time.Now() is not as precise on Windows, which means that 2
consecutive calls may return the same timestamp.
- different error messages on Windows.
- files have \r\n line endings on Windows.
- /tmp directory being used, which might not exist on Windows. Instead,
the OS-specific Temp directory should be used.
- the default value for Kubelet's EvictionHard field was containing
OS-specific fields. This is now moved, the field is now set during
Kubelet's initialization, after the config file is read.
Align the behavior of HTTP-based lifecycle handlers and HTTP-based
probers, converging on the probers implementation. This fixes multiple
deficiencies in the current implementation of lifecycle handlers
surrounding what functionality is available.
The functionality is gated by the features.ConsistentHTTPGetHandlers feature gate.
Some of the unit tests cannot pass on Windows due to various reasons:
- fsnotify does not have a Windows implementation.
- Proxy Mode IPVS not supported on Windows.
- Seccomp not supported on Windows.
- VolumeMode=Block is not supported on Windows.
- iSCSI volumes are mounted differently on Windows, and iscsiadm is a
Linux utility.
There is a corner case when blocking Pod termination via a lifecycle
preStop hook, for example by using this StateFulSet:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: web
spec:
selector:
matchLabels:
app: ubi
serviceName: "ubi"
replicas: 1
template:
metadata:
labels:
app: ubi
spec:
terminationGracePeriodSeconds: 1000
containers:
- name: ubi
image: ubuntu:22.04
command: ['sh', '-c', 'echo The app is running! && sleep 360000']
ports:
- containerPort: 80
name: web
lifecycle:
preStop:
exec:
command:
- /bin/sh
- -c
- 'echo aaa; trap : TERM INT; sleep infinity & wait'
```
After creation, downscaling, forced deletion and upscaling of the
replica like this:
```
> kubectl apply -f sts.yml
> kubectl scale sts web --replicas=0
> kubectl delete pod web-0 --grace-period=0 --force
> kubectl scale sts web --replicas=1
```
We will end up having two pods running by the container runtime, while
the API only reports one:
```
> kubectl get pods
NAME READY STATUS RESTARTS AGE
web-0 1/1 Running 0 92s
```
```
> sudo crictl pods
POD ID CREATED STATE NAME NAMESPACE ATTEMPT RUNTIME
e05bb7dbb7e44 12 minutes ago Ready web-0 default 0 (default)
d90088614c73b 12 minutes ago Ready web-0 default 0 (default)
```
When now running `kubectl exec -it web-0 -- ps -ef`, there is a random chance that we hit the wrong
container reporting the lifecycle command `/bin/sh -c echo aaa; trap : TERM INT; sleep infinity & wait`.
This is caused by the container lookup via its name (and no podUID) at:
02109414e8/pkg/kubelet/kubelet_pods.go (L1905-L1914)
And more specifiy by the conversion of the pod result map to a slice in `GetPods`:
02109414e8/pkg/kubelet/kuberuntime/kuberuntime_manager.go (L407-L411)
We now solve that unexpected behavior by tracking the creation time of
the pod and sorting the result based on that. This will cause to always
match the most recently created pod.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Because of a bug in the commit 1e7bb20c52,
podresources metrics were added, they are updated in the right
places, but they are never exported, so they cannot be consumed.
Fix trivially registering the metrics.
Signed-off-by: Francesco Romani <fromani@redhat.com>