kubernetes

History

Francesco Romani 2f426fdba6 devicemanager: checkpoint: support pre-1.20 data The commit `a8b8995ef2` changed the content of the data kubelet writes in the checkpoint. Unfortunately, the checkpoint restore code was not updated, so if we upgrade kubelet from pre-1.20 to 1.20+, the device manager cannot anymore restore its state correctly. The only trace of this misbehaviour is this line in the kubelet logs: ``` W0615 07:31:49.744770 4852 manager.go:244] Continue after failing to read checkpoint file. Device allocation info may NOT be up-to-date. Err: json: cannot unmarshal array into Go struct field PodDevicesEntry.Data.PodDeviceEntries.DeviceIDs of type checkpoint.DevicesPerNUMA ``` If we hit this bug, the device allocation info is indeed NOT up-to-date up until the device plugins register themselves again. This can take up to few minutes, depending on the specific device plugin. While the device manager state is inconsistent: 1. the kubelet will NOT update the device availability to zero, so the scheduler will send pods towards the inconsistent kubelet. 2. at pod admission time, the device manager allocation will not trigger, so pods will be admitted without devices actually being allocated to them. To fix these issues, we add support to the device manager to read pre-1.20 checkpoint data. We retroactively call this format "v1". Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-10-26 09:54:11 +02:00
..
checkpoint.go	devicemanager: checkpoint: support pre-1.20 data	2021-10-26 09:54:11 +02:00
checkpointv1.go	devicemanager: checkpoint: support pre-1.20 data	2021-10-26 09:54:11 +02:00

Francesco Romani 2f426fdba6 devicemanager: checkpoint: support pre-1.20 data

The commit a8b8995ef2
changed the content of the data kubelet writes in the checkpoint.
Unfortunately, the checkpoint restore code was not updated,
so if we upgrade kubelet from pre-1.20 to 1.20+, the
device manager cannot anymore restore its state correctly.

The only trace of this misbehaviour is this line in the
kubelet logs:
```
W0615 07:31:49.744770    4852 manager.go:244] Continue after failing to read checkpoint file. Device allocation info may NOT be up-to-date. Err: json: cannot unmarshal array into Go struct field PodDevicesEntry.Data.PodDeviceEntries.DeviceIDs of type checkpoint.DevicesPerNUMA
```

If we hit this bug, the device allocation info is
indeed NOT up-to-date up until the device plugins register
themselves again. This can take up to few minutes, depending
on the specific device plugin.

While the device manager state is inconsistent:
1. the kubelet will NOT update the device availability to zero, so
   the scheduler will send pods towards the inconsistent kubelet.
2. at pod admission time, the device manager allocation will not
   trigger, so pods will be admitted without devices actually
   being allocated to them.

To fix these issues, we add support to the device manager to
read pre-1.20 checkpoint data. We retroactively call this
format "v1".

Signed-off-by: Francesco Romani <fromani@redhat.com>

2021-10-26 09:54:11 +02:00

checkpoint.go devicemanager: checkpoint: support pre-1.20 data 2021-10-26 09:54:11 +02:00

checkpointv1.go devicemanager: checkpoint: support pre-1.20 data 2021-10-26 09:54:11 +02:00