kubernetes/pkg/kubelet/cm
Swati Sehgal 7ac399c205 node: device-mgr: Handle recovery by checking if healthy devices exist
On node reboot or kubelet restart, the device manager restores its
state from the checkpoint file and then resets `healthyDevices` and
`unhealthyDevices` to their zero values. This allows the device
plugin to re-register itself so that capacity can be updated
appropriately.

During the allocation phase, we need to check that the resources
requested by the pod have been registered AND that healthy devices
are present on the node to be allocated.

We also need to move this check above the `needed == 0` check, where
`needed` is the number of required devices minus the devices already
allocated to the container (obtained from the checkpoint file). Even
when no additional devices have to be allocated (because they were
pre-allocated), we still need to make sure that the previously
allocated devices are healthy.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 11:52:23 +00:00
..
admission smtalign: cm: factor out admission response 2021-07-08 23:15:37 +02:00
containermap cpu manager policy set to none, no one remove container id from container map, lead memory leak 2022-03-30 23:25:05 +08:00
cpumanager Merge pull request #114114 from ffromani/full-pcpus-stricter-precheck-issue113537 2023-03-02 09:04:56 -08:00
cpuset cpuset: Convert Fatalf to Errrof in tests 2023-02-21 05:41:16 +00:00
devicemanager node: device-mgr: Handle recovery by checking if healthy devices exist 2023-03-06 11:52:23 +00:00
dra DRA: pass CDI devices through CRI CDIDevice field 2023-02-28 19:21:20 +02:00
memorymanager This commit contains the following: 2023-02-24 18:21:21 +00:00
topologymanager node: topology-mgr: code optimization 2023-02-15 14:04:10 +00:00
util generated: Run hack/update-gofmt.sh 2021-08-24 15:47:49 -04:00
cgroup_manager_linux_test.go This commit contains the following: 2023-02-24 18:21:21 +00:00
cgroup_manager_linux.go This commit contains the following: 2023-02-24 18:21:21 +00:00
cgroup_manager_test.go generated: Run hack/update-gofmt.sh 2021-08-24 15:47:49 -04:00
cgroup_manager_unsupported.go This commit contains the following: 2023-02-24 18:21:21 +00:00
container_manager_linux_test.go Merge pull request #111371 from sivchari/improve-naming 2022-12-14 02:23:37 -08:00
container_manager_linux.go rename ExperimentalPodPidsLimit to PodPidsLimit 2023-03-04 01:48:16 +00:00
container_manager_stub.go This commit contains the following: 2023-02-24 18:21:21 +00:00
container_manager_unsupported.go kubelet: add support for dynamic resource allocation 2022-11-11 21:58:03 +01:00
container_manager_windows.go kubelet: prepare DRA resources before CNI setup 2023-02-06 20:40:11 +02:00
container_manager.go rename ExperimentalPodPidsLimit to PodPidsLimit 2023-03-04 01:48:16 +00:00
fake_container_manager.go kubelet: prepare DRA resources before CNI setup 2023-02-06 20:40:11 +02:00
fake_internal_container_lifecycle.go Make CRI v1 the default and allow a fallback to v1alpha2 2021-11-17 11:05:05 -08:00
fake_pod_container_manager.go This commit contains the following: 2023-02-24 18:21:21 +00:00
helpers_linux_test.go Merge pull request #111371 from sivchari/improve-naming 2022-12-14 02:23:37 -08:00
helpers_linux.go In-place Pod Vertical Scaling - core implementation 2023-02-24 18:21:21 +00:00
helpers_unsupported.go In-place Pod Vertical Scaling - core implementation 2023-02-24 18:21:21 +00:00
helpers.go Enable allocatable support for Windows nodes 2018-10-30 11:17:23 +08:00
internal_container_lifecycle_linux.go Make CRI v1 the default and allow a fallback to v1alpha2 2021-11-17 11:05:05 -08:00
internal_container_lifecycle_unsupported.go Make CRI v1 the default and allow a fallback to v1alpha2 2021-11-17 11:05:05 -08:00
internal_container_lifecycle_windows.go Make CRI v1 the default and allow a fallback to v1alpha2 2021-11-17 11:05:05 -08:00
internal_container_lifecycle.go Make CRI v1 the default and allow a fallback to v1alpha2 2021-11-17 11:05:05 -08:00
node_container_manager_linux_test.go generated: Run hack/update-gofmt.sh 2021-08-24 15:47:49 -04:00
node_container_manager_linux.go fix: rename 2022-08-26 00:44:31 +09:00
OWNERS Cleanup OWNERS files (No Activity in the last year) 2021-12-15 10:34:02 -05:00
pod_container_manager_linux_test.go chore: use require instead of assert 2022-08-08 21:06:38 +08:00
pod_container_manager_linux.go This commit contains the following: 2023-02-24 18:21:21 +00:00
pod_container_manager_stub.go In-place Pod Vertical Scaling - core implementation 2023-02-24 18:21:21 +00:00
pod_container_manager_unsupported.go generated: Run hack/update-gofmt.sh 2021-08-24 15:47:49 -04:00
qos_container_manager_linux_test.go generated: Run hack/update-gofmt.sh 2021-08-24 15:47:49 -04:00
qos_container_manager_linux.go logging: fix names of keys 2023-01-23 14:24:29 +01:00
types.go This commit contains the following: 2023-02-24 18:21:21 +00:00