kubernetes/pkg/kubelet/checkpointmanager
Patrick Ohly b9d00841a6 kubelet: improve checkpoint errors
Recording the expected and actual checksum in the error makes it possible to
provide that information, for example in a failed test like the ones for DRA.
Otherwise developers have to manually step through the test with a debugger to
figure out what the new checksum is.
2024-07-17 16:07:31 +02:00

DISCLAIMER

  • The SIG Node community has reached a general consensus that, as a best practice, no new checkpointing support should be introduced. We reached this understanding after struggling with hard-to-debug issues in production environments that were caused by checkpointing.
  • Any change to a checkpointed data structure is considered incompatible. A component that needs to stay backward compatible when reading old-format checkpoint files must add its own handling for them.
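
To make the second point concrete, here is a minimal sketch of what component-owned handling of an old format could look like. Everything in it is hypothetical and not taken from the Kubelet sources: the `checkpointV1` and `checkpointV2` types, the `Owner` field, the `"unknown"` default, and the FNV-over-JSON checksum are stand-ins chosen only to keep the example self-contained. The idea shown is to try the current format first and, if it does not verify, fall back to the old format and migrate it.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"hash/fnv"
)

// checkpointV1 is a hypothetical old on-disk format.
type checkpointV1 struct {
	Entries  []string `json:"entries"`
	Checksum uint64   `json:"checksum"`
}

// checkpointV2 is a hypothetical current format that adds two fields.
type checkpointV2 struct {
	Version  string   `json:"version"`
	Entries  []string `json:"entries"`
	Owner    string   `json:"owner"`
	Checksum uint64   `json:"checksum"`
}

var errCorrupt = errors.New("checkpoint matches no known format")

// checksumOf hashes the JSON encoding of v (an assumption of this sketch).
func checksumOf(v any) uint64 {
	b, _ := json.Marshal(v)
	h := fnv.New64a()
	h.Write(b)
	return h.Sum64()
}

// verify recomputes the checksum over the struct with Checksum zeroed.
func (cp checkpointV1) verify() bool {
	want := cp.Checksum
	cp.Checksum = 0
	return checksumOf(cp) == want
}

func (cp checkpointV2) verify() bool {
	want := cp.Checksum
	cp.Checksum = 0
	return checksumOf(cp) == want
}

// sealV1 produces an old-format blob, as an older component would have.
func sealV1(entries []string) ([]byte, error) {
	cp := checkpointV1{Entries: entries}
	cp.Checksum = checksumOf(cp)
	return json.Marshal(cp)
}

// readCheckpoint accepts either format and always returns the current one.
func readCheckpoint(blob []byte) (checkpointV2, error) {
	var v2 checkpointV2
	if err := json.Unmarshal(blob, &v2); err == nil && v2.Version == "v2" && v2.verify() {
		return v2, nil
	}
	// Fall back to the old format and migrate it in memory.
	var v1 checkpointV1
	if err := json.Unmarshal(blob, &v1); err == nil && v1.verify() {
		migrated := checkpointV2{Version: "v2", Entries: v1.Entries, Owner: "unknown"}
		migrated.Checksum = checksumOf(migrated)
		return migrated, nil
	}
	return checkpointV2{}, errCorrupt
}

func main() {
	blob, err := sealV1([]string{"a", "b"})
	if err != nil {
		panic(err)
	}
	cp, err := readCheckpoint(blob)
	fmt.Println(cp.Version, cp.Entries, err)
}
```

The fallback lives entirely in the component that owns the data, which is the division of responsibility the disclaimer describes.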

Introduction

This folder contains the Checkpointing Manager, a framework and set of primitives used by several other Kubelet submodules (dockershim, devicemanager, pods, and cpumanager) to implement checkpointing at the submodule level. As explained in the Disclaimer section above, think twice before introducing any further checkpointing in Kubelet. If checkpointing is still required, this folder provides the common APIs and the framework for implementing it. Using the same APIs across all submodules helps maintain consistency at the Kubelet level.

Below is the history of checkpointing support in Kubelet.

| Package | First checkpointing support merged on | PR link |
|---|---|---|
| kubelet/dockershim | Feb 3, 2017 | [CRI] Implement Dockershim Checkpoint |
| devicemanager | Sep 6, 2017 | Deviceplugin checkpoint |
| kubelet/pod | Nov 22, 2017 | Initial basic bootstrap-checkpoint support |
| cpumanager | Oct 27, 2017 | Add file backed state to cpu manager |