In commit 2f426fdba6 we added
compatibility (and tests) to deal with pre-1.20 checkpoint files.
We are now well past the end of support for pre-1.20 kubelets,
so we can get rid of this code.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Previously we were returning the error string from 'err' (which is nil), when
we should have been returning it from result.Error. Without this it is hard to
debug issues with NodeUnprepareResources.
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Update socket file detection logic to use os.Stat as per upstream
Go fix for https://github.com/golang/go/issues/33357. This resolves
the issue where socket files could not be properly identified on
Windows systems.
While currently those objects only get published by the kubelet for node-local
resources, this could change once we also support network-attached
resources. Dropping the "Node" prefix enables such a future extension.
The NodeName in ResourceSlice and StructuredResourceHandle then becomes
optional. The kubelet still needs to provide one and it must match its own node
name, otherwise it doesn't have permission to access ResourceSlice objects.
The information is received from the DRA driver plugin through a new gRPC
streaming interface. This is backwards compatible with old DRA driver kubelet
plugins, their gRPC server will return "not implemented" and that can be
handled by kubelet. Therefore no API break is needed.
However, DRA drivers need to be updated because the Go API changed. They can
return
status.New(codes.Unimplemented, "no node resource support").Err()
if they don't support the new ListAndWatchResources method and
structured parameters.
The controller in kubelet then synchronizes this information from the driver
with NodeResourceSlice objects, creating, updating and deleting them as needed.
If the resource handle has data from a structured parameter model, then we need
to pass that to the DRA driver kubelet plugin. Because Kubernetes uses
gogo/protobuf, we cannot use "optional" for that new optional field and have to
resort to "repeated" with a single repetition if present.
This is a new, backwards-compatible field.
That extending the resource.k8s.io changes the checksum of a kubelet checkpoint
is unfortunate. Updating the test cases is a stop-gap measure, the actual
solution will have to be something else before beta.
See https://github.com/golang/mock#gomock: golang/mock is no longer
maintained, and should be replaced by go.uber.org/mock.
This allows golang/mock to be dropped from the status and vendored
fields in unwanted-dependencies.json.
Signed-off-by: Stephen Kitt <skitt@redhat.com>
Today, DRA manager does not call plugin NodePrepareResource
for claims that it previously successfully handled, that is,
if claims are present in cache (checkpoint) even if node
rebooted.
After node reboots, it is required to call DRA plugin
for resource claims so that plugins may prepare them
again in case the resources dont persist reboot.
To achieve that, once kubelet is started, we call DRA
plugins for claims once if a pod sandbox is required
to be created during PodSync.
Signed-off-by: adrianc <adrianc@nvidia.com>
The cpu accumulator logic (that select CPUs for containers)
has some non-obvious code.
This commit adds some comments to make that code easier to
understand for new contributors. Some minor renames to
improve readability are also performed.
Add retry mechanism to handle cases where after kubelet restarts, the device
plugin unix socket(s) were created but not ready to serve yet.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Add kubeletSocket file to fsnotify instead of polling and waiting for deletion
of device plugin unix socket as a way of detecting kubelet restart. We need to
ensure that the device plugin re-registers itself after kubelet restart depending
on the configured registration mode (auto-registration or controller registration).
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>