syncService shouldn't return error if the service doesn't exist which
means it's triggered by service deletion, otherwise the service would be
enqueued repeatedly even its cleanup has been executed successfully.
This patch makes syncService return nil if the error is NotFound when
getting the service, like the other controllers do.
This should fix a bug that could break masters when the EndpointSlice
feature gate was enabled. This was all tied to how the apiserver creates
and manages it's own services and endpoints (or in this case endpoint
slices). Consumers of endpoint slices also need to know about the
corresponding service. Previously we were trying to set an owner
reference here for this purpose, but that came with potential downsides
and increased complexity. This commit changes behavior of the apiserver
endpointslice integration to set the service name label instead of owner
references, and simplifies consumer logic to reference that (both are
set by the EndpointSlice controller).
Additionally, this should fix a bug with the EndpointSlice GenerateName
value that had previously been set with a "." as a suffix.
This tests the PDB status update path in DisruptionController and
asserts that conflicting writes (with eviciton handler) are handled
gracefully.
This adds the client-go fake.Clientset into our tests, because that is
the layer required for injecting update failures.
This also adds a TestMain so that DisruptionController logs can be
enabled during test. e.g.,
go test ./pkg/controller/disruption -v -args -v=4
This changes the retry logic in DisruptionController so that it
reconciles update conflicts. In the old behavior, any pdb status update
failure was retried with the same status, regardless of error.
Now there is no retry logic with the status update. The error is passed
up the stack where the PDB can be requeued for processing.
If the PDB status update error is a conflict error, there are some new
special cases:
- failSafe is not triggered, since this is considered a retryable error
- the PDB is requeued immediately (ignoring the rate limiter) because we
assume that conflict can be resolved by getting the latest version