Two problems:
1. Get is always using Existing nodes slice, and you will for sure miss any
updated data
2. Each Update duplicates node entry in UpdatedNodes slice
For the 1st, try to find a node in UpdatedNodes slice (same as for the List).
2nd - append only if there is no node with same name as updated, if there is
just replace object.
Change-Id: I9ef1cca2788ba946eee37fa1b037c124ad76074c
Automatic merge from submit-queue
Namespace Controller handles items with finalizers gracefully
This PR does the following:
1. ensures the "orphan" finalizer is not added to items during DELETE COLLECTION calls
2. does not treat presence of a finalizer as an unexpected error condition.
The 15s wait should only happen when finalizers not added by GC are used.
I am aware of any finalizer like that at this time.
Fixes https://github.com/kubernetes/kubernetes/issues/32519
Automatic merge from submit-queue
update error handling for daemoncontroller
Updates the DaemonSet controller to cleanly requeue with ratelimiting on errors, make use of the `utilruntime.HandleError` consistently, and wait for preconditions before doing work.
@ncdc @liggitt @sttts My plan is to use this one as an example of how to handle requeuing, preconditions, and processing error handling.
@foxish fyi
related to https://github.com/kubernetes/kubernetes/issues/30629
Automatic merge from submit-queue
Fix race condition in updating attached volume between master and node
This PR tries to fix issue #29324. The cause of this issue is that a race
condition happens when marking volumes as attached for node status. This
PR tries to clean up the logic of when and where to mark volumes as
attached/detached. Basically the workflow as follows,
1. When volume is attached sucessfully, the volume and node info is
added into nodesToUpdateStatusFor to mark the volume as attached to the
node.
2. When detach request comes in, it will check whether it is safe to
detach now. If the check passes, remove the volume from volumesToReportAsAttached
to indicate the volume is no longer considered as attached now.
Afterwards, reconciler tries to update node status and trigger detach
operation. If any of these operation fails, the volume is added back to
the volumesToReportAsAttached list showing that it is still attached.
These steps should make sure that kubelet get the right (might be
outdated) information about which volume is attached or not. It also
garantees that if detach operation is pending, kubelet should not
trigger any mount operations.
This PR tries to fix issue #29324. This cause of this issue is a race
condition happens when marking volumes as attached for node status. This
PR tries to clean up the logic of when and where to mark volumes as
attached/detached. Basically the workflow as follows,
1. When volume is attached sucessfully, the volume and node info is
added into nodesToUpdateStatusFor to mark the volume as attached to the
node.
2. When detach request comes in, it will check whether it is safe to
detach now. If the check passes, remove the volume from volumesToReportAsAttached
to indicate the volume is no longer considered as attached now.
Afterwards, reconciler tries to update node status and trigger detach
operation. If any of these operation fails, the volume is added back to
the volumesToReportAsAttached list showing that it is still attached.
These steps should make sure that kubelet get the right (might be
outdated) information about which volume is attached or not. It also
garantees that if detach operation is pending, kubelet should not
trigger any mount operations.
Automatic merge from submit-queue
Use PV shared informer in PV controller
Use the PV shared informer, addressing (partially) https://github.com/kubernetes/kubernetes/issues/26247 . Using the PVC shared informer is not so simple because sometimes the controller wants to `Requeue` and...
Automatic merge from submit-queue
Change the eviction metric type and fix rate-limited-timed-queue
People how know better convinced me that aggregate counter is better than a gauge for a number of evictions metric. @Q-Lee
Per discussion with @pwittrock I add a v1.4 label and a cherrypick candidate label. This is a slightly bigger change than I thought, but it fixes a bug in eviction logic, so it's also important.
cc @derekwaynecarr @smarterclayton @timothysc
Automatic merge from submit-queue
add selfsubjectaccessreview API
Exposes the REST API for self subject access reviews. This allows a user to see whether or not they can perform a particular action.
@kubernetes/sig-auth
with StorageClass.Provisioner == <unknown plugin>, we should wait for
either external provisioner or volume admin to provide a PV for a claim
instead of reporting an error.
Fixes#31723
Automatic merge from submit-queue
Move StorageClass to a storage group
We discussed the pros and cons in sig-api-machinery yesterday. Choosing a particular group name means that clients (including our internal code) require less work and re-swizzling to handle promotions between versions. Even if you choose a group you end up not liking, the amount of work remains the same as the incubator work case: you move the affected kind, resource, and storage.
This moves the `StorageClass` type to the `storage.k8s.io` group (named for consistency with authentication, authorization, rbac, and imagepolicy). There are two commits, one for manaul changes and one for generated code.
Automatic merge from submit-queue
fix log message to include ds name
The pod name is never set because newPod is created a couple lines up without a name. Instead log the name and namespace of the ds which the pod is created from.
also bump the log level because reasons loop get's hit fairly often and does not indicate a bug.
Automatic merge from submit-queue
Sleep between NodeStatus update retries
Just a thing I found when looking into other problems.
This is pretty much no-risk change fixing wrong behavior. Do you think it should go in 1.4? @pwittrock
Node controller's internalPodInformer will block main thread
if it is not started as a go routine. This patch fixed this
by runing internalPodInformer as a go routine.
Automatic merge from submit-queue
Namespace controller deletes pods last
I think this fixes https://github.com/kubernetes/kubernetes/issues/29308 or at least helps further reduce the incidence.
This PR changes the order in which namespace controller prioritizes resources for deletion. It deletes all resources before deleting pods. The rationale for this change is to broadcast deletion of controllers that spawn pods first rather than trip those controllers up into thinking they should spawn more pods which would increase the risk of causing races with the `NamespaceLifecycle` admission plug-in. Many of those controllers also are not rate-limited in the face of rejection, so rather than promote a situation where they are rejected, we promote a situation that removes those things first.
Automatic merge from submit-queue
Post event message for volume attachment
This PR is to add event message when attaching volume fails to help
users to debug. For detach failure, may address in a different PR since
it requires more data structure change.
Automatic merge from submit-queue
Revert "daemonset controller should respect taints"
Reverts kubernetes/kubernetes#31020
We will be unreverting with some modifications after v1.4.
cc @pwittrock @davidopp
This PR is to add event message when attaching volume fails to help
users to debug. For detach failure, may address in a different PR since
it requires more data structure change.