In order to maintain the correct invariants, the existing maxUnavailable
logic calculated the same data several times in different ways. Leverage
the simpler structure from maxSurge and calculate pod availability only
once, as well as perform only a single pass over all the pods in the
daemonset. This changed no behavior of the current controller, and
has a structure that is almost identical to maxSurge.
If MaxSurge is set, the controller will attempt to double up nodes
up to the allowed limit with a new pod, and then when the most recent
(by hash) pod is ready, trigger deletion on the old pod. If the old
pod goes unready before the new pod is ready, the old pod is immediately
deleted. If an old pod goes unready before a new pod is placed on that
node, a new pod is immediately added for that node even past the MaxSurge
limit.
The backoff clock is used consistently throughout the daemonset controller
as an injectable clock for the purposes of testing.
It is too easy to omit checking the return value for the
syncAndValidateDaemonSet test in large suites. Switch the method
type to be a test helper and fatal/error directly. Also rename
a method that referenced the old name 'Rollback' instead of
'RollingUpdate'.
The GC expects that once it sees a controller with a non-nil
DeletionTimestamp, that controller will not attempt any adoption.
There was a known race condition that could cause a controller to
re-adopt something orphaned by the GC, because the controller is using a
cached value of its own spec from before DeletionTimestamp was set.
This fixes that race by doing an uncached quorum read of the controller
spec just before the first adoption attempt. It's important that this
read occurs after listing potential orphans. Note that this uncached
read is skipped if no adoptions are attempted (i.e. at steady state).