ELB automatically creates a health check, but if we update the
listeners, the old health check port sticks around and all the instances
get marked offline.
Update the health checks to match the listeners: we check the first
valid service port, with some hard-coded options for timeouts, retries etc.
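For illustration, a minimal sketch of what this looks like against the
aws-sdk-go ELB API; the function name, the hard-coded thresholds, and how the
port is chosen are assumptions here, not the exact code:

```go
package aws

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/elb"
)

// configureHealthCheck points the ELB health check at the first valid
// service port. The interval/threshold values are illustrative hard-coded
// options for timeouts and retries.
func configureHealthCheck(client *elb.ELB, loadBalancerName string, port int64) error {
	_, err := client.ConfigureHealthCheck(&elb.ConfigureHealthCheckInput{
		LoadBalancerName: aws.String(loadBalancerName),
		HealthCheck: &elb.HealthCheck{
			// Plain TCP check against the service port.
			Target:             aws.String(fmt.Sprintf("TCP:%d", port)),
			Interval:           aws.Int64(10), // seconds between checks
			Timeout:            aws.Int64(5),  // seconds before a check fails
			HealthyThreshold:   aws.Int64(2),  // successes to mark healthy
			UnhealthyThreshold: aws.Int64(6),  // failures to mark unhealthy
		},
	})
	return err
}
```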
This turned out to be a little convoluted, but it is needed because deleting
an ELB on AWS is a painful UX: the load balancer won't have the same endpoint
when it is recreated.
Also started splitting the provider into files, but only for new functions (so far!)
Previously the servicecontroller did the delete, but by having the
cloudprovider take that task on, we can later remove it from the
servicecontroller, and the cloudprovider can do something more efficient.
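A rough sketch of the shape of this delegation; the interface and method
names below are illustrative, not necessarily the exact cloudprovider
interface:

```go
package cloudprovider

// Illustrative interface: the service controller asks the cloud provider
// to guarantee the load balancer is gone, instead of deleting it itself.
type TCPLoadBalancer interface {
	// EnsureTCPLoadBalancerDeleted deletes the named load balancer if it
	// exists, and is a no-op if it does not. Idempotency lets the
	// controller call it on every sync without first checking existence.
	EnsureTCPLoadBalancerDeleted(name, region string) error
}
```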
The current code assumes the host name will not include the full domain
name, which is not always the case. This patch adds support for computing
the host tag from a fully qualified domain name.
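A minimal sketch of the computation, assuming the host tag is the
unqualified host name; the helper name is hypothetical:

```go
package aws

import "strings"

// hostTag computes the host tag from a node name that may or may not be
// fully qualified: everything before the first dot.
func hostTag(nodeName string) string {
	if i := strings.Index(nodeName, "."); i >= 0 {
		return nodeName[:i] // "ip-10-0-0-1.ec2.internal" -> "ip-10-0-0-1"
	}
	return nodeName // already unqualified
}
```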
This should ensure all load balancers get deleted even if a reordering of
watch events causes us to strand one after its service has been deleted,
because the sync will notice that the service controller's cache has a
service in it that no longer exists in the apiserver.
It could still leak in the case that the controller manager is killed
between when it leaks something and the sync runs, but this should
improve things.
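A rough sketch of that orphan detection, with hypothetical names for the
cached set, the live service set, and the delete callback:

```go
package servicecontroller

// Illustrative sync step: any load balancer in our cache whose service no
// longer exists in the apiserver is an orphan and must be deleted.
func syncOrphanedLoadBalancers(cached, live map[string]bool,
	deleteLB func(name string) error) error {
	for name := range cached {
		if !live[name] {
			// We missed (or reordered) the watch event for the service
			// deletion; clean up the stranded load balancer now.
			if err := deleteLB(name); err != nil {
				return err
			}
		}
	}
	return nil
}
```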
We need to find the ID for a named security group, or create a new one.
We do this by listing the security groups, and then doing a create if we
cannot find one. This is racy: against another thread even if the AWS API
were consistent, but more fundamentally because the AWS API is eventually
consistent. We therefore wrap it in a retry loop.
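A sketch of the find-or-create retry loop using the aws-sdk-go EC2 API; the
function name, attempt count, and sleep interval are assumptions:

```go
package aws

import (
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// ensureSecurityGroup finds the ID of the named security group, creating it
// if necessary. Because the AWS API is eventually consistent (and another
// thread may race us), a create can fail even when the describe found
// nothing, so we loop and re-describe.
func ensureSecurityGroup(client *ec2.EC2, name, vpcID string) (string, error) {
	for attempt := 0; attempt < 10; attempt++ {
		out, err := client.DescribeSecurityGroups(&ec2.DescribeSecurityGroupsInput{
			Filters: []*ec2.Filter{
				{Name: aws.String("group-name"), Values: []*string{aws.String(name)}},
				{Name: aws.String("vpc-id"), Values: []*string{aws.String(vpcID)}},
			},
		})
		if err != nil {
			return "", err
		}
		if len(out.SecurityGroups) > 0 {
			return *out.SecurityGroups[0].GroupId, nil
		}
		created, err := client.CreateSecurityGroup(&ec2.CreateSecurityGroupInput{
			GroupName:   aws.String(name),
			Description: aws.String("created by the aws cloudprovider"),
			VpcId:       aws.String(vpcID),
		})
		if err == nil {
			return *created.GroupId, nil
		}
		// We likely lost the race (duplicate group) or hit a transient
		// error; wait and re-describe.
		time.Sleep(time.Second)
	}
	return "", fmt.Errorf("could not find or create security group %q", name)
}
```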
When we create a security group in the AWS API, there is sometimes
a delay before we can tag it (the AWS API is eventually consistent).
So we wrap CreateTags in a simple retry loop.
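A sketch of that retry loop, again with the aws-sdk-go EC2 API; the helper
name and retry bounds are assumptions:

```go
package aws

import (
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// createTagsWithRetry tags a freshly created resource, retrying for a short
// while because the resource may not yet be visible to the tagging API
// (eventual consistency).
func createTagsWithRetry(client *ec2.EC2, resourceID, key, value string) error {
	input := &ec2.CreateTagsInput{
		Resources: []*string{aws.String(resourceID)},
		Tags:      []*ec2.Tag{{Key: aws.String(key), Value: aws.String(value)}},
	}
	var err error
	for attempt := 0; attempt < 10; attempt++ {
		if _, err = client.CreateTags(input); err == nil {
			return nil
		}
		time.Sleep(time.Second) // resource may not be visible yet; retry
	}
	return fmt.Errorf("timed out tagging %s: %v", resourceID, err)
}
```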
It seems impossible to determine the assigned device name from outside
the instance. Thankfully we're running the attachment from inside the
instance, so we can check for /dev/sdX or /dev/xvdX.
More modern images seem to be moving to /dev/xvdX.
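A minimal sketch of the probe, assuming a hypothetical helper that checks
both naming schemes from inside the instance:

```go
package aws

import "os"

// findDevicePath probes for the attached volume from inside the instance:
// older images expose /dev/sdX, more modern images expose /dev/xvdX.
func findDevicePath(suffix string) (string, bool) {
	for _, prefix := range []string{"/dev/xvd", "/dev/sd"} {
		path := prefix + suffix
		if _, err := os.Stat(path); err == nil {
			return path, true
		}
	}
	return "", false // not (yet) attached under either naming scheme
}
```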
This is a partial reversion of #9728, and should fix #10612.
#9728 used the AWS instance id as the node name. But proxy, logs
and exec all used the node name as the host name for contacting the minion.
It is possible to resolve a host name to its IP, and this fixes logs. But
exec and proxy also require an SSL certificate match on the hostname,
and this is harder to fix.
So the sensible fix seems to be a minimal reversion of the changes in #9728,
and we can revisit this post 1.0.