We need to find the ID for a named security group, or create a new one.
We do this by listing the security groups, and then doing a create if we
cannot find one. This is a race though; against another thread if the
AWS API were consistent, but generally because the AWS API is actually
eventually consistent.
We wrap it in a retry loop.
When we create a security-group in the AWS API, there is sometimes
a delay before we can tag it (the AWS API is eventually consistent).
So we wrap CreateTags in a simple retry loop.
It seems impossible to determine from outside. Thankfully we're running
the attachment from inside the instance, so can check for /dev/sdX or
/dev/xvdX.
More modern images seem to be moving to /dev/xvdX
This is a partial reversion of #9728, and should fix#10612.
9728 used the AWS instance id as the node name. But proxy, logs
and exec all used the node name as the host name for contacting the minion.
It is possible to resolve a host to the IP, and this fixes logs. But
exec and proxy also require an SSL certificate match on the hostname,
and this is harder to fix.
So the sensible fix seems to be a minimal reversion of the changes in #9728,
and we can revisit this post 1.0.
ExternalID must return "", cloudprovider.InstanceNotFound if the instance
is not found, for nodecontroller to remove nodes corresponding to deleted instances.
The most reliable way seems to be to deauthorize the LB security group from
other groups, then delete the LB itself, then repeatedly retry to delete the LB
security group.
We can't delete the LB security group until the LB is actually completely
deleted, but the LB is hidden from the API during deletion. So our only real
option is to retry deletion of the LB security group until the expected error
goes away when the LB is fully deleted.
Whenever we do a list we now filter on tags so we only see resources relating
to our cluster.
Also, rationalize all the DescribeX calls:
* They all take a request object (so that we can pass filters)
* They do paging if that is required (and return the underlying resources)
* They wrap any error with a "error while listing X: %v" message
These were introduced because the new official AWS SDK uses *string
where the old library used strings. We now use the helpers much
more (orEmpty and isNilOrEmpty).
Fixes#9123