The ip permission method now checks for containment, not equality, so
order of parameters matter. This change fixes
`removeSecurityGroupIngress` to pass in the removal permission first to
compare against the existing permission.
Change isEqualIPPermission to consider the entire list of security group
ids on when checking if a security group id has already been added.
This is used for example when adding and removing ingress rules to the
cluster nodes from an elastic load balancer. Without this, once there
are multiple load balancers, the method as it stands incorrectly returns
false even if the security group id is in the list of group ids. This
causes a few problems: dangling security groups which fill up an
account's limit since they don't get removed, and inability to recreate
load balancers in certain situations (receiving an
InvalidPermission.Duplicate from AWS when adding the same security
group).
findInstancesByNodeNames was a simple loop around
findInstanceByNodeName, which made an EC2 API call for each call.
We've had trouble with this sort of behaviour hitting EC2 rate limits on
bigger clusters (e.g. #11979).
Instead, change this method to fetch _all_ the tagged EC2 instances, and
then loop through the local results. This is one API call (modulo
paging).
We are currently only using findInstancesByNodeNames for the load
balancer, where we attach every node, so we were fetching all but one of
the instances anyway.
Issue #11979
For AWS EBS, a volume can only be attached to a node in the same AZ.
The scheduler must therefore detect if a volume is being attached to a
pod, and ensure that the pod is scheduled on a node in the same AZ as
the volume.
So that the scheduler need not query the cloud provider every time, and
to support decoupled operation (e.g. bare metal) we tag the volume with
our placement labels. This is done automatically by means of an
admission controller on AWS when a PersistentVolume is created backed by
an EBS volume.
Support for tagging GCE PVs will follow.
Pods that specify a volume directly (i.e. without using a
PersistentVolumeClaim) will not currently be scheduled correctly (i.e.
they will be scheduled without zone-awareness).
General purpose SSD ('gp2') volume type is just slighly more expensive than
Magnetic ('standard' / default in AWS), while the performance gain is pretty
significant.
So far, the volumes were created only during testing, where the extra cost
won't make any difference. In future, we plan to introduce QoS classes, where
users could choose SSD/Magnetic depending on their use cases.
'gp2' is just the default volume type for (hopefuly) short period before these
QoS classes are implemented.
From some reason, MiBs were used for public functions and AWS cloud provider
recalculated them to GiB. Let's expose what AWS really supports and don't hide
real allocation units.
Only takes the first available subnet in a AZ, ignore other subnets
and log warning about this.
Removes AWS region comparison for subnet AZs. A VPC is only in a single
AWS region.
Fixes#12381
The ELB client lookup isn't necessary because the service
does not operate across regions. Instead the client should
be built like the others by querying for the region from
the master node's metadata service.
This will allows authentication with the AWS API using the
~/.aws/credentials file which is created by runnign 'aws configure' on
a node.
Signed-off-by: Sami Wagiaalla <swagiaal@redhat.com>
ELB will automatically create a health check, but if we update the
listeners the old health check port sticks around, and all the instances
are marked offline.
Update the health-checks to match the listeners: we just check the first
valid service port, with some hard-coded options for timeouts / retries etc.
This turned out to be a little convoluted, but is needed because deleting an ELB on AWS
is a painful UX - it won't have the same endpoint when it is recreated.
Also started splitting the provider into files, but only for new functions (so far!)
Previously the servicecontroller would do the delete, but by having the cloudprovider
take that task on, we can later remove it from the servicecontroller, and the
cloudprovider can do something more efficient.