Commit Graph

397 Commits

Author SHA1 Message Date
Mike Danese
1b84fb7d74 make testclient threadsafe by guarding internal state with accessors 2015-07-29 16:17:17 -07:00
Daniel Smith
8d5a6b063c Merge pull request #11029 from justinsb/fix_aws_security_group_races
AWS: Fix security group races
2015-07-27 16:15:43 -07:00
Alex Robinson
60611c253e Add a resync period for services in the service controller.
This should ensure all load balancers get deleted even if a reordering of
watch events causes us to strand one after its service has been deleted,
because the sync will notice that the service controller's cache has a
service in it that no longer exists in the apiserver.

It could still leak in the case that the controller manager is killed
between when it leaks something and the sync runs, but this should
improve things.
2015-07-27 18:03:13 +00:00
Justin Santa Barbara
23a190cd97 Fixes per review
Primarily go style issues; also a TODO that really exponential backoff
is the correct policy for API call retries.
2015-07-26 18:30:02 -04:00
Justin Santa Barbara
092d407a48 AWS: Fix race in security-group read/create
We need to find the ID for a named security group, or create a new one.

We do this by listing the security groups, and then doing a create if we
cannot find one.  This is a race though; against another thread if the
AWS API were consistent, but generally because the AWS API is actually
eventually consistent.

We wrap it in a retry loop.
2015-07-26 18:16:05 -04:00
Justin Santa Barbara
d7bace23ff AWS: Fix race-condition in tagging of security group
When we create a security-group in the AWS API, there is sometimes
a delay before we can tag it (the AWS API is eventually consistent).

So we wrap CreateTags in a simple retry loop.
2015-07-26 18:16:05 -04:00
Vish Kannan
2a5a6b99cb Merge pull request #10635 from smarterclayton/cloud_provider_should_err
Cloud provider should return an error
2015-07-23 17:50:45 -07:00
Daniel Smith
15d50f4211 Fix part of #9382 2015-07-23 15:48:45 -07:00
Alex Robinson
b0351ff266 Detect if UpdateTCPLoadBalancer left its GCE target pool in an incorrect state. 2015-07-17 19:01:21 +00:00
Alex Robinson
e943c47e68 Fix issue of comparing instance URLs with different project ID representations
in GCE target pools.
2015-07-15 21:24:45 +00:00
Brendan Burns
a8f02e5472 Automatically open a firewall when creating a GCE load balancer. 2015-07-10 14:35:29 -07:00
Alex Robinson
b52c6f673e Increase the rate limiting of GCE's token source. The burst being at 3
means transient errors won't incur such long waits, but repeating
failures shouldn't be retrying every second.
2015-07-09 22:51:23 +00:00
CJ Cullen
53c9f324c2 Add prometheus metrics for altTokenSource. 2015-07-07 15:25:23 -07:00
Yu-Ju Hong
530bff315f Merge pull request #10719 from justinsb/aws_mountpoints
AWS: Some images require volume mounts on /dev/xvdX
2015-07-07 10:48:19 -07:00
Yu-Ju Hong
736b3cb050 Merge pull request #10181 from swagiaal/aws-ebs-name
Use instance availability zone for AWS EBS
2015-07-06 11:39:33 -07:00
Justin Santa Barbara
f33df03d50 AWS: Some images require volume mounts on /dev/xvdX
It seems impossible to determine from outside.  Thankfully we're running
the attachment from inside the instance, so can check for /dev/sdX or
/dev/xvdX.

More modern images seem to be moving to /dev/xvdX
2015-07-04 10:45:06 -04:00
Justin Santa Barbara
591a113406 AWS: Return InstanceNotFound from ExternalID when not found
Despite finding and documenting the importance of this, I was still doing it
wrong!
2015-07-04 10:41:38 -04:00
Justin Santa Barbara
5ae7c13ad3 AWS: Use private dns name for node name again
This is a partial reversion of #9728, and should fix #10612.

9728 used the AWS instance id as the node name.  But proxy, logs
and exec all used the node name as the host name for contacting the minion.

It is possible to resolve a host to the IP, and this fixes logs.  But
exec and proxy also require an SSL certificate match on the hostname,
and this is harder to fix.

So the sensible fix seems to be a minimal reversion of the changes in #9728,
and we can revisit this post 1.0.
2015-07-03 01:23:51 -04:00
Clayton Coleman
d8bb4552de Cloud provider should return an error
Not fatal - makes cloud provider useful in methods that
can return error.
2015-07-01 14:41:49 -04:00
Sami Wagiaalla
4a6a492281 Use instance availability zone for AWS EBS
Signed-off-by: Sami Wagiaalla <swagiaal@redhat.com>
2015-06-25 16:00:30 -04:00
Justin Santa Barbara
4461daa218 AWS: Enabling resize tests 2015-06-24 19:01:42 -04:00
Justin Santa Barbara
2a5ed2f086 AWS: Use auto-scaling group to run minions
This uses the dynamic CIDR work, and we set source-dest-check to false
when we configure the route (which kind-of makes sense)
2015-06-19 10:22:15 -04:00
Satnam Singh
9f32599bee Merge pull request #9720 from justinsb/aws_routes
Refactor Routes, and dynamically configure minion CIDRs on AWS
2015-06-18 17:16:29 -07:00
Justin Santa Barbara
0ad16a187d Refactor findRouteTable to be less verbose
Thanks for the suggestion @cjcullen
2015-06-18 17:08:32 -07:00
Justin Santa Barbara
a4e15cdf3e AWS: Configure minion routes dynamically
We need to implement the Routes interface, and then enable the functionality in the cluster scripts.
2015-06-18 14:59:37 -07:00
Justin Santa Barbara
a3b43a36fd Refactor cloud route interface, to avoid assumption that routes are named 2015-06-18 14:59:37 -07:00
Justin Santa Barbara
a77bc9cfc4 Document assumption made by node-controller, and fix AWS to match
ExternalID must return "", cloudprovider.InstanceNotFound if the instance
is not found, for nodecontroller to remove nodes corresponding to deleted instances.
2015-06-18 14:55:10 -07:00
Satnam Singh
4c13f8957d Merge pull request #10057 from justinsb/aws_id_as_name_2
Fix of reverted #9728
2015-06-18 14:07:21 -07:00
Justin Santa Barbara
bd512ae06d AWS: Use the instance id as the node name
The EC2 instance id is the canonical node name on EC2.
2015-06-18 12:40:10 -07:00
Justin Santa Barbara
df87470ecf Allow cloud providers to return a node identifier different from the hostname 2015-06-18 12:40:05 -07:00
Satnam Singh
e4f5529a2d Revert "Allow nodename to be != hostname, use AWS instance ID on AWS" 2015-06-18 11:27:55 -07:00
CJ Cullen
abf1e768dc Pass through an explicit PROXY_SSH_USER.
Use user@user instead of user@hostname in case hostname is too long.
2015-06-18 10:35:02 -07:00
Satnam Singh
790ca2344f Merge pull request #9728 from justinsb/aws_id_as_name
Allow nodename to be != hostname, use AWS instance ID on AWS
2015-06-18 10:17:39 -07:00
CJ Cullen
15596ede41 Make AddSSHKeys a controller loop. Make sure master's always initializes m.tunnels. 2015-06-17 17:46:27 -07:00
Justin Santa Barbara
c89b0cd807 AWS: Use the instance id as the node name
The EC2 instance id is the canonical node name on EC2.
2015-06-17 00:40:43 -04:00
Justin Santa Barbara
efaead81dc Allow cloud providers to return a node identifier different from the hostname 2015-06-17 00:40:43 -04:00
Justin Santa Barbara
bf7946c326 AWS: Define new m4 instance types 2015-06-17 00:04:05 -04:00
Justin Santa Barbara
1561fce81c servicecontroller: last state applied to LB vs last state seen
We need the last state seen for interpreting the change-stream,
separately we need to track the last state we successfully applied to the
load balancer.
2015-06-16 18:59:03 -04:00
CJ Cullen
4d5d0457ef Fix mislooping in ssh.go. Add retries to AddSSHKeys. 2015-06-16 00:08:37 -07:00
Brendan Burns
99bf48dc2f Merge pull request #9542 from brendandburns/validate
Change the way we test if a disk is already attached.
2015-06-09 22:00:06 -07:00
Brendan Burns
3350eecedf Change the way we test if a disk is already attached.
Validated by manual introspection on a running GCE cluster.
2015-06-09 17:50:52 -07:00
krousey
f62a2a1bb6 Merge pull request #9451 from cjcullen/mig
Use Node IP Address instead of Node.Name in minion.ResourceLocation.
2015-06-09 15:52:12 -07:00
krousey
3d803ab7b2 Merge pull request #9410 from cjcullen/ratelimit
Add a RateLimiter for the gce altTokenSource.
2015-06-09 11:11:48 -07:00
CJ Cullen
2d85e4a094 Use Node IP Address instead of Node.Name in minion.ResourceLocation.
Refactor GetNodeHostIP into pkg/util/node (instead of pkg/util to break import cycle).

Include internalIP in gce NodeAddresses.
2015-06-08 16:58:00 -07:00
krousey
afb9a7e362 Merge pull request #9373 from justinsb/aws_lb_cleanup
Make deletion of an AWS load balancer clean
2015-06-08 16:49:21 -07:00
CJ Cullen
be0d24824d Add a RateLimiter for the gce altTokenSource. 2015-06-08 11:16:52 -07:00
Justin Santa Barbara
c2caa3f1da AWS: Fix cleanup of security group
The most reliable way seems to be to deauthorize the LB security group from
other groups, then delete the LB itself, then repeatedly retry to delete the LB
security group.

We can't delete the LB security group until the LB is actually completely
deleted, but the LB is hidden from the API during deletion.  So our only real
option is to retry deletion of the LB security group until the expected error
goes away when the LB is fully deleted.
2015-06-06 23:20:34 -04:00
Justin Santa Barbara
1700259508 AWS: Ignore the UserId when determining whether we can skip revoking a security group
Otherwise we weren't correctly de-authorizing the AWS LB SG from the Node SG
2015-06-06 12:37:01 -04:00
Justin Santa Barbara
8fafefd728 Fix doc for edge-case return from removeSecurityGroupIngress 2015-06-06 12:25:50 -04:00
Justin Santa Barbara
e32c66c6f4 Fix typo: Ingess -> Ingress 2015-06-06 12:22:50 -04:00