Automatic merge from submit-queue (batch tested with PRs 43925, 42512)
AWS: add KubernetesClusterID as additional option when VPC is set
This is a small enhancement after the PRs https://github.com/kubernetes/kubernetes/pull/41695 and https://github.com/kubernetes/kubernetes/pull/39996
## Release Notes
```release-note
AWS cloud provider: allow setting KubernetesClusterID or KubernetesClusterTag in combination with VPC.
```
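For illustration, a minimal aws.conf sketch of the combination (field names as described here and in the referenced PRs; treat the exact keys as an assumption, not verified against the shipped provider):
```
[Global]
VPC = vpc-0a1b2c3d
KubernetesClusterID = my-cluster
```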
The cloudprovider is being refactored out of the Kubernetes core. This is
being done by moving all the cloud-specific calls from kube-apiserver,
kubelet and kube-controller-manager into a separately maintained binary
(maintained by vendors) called cloud-controller-manager. The kubelet relies
on the cloudprovider to detect information about the node that it is
running on. Some of the cloudproviders worked by querying local services to
obtain this information. In the new world, local information cannot be
relied on, since cloud-controller-manager will not run on every node; only
one active instance of it will run in the cluster.
Today, all calls to the cloudprovider are based on the nodename. Nodenames
are unique within the Kubernetes cluster, but generally not unique within
the cloud. This model of addressing nodes by nodename will not work in the
future, because local services cannot be queried to uniquely identify a
node in the cloud. Therefore, I propose that we perform all cloudprovider
calls based on ProviderID. This ID is a unique identifier for a node in an
external system (such as the instance ID in the AWS cloud).
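To make ProviderID concrete, here is a hedged Go sketch of extracting the instance ID from an AWS-style ProviderID (the `aws:///<zone>/<instance-id>` format is shown for illustration):
```go
package main

import (
	"fmt"
	"strings"
)

// instanceIDFromProviderID extracts the cloud instance ID from a node's
// ProviderID, the cloud-unique identifier proposed above.
func instanceIDFromProviderID(providerID string) (string, error) {
	const prefix = "aws://"
	if !strings.HasPrefix(providerID, prefix) {
		return "", fmt.Errorf("unexpected ProviderID format: %q", providerID)
	}
	trimmed := strings.Trim(strings.TrimPrefix(providerID, prefix), "/")
	parts := strings.Split(trimmed, "/") // <zone>/<instance-id>
	return parts[len(parts)-1], nil
}

func main() {
	id, err := instanceIDFromProviderID("aws:///us-west-2a/i-0123456789abcdef0")
	fmt.Println(id, err) // i-0123456789abcdef0 <nil>
}
```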
Automatic merge from submit-queue (batch tested with PRs 41306, 42187, 41666, 42275, 42266)
Implement bulk polling of volumes
This implements Bulk volume polling using ideas presented by
justin in https://github.com/kubernetes/kubernetes/pull/39564
But it changes the implementation to use an interface
and doesn't affect other implementations.
cc @justinsb
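A minimal Go sketch of the interface-based approach, with stand-in types (the name and signature are illustrative, not the exact upstream API):
```go
package main

// NodeName and VolumeID are simple stand-ins for the controller's types.
type NodeName string
type VolumeID string

// BulkVolumeVerifier answers "which of these volumes are attached to
// which nodes" in one batched call per cloud, instead of one API
// request per volume. Providers that don't implement it are unaffected.
type BulkVolumeVerifier interface {
	BulkVerifyVolumes(volumesByNode map[NodeName][]VolumeID) (map[NodeName]map[VolumeID]bool, error)
}

func main() {}
```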
Set the vpcID when dummy is created (+1 squashed commit)
Squashed commits:
[0b1ac6e83e] Use the VPC flag and KubernetesClusterTag as identifier (+1 squashed commit)
Squashed commits:
[962bc56e38] Remove again availabilityZone and fix naming (+1 squashed commit)
Squashed commits:
[e3d1b41807] Use the VPC ID flag as identifier (+1 squashed commit)
Squashed commits:
[5b99fe6243] Add flag for external master
Automatic merge from submit-queue (batch tested with PRs 41921, 41695, 42139, 42090, 41949)
AWS: Support shared tag `kubernetes.io/cluster/<clusterid>`
We recognize an additional cluster tag:
kubernetes.io/cluster/<clusterid>
This now allows us to share resources, in particular subnets.
In addition, the value is used to track ownership/lifecycle. When we
create objects, we record the value as "owned".
We also refactor the tags code out into its own file & class, as we are
touching most of these functions anyway.
```release-note
AWS: Support shared tag `kubernetes.io/cluster/<clusterid>`
```
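A short Go sketch of how the shared tag can be checked (illustrative only, not the provider's code):
```go
package main

import "fmt"

// clusterTagKey builds the shared cluster tag key described above.
func clusterTagKey(clusterID string) string {
	return "kubernetes.io/cluster/" + clusterID
}

// hasClusterTag reports whether a resource's AWS tags associate it with
// the given cluster. Resources we create carry the value "owned".
func hasClusterTag(tags map[string]string, clusterID string) bool {
	_, ok := tags[clusterTagKey(clusterID)]
	return ok
}

func main() {
	subnetTags := map[string]string{clusterTagKey("my-cluster"): "owned"}
	fmt.Println(hasClusterTag(subnetTags, "my-cluster")) // true
}
```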
Automatic merge from submit-queue (batch tested with PRs 38676, 41765, 42103, 41833, 41702)
AWS: Skip instances that are tagged as a master
We recognize a few AWS tags, and skip over masters when finding zones
for dynamic volumes. This will fix #34583.
This is not perfect, in that really the scheduler is the only component
that can correctly choose the zone, but should address the common
problem.
```release-note
AWS: Do not consider master instance zones for dynamic volume creation
```
Automatic merge from submit-queue (batch tested with PRs 41756, 36344, 34259, 40843, 41526)
add InternalDNS/ExternalDNS node address types
This PR adds internal/external DNS names to the types of NodeAddresses that can be reported by the kubelet.
Will spawn follow-up issues for cloud provider owners to include these when possible.
```release-note
Nodes can now report two additional address types in their status: InternalDNS and ExternalDNS. The apiserver can use `--kubelet-preferred-address-types` to give priority to the type of address it uses to reach nodes.
```
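For example (using today's import path; at the time of this PR the types lived in the main repo's API packages):
```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

func main() {
	// A node can now report DNS names alongside the existing IP and
	// hostname address types in its status.
	addrs := []v1.NodeAddress{
		{Type: v1.NodeInternalDNS, Address: "ip-10-0-0-50.ec2.internal"},
		{Type: v1.NodeExternalDNS, Address: "ec2-54-1-2-3.compute-1.amazonaws.com"},
	}
	for _, a := range addrs {
		fmt.Printf("%s: %s\n", a.Type, a.Address)
	}
}
```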
Automatic merge from submit-queue
AWS: trust region if found from AWS metadata
```release-note
AWS: trust region if found from AWS metadata
```
Means we can run in newly announced regions without a code change.
We don't register the ECR provider in new regions, so we will still need
a code change for now.
This also means we do trust config / instance metadata, and don't reject
incorrectly configured zones.
Fix #35014
Automatic merge from submit-queue
AWS: Add exponential backoff to waitForAttachmentStatus() and createTags()
We should use exponential backoff while waiting for a volume to get attached/detached to/from a node. This will lower AWS load and reduce API call throttling.
This partly fixes #33088
@justinsb, can you please take a look?
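A sketch of the backoff loop using the wait utility (parameters are illustrative; the import path is today's, the utility lived under pkg/util/wait at the time):
```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// checkAttachmentStatus is a hypothetical stand-in for a DescribeVolumes
// probe of the volume's attachment state.
func checkAttachmentStatus() bool { return true }

func main() {
	// Each retry waits Factor times longer than the previous one,
	// lowering AWS load and reducing API call throttling.
	backoff := wait.Backoff{
		Duration: 10 * time.Second,
		Factor:   1.5,
		Steps:    10,
	}
	err := wait.ExponentialBackoff(backoff, func() (bool, error) {
		return checkAttachmentStatus(), nil // false => retry after a longer delay
	})
	fmt.Println("done:", err)
}
```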
On AWS, we should avoid reusing device names for as long as possible; see
https://aws.amazon.com/premiumsupport/knowledge-center/ebs-stuck-attaching/
"If you specify a device name that is not in use by EC2, but is being used by
the block device driver within the EC2 instance, the attachment of the EBS
volume does not succeed and the EBS volume is stuck in the attaching state."
This patch adds a device name allocator that tries to find a name that's next
to the last used device name instead of using the first available one.
This way we will loop through all device names ("xvdba" .. "xvdzz") before
a device name is reused.
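A hedged sketch of such an allocator (not the provider's exact code); note there are 25 x 26 = 650 possible names, which matches the kubelet limit mentioned further below:
```go
package main

import "fmt"

const numNames = 25 * 26 // suffixes "ba".."zz": first letter b..z, second a..z

// suffix maps an index to a two-letter device suffix ("ba" .. "zz").
func suffix(i int) string {
	return fmt.Sprintf("%c%c", 'b'+i/26, 'a'+i%26)
}

// deviceNameAllocator hands out names starting just after the last one
// used, so every name is cycled through before any is reused.
type deviceNameAllocator struct{ last int }

func (a *deviceNameAllocator) Allocate(inUse map[string]bool) (string, bool) {
	for n := 0; n < numNames; n++ {
		a.last = (a.last + 1) % numNames
		name := "/dev/xvd" + suffix(a.last)
		if !inUse[name] {
			return name, true
		}
	}
	return "", false // all 650 names are in use
}

func main() {
	a := &deviceNameAllocator{last: numNames - 1}
	inUse := map[string]bool{"/dev/xvdbb": true}
	fmt.Println(a.Allocate(inUse)) // /dev/xvdba true
}
```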
This method has been unused by k8s for some time, and yet is the last
piece of the cloud provider API that encourages provider names to be
human-friendly strings (this method applies a regex to instance names).
Actually removing this deprecated method is part of a long effort to
migrate from instance names to instance IDs in at least the OpenStack
provider plugin.
We are more liberal in what we accept as a volume id in k8s, and indeed
we ourselves generate names that look like `aws://<zone>/<id>` for
dynamic volumes.
This volume id (hereafter a KubernetesVolumeID) cannot directly be
compared to an AWS volume ID (hereafter an awsVolumeID).
We introduce types for each, to prevent accidental comparison or
confusion.
Issue #35746
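A minimal Go sketch of the two types and the conversion between them (the real parser validates malformed input more carefully):
```go
package main

import (
	"fmt"
	"strings"
)

// KubernetesVolumeID is the liberal form k8s accepts, e.g.
// "aws://us-east-1a/vol-0123456789abcdef0" or a bare "vol-...".
type KubernetesVolumeID string

// awsVolumeID is the strict form the AWS API expects ("vol-...").
// Two distinct types make accidental comparison a compile error.
type awsVolumeID string

func (id KubernetesVolumeID) MapToAWSVolumeID() (awsVolumeID, error) {
	s := string(id)
	if strings.HasPrefix(s, "aws://") {
		parts := strings.Split(strings.TrimPrefix(s, "aws://"), "/")
		s = parts[len(parts)-1] // aws://<zone>/<id>; the zone may be empty
	}
	if !strings.HasPrefix(s, "vol-") {
		return "", fmt.Errorf("invalid EBS volume ID: %q", id)
	}
	return awsVolumeID(s), nil
}

func main() {
	v, err := KubernetesVolumeID("aws://us-east-1a/vol-0123456789abcdef0").MapToAWSVolumeID()
	fmt.Println(v, err) // vol-0123456789abcdef0 <nil>
}
```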
At the master volume reconciler, the information about which volumes are
attached to nodes is cached in the actual state of the world. However, this
information might be out of date when a node is terminated (its volumes are
detached automatically). In this situation, the reconciler assumes the
volume is still attached and will not issue an attach operation when the
node comes back. Pods created on those nodes will fail to mount.
This PR adds the logic to periodically sync up the truth for attached volumes kept in the actual state cache. If the volume is no longer attached to the node, the actual state will be updated to reflect the truth. In turn, reconciler will take actions if needed.
To avoid issuing many concurrent operations on the cloud provider, this PR
adds a batch operation to check whether a list of volumes is attached to a
node, instead of one request per volume.
More details are explained in PR #33760
Continuation of #1111
I tried to keep this PR down to just a simple search-n-replace to keep
things simple. I may have gone too far in some spots but it's easy to
roll those back if needed.
I avoided renaming `contrib/mesos/pkg/minion` because there's already
a `contrib/mesos/pkg/node` dir and fixing that will require a bit of work
due to a circular import chain that pops up. So I'm saving that for a
follow-on PR.
I rolled back some of this from a previous commit because it just got
too big/messy. I will follow up with additional PRs.
Signed-off-by: Doug Davis <dug@us.ibm.com>
We had another bug where we confused the hostname with the NodeName.
To avoid this happening again, and to make the code more
self-documenting, we use types.NodeName (a typedef alias for string)
whenever we are referring to the Node.Name.
A tedious but mechanical commit therefore, to change all uses of the
node name to use types.NodeName
Also clean up some of the (many) places where the NodeName is referred
to as a hostname (not true on AWS), or an instanceID (not true on GCE),
etc.
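A tiny example of why the typedef helps:
```go
package main

// NodeName mirrors types.NodeName: a typedef for string. A string
// variable holding a hostname will not implicitly convert to it, so
// confusing hostnames with node names becomes a compile-time error.
type NodeName string

// externalID is a hypothetical signature in the new convention.
func externalID(nodeName NodeName) string { return string(nodeName) }

func main() {
	hostname := "ip-10-0-0-50" // a plain string; on AWS this is NOT the node name
	// externalID(hostname)            // compile error: string is not NodeName
	_ = externalID(NodeName(hostname)) // the conversion must be explicit
}
```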
When we are mounting a lot of volumes, we frequently hit rate limits.
Reduce the frequency with which we poll the status; introduces a bit of
latency but probably matches common attach times pretty closely, and
avoids causing rate limit problems everywhere.
Also, we now poll for longer, as when we timeout, the volume is in an
indeterminate state: it may be about to complete. The volume controller
can tolerate a slow attach/detach, but it is harder to tolerate the
indeterminism.
Finally, we ignore a sequence of errors in DescribeVolumes (up to 5 in a
row currently). So we will eventually return an error, but a one-off
failure (e.g. due to rate limits) does not cause us to spuriously fail.
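A sketch of that polling behaviour (names, intervals and the error budget are illustrative):
```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForVolumeStatus polls slowly (to stay under rate limits), for a
// long time, and tolerates a few consecutive DescribeVolumes errors
// before giving up.
func waitForVolumeStatus(poll func() (done bool, err error)) error {
	const (
		interval             = 10 * time.Second
		timeout              = 10 * time.Minute
		maxConsecutiveErrors = 5
	)
	consecutiveErrors := 0
	for deadline := time.Now().Add(timeout); time.Now().Before(deadline); time.Sleep(interval) {
		done, err := poll()
		if err != nil {
			consecutiveErrors++
			if consecutiveErrors >= maxConsecutiveErrors {
				return fmt.Errorf("%d consecutive errors: %v", consecutiveErrors, err)
			}
			continue // a one-off failure (e.g. rate limiting) is tolerated
		}
		consecutiveErrors = 0
		if done {
			return nil
		}
	}
	return errors.New("timed out; attachment state is indeterminate")
}

func main() {
	fmt.Println(waitForVolumeStatus(func() (bool, error) { return true, nil }))
}
```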
Automatic merge from submit-queue
Typos and englishify pkg/cloudprovider + pkg/dns + pkg/kubectl
**What this PR does / why we need it**: Just fixed some typos + "englishify" in pkg/cloudprovider + pkg/dns + pkg/kubectl
**Which issue this PR fixes** : None
**Special notes for your reviewer**: It just fixes typos.
**Release note**: `NONE`
The problem is that attachments are now done on the master, and we are
only caching the attachment map persistently for the local instance. So
there is now a race, because the attachment map is cleared every time.
Issue #29324
Automatic merge from submit-queue
AWS: More ELB attributes via service annotations
Replaces #25015 and addresses all of @justinsb's feedback therein. This is a new PR because I was unable to reopen #25015 to amend it.
I noticed recently that there is existing (but undocumented) precedent for the AWS cloud provider to manage ELB-specific load balancer configuration based on service annotations. In particular, one can _already_ designate an ELB as "internal" or enable PROXY protocol.
This PR extends this capability to the management of ELB attributes, which includes the following items (the annotation keys are sketched after this list):
* Access logs:
* Enabled / disabled
* Emit interval
* S3 bucket name
* S3 bucket prefix
* Connection draining:
* Enabled / disabled
* Timeout
* Connection:
* Idle timeout
* Cross-zone load balancing:
* Enabled / disabled
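The corresponding annotation keys look like the following (reproduced as I recall them from the provider; verify the exact strings against the source):
```go
package main

// Annotation keys for the ELB attributes managed by this PR.
const (
	// Access logs
	ServiceAnnotationLoadBalancerAccessLogEnabled        = "service.beta.kubernetes.io/aws-load-balancer-access-log-enabled"
	ServiceAnnotationLoadBalancerAccessLogEmitInterval   = "service.beta.kubernetes.io/aws-load-balancer-access-log-emit-interval"
	ServiceAnnotationLoadBalancerAccessLogS3BucketName   = "service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-name"
	ServiceAnnotationLoadBalancerAccessLogS3BucketPrefix = "service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-prefix"
	// Connection draining
	ServiceAnnotationLoadBalancerConnectionDrainingEnabled = "service.beta.kubernetes.io/aws-load-balancer-connection-draining-enabled"
	ServiceAnnotationLoadBalancerConnectionDrainingTimeout = "service.beta.kubernetes.io/aws-load-balancer-connection-draining-timeout"
	// Connection idle timeout
	ServiceAnnotationLoadBalancerConnectionIdleTimeout = "service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout"
	// Cross-zone load balancing
	ServiceAnnotationLoadBalancerCrossZoneLoadBalancingEnabled = "service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled"
)

func main() {}
```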
Some of these are possibly more useful than others. Use cases that immediately come to mind:
* Enabling cross-zone load balancing is potentially useful for "Ubernetes Light," or anyone otherwise attempting to spread worker nodes around multiple AZs.
* Increasing idle timeout is useful for the benefit of anyone dealing with long-running requests. An example I personally care about would be git pushes to Deis' builder component.
Automatic merge from submit-queue
AWS: Support HTTP->HTTP mode for ELB
**What this PR does / why we need it**:
Right now it is not possible to create an AWS ELB that listens for HTTP and where the backend pod also listens for HTTP.
I asked @justinsb in slack and he said that this seems to be an oversight, so I'd like to use this PR as a step towards solving this.
**Special notes for your reviewer**:
I've only added a simple unit test. Are any integration tests needed? I'm not familiar with the code base.
cc @therc
Automatic merge from submit-queue
Run goimport for the whole repo
While removing GOMAXPROCS and running goimports, I noticed quite a lot of other files also needed goimports formatting. Didn't commit `*.generated.go`, `*.deepcopy.go` or files in `vendor`.
This is more for testing if it builds.
The only strange thing here is the gopkg.in/gcfg.v1 => github.com/scalingdata/gcfg replace.
cc @jfrazelle @thockin
We have a few functions that predate aws-sdk-go, but they have natural
equivalents in aws-sdk-go. Document them as deprecated, and replace
the implementation with the equivalent in aws-sdk-go to make it obvious
that they are the same.
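For example, the deprecation might look like this (the helper name is illustrative):
```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
)

// orEmpty is the style of pre-SDK helper being documented here. Its body
// now just delegates to the aws-sdk-go equivalent, making the
// equivalence obvious.
//
// Deprecated: use aws.StringValue directly.
func orEmpty(s *string) string {
	return aws.StringValue(s)
}

func main() {
	v := "vol-0123456789abcdef0"
	fmt.Println(orEmpty(&v), orEmpty(nil)) // "vol-..." and ""
}
```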
Automatic merge from submit-queue
AWS: Added experimental option to skip zone check
This pull request resolves #28380. In the vast majority of cases, it is appropriate to validate the AWS region against a known set of regions. However, there is the edge case where this is undesirable as Kubernetes may be deployed in an AWS-like environment where the region is not one of the known regions.
By adding the optional **DisableStrictZoneCheck true** to the **[Global]** section in the aws.conf file (e.g. /etc/aws/aws.conf) one can bypass the region validation.
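A minimal aws.conf sketch, straight from the description above:
```
[Global]
DisableStrictZoneCheck = true
```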
Automatic merge from submit-queue
AWS/GCE: Spread PetSet volume creation across zones, create GCE volumes in non-master zones
Long term we plan on integrating this into the scheduler, but in the
short term we use the volume name to place it onto a zone.
We hash the volume name so we don't bias to the first few zones.
If the volume name "looks like" a PetSet volume name (ending with
-<number>) then we use the number as an offset. In that case we hash
the base name.
Lots of comments describing the heuristics, how it fits together and the
limitations.
In particular, we can't guarantee correct volume placement if the set of
zones is changing between allocating volumes.
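A hedged Go sketch of the placement heuristic (not the upstream implementation):
```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strconv"
	"strings"
)

// chooseZoneForVolume hashes the volume name so we don't bias toward the
// first zones; for PetSet-style names ending in "-<number>" it hashes
// the base name and uses the ordinal as an offset, spreading members of
// one set across zones.
func chooseZoneForVolume(zones []string, volumeName string) string {
	sort.Strings(zones) // deterministic ordering across calls

	base, offset := volumeName, 0
	if i := strings.LastIndex(volumeName, "-"); i >= 0 {
		if n, err := strconv.Atoi(volumeName[i+1:]); err == nil && n >= 0 {
			base, offset = volumeName[:i], n
		}
	}

	h := fnv.New32a()
	h.Write([]byte(base))
	return zones[(h.Sum32()+uint32(offset))%uint32(len(zones))]
}

func main() {
	zones := []string{"us-east-1a", "us-east-1b", "us-east-1c"}
	for _, name := range []string{"data-web-0", "data-web-1", "data-web-2"} {
		fmt.Println(name, "->", chooseZoneForVolume(zones, name))
	}
}
```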
Fixes #27256
Fixes #26268
Implements the second SSL ELB annotation, per #24978
service.beta.kubernetes.io/aws-load-balancer-ssl-ports=* (or e.g. https)
If not specified, all ports are secure (SSL or HTTPS).
Add ELB proxy protocol support via the annotation
"service.beta.kubernetes.io/aws-load-balancer-proxy-protocol". This
allows servers like Nginx and HAProxy to retrieve the real IP address of
a remote client.
Automatic merge from submit-queue
AWS: Move enforcement of attached AWS device limit from kubelet to scheduler
The limit on the number of EBS volumes attached to a node is now enforced by the scheduler. It can be adjusted via the `KUBE_MAX_PD_VOLS` env. variable there. Therefore we don't need the same check in the kubelet. If the system admin wants to attach more, we should allow it.
The kubelet limit is now 650 attached volumes ('ba'..'zz').
Note that the scheduler counts only *pods* assigned to a node. When a pod is deleted and a new pod is scheduled on a node, the kubelet starts (slowly) detaching the old volume and (slowly) attaching the new volume. Depending on AWS speed **it may happen that more than KUBE_MAX_PD_VOLS volumes are actually attached to a node for some time!** The kubelet will clean this up in a few seconds/minutes (both attach and detach are quite slow).
Fixes #22994
Automatic merge from submit-queue
AWS: Allow cross-region image pulling with ECR
Fixes#23298
Definitely should be in the release notes; should maybe get merged in 1.2 along with #23594 after some soaking. Documentation changes to follow.
cc @justinsb @erictune @rata @miguelfrde
This is step two. We now create long-lived, lazy ECR providers in all regions.
When first used, they will create the actual ECR providers doing the work
behind the scenes, namely talking to ECR in the region where the image lives,
rather than the one our instance is running in.
Also:
- moved the list of AWS regions out of the AWS cloudprovider and into the
credentialprovider, then exported it from there.
- improved logging
Behold, running in us-east-1:
```
aws_credentials.go:127] Creating ecrProvider for us-west-2
aws_credentials.go:63] AWS request: ecr:GetAuthorizationToken in us-west-2
aws_credentials.go:217] Adding credentials for user AWS in us-west-2
Successfully pulled image "123456789012.dkr.ecr.us-west-2.amazonaws.com/test:latest"
```
*"One small step for a pod, one giant leap for Kube-kind."*
Use constructor for ecrProvider
Rename package to "credentials" like golint requests
Don't wrap the lazy provider with a caching provider
Add immediate compile-time interface conformance checks for the interfaces
Added comments
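The conformance checks use the standard Go idiom; a self-contained sketch (the interface and type here are stand-ins for the real credentialprovider ones):
```go
package main

// DockerConfigProvider stands in for the credentialprovider interface
// the ECR providers implement (the real one has more methods).
type DockerConfigProvider interface {
	Enabled() bool
}

// lazyEcrProvider defers creating the real per-region provider until
// first use (name borrowed from this PR; body illustrative).
type lazyEcrProvider struct {
	region string
}

func (p *lazyEcrProvider) Enabled() bool { return true }

// Immediate compile-time conformance check: if lazyEcrProvider ever
// stops satisfying DockerConfigProvider, the package fails to build.
var _ DockerConfigProvider = &lazyEcrProvider{}

func main() {}
```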
Automatic merge from submit-queue
AWS: Add support for ap-northeast-2 region (Seoul)
This PR does:
- Support AWS Seoul region: ap-northeast-2.
Currently, I cannot set up Kubernetes on AWS Seoul.
Error Messages:
```
ip-10-0-0-50 core # docker logs 0697db
I0419 07:57:44.569174 1 aws.go:466] Zone not specified in configuration file; querying AWS metadata service
F0419 07:57:44.570380 1 controllermanager.go:279] Cloud provider could not be initialized: could not init cloud provider "aws": not a valid AWS zone (unknown region): ap-northeast-2a
```