Automatic merge from submit-queue (batch tested with PRs 49107, 47177, 49234, 49224, 49227)
Added logging to AWS api calls. #46969
Additionally logging of when AWS API calls start and end to help diagnose problems with kubelet on cloud provider nodes not reporting node status periodically. There's some inconsistency in logging around this PR we should discuss.
IMO, the API logging should be at a higher level than most other types of logging as you would probably only want it in limited instances. For most cases that is easy enough to do, but there are some calls which have some logging around them already, namely in the instance groups. My preference would be to keep the existing logging as it and just add the new API logs around the API call.
Automatic merge from submit-queue (batch tested with PRs 47915, 47856, 44086, 47575, 47475)
AWS: Fix suspicious loop comparing permissions
Because we only ever call it with a single UserId/GroupId, this would
not have been a problem in practice, but this fixes the code.
Fix#36902
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47922, 47195, 47241, 47095, 47401)
AWS: Set CredentialsChainVerboseErrors
This avoids a rather confusing error message.
Fix#39374
```release-note
NONE
```
Automatic merge from submit-queue
New annotation to add existing Security Groups to ELBs created by AWS cloudprovider
**What this PR does / why we need it**:
When K8S cluster is deployed in existing VPC there might be a need to attach extra SecurityGroups to ELB created by AWS cloudprovider. Example of it can be cases, where such Security Groups are maintained by another team.
**Special notes for your reviewer**:
For tests to pass depends on https://github.com/kubernetes/kubernetes/pull/45168 and therefore includes it
**Release note**:
```release-note
New 'service.beta.kubernetes.io/aws-load-balancer-extra-security-groups' Service annotation to specify extra Security Groups to be added to ELB created by AWS cloudprovider
```
We maintain a cache of all instances, and we invalidate the cache
whenever we see a new instance. For ELBs that should be sufficient,
because our usage is limited to instance ids and security groups, which
should not change.
Fix#45050
Automatic merge from submit-queue (batch tested with PRs 47510, 47516, 47482, 47521, 47537)
Batch AWS getInstancesByNodeNames calls with FilterNodeLimit
We are going to limit the getInstancesByNodeNames call with a batch
size of 150.
Fixes - #47271
```release-note
AWS: Batch DescribeInstance calls with nodeNames to 150 limit, to stay within AWS filter limits.
```
Automatic merge from submit-queue
AWS: Process disk attachments even with duplicate NodeNames
Fix#47404
```release-note
AWS: Process disk attachments even with duplicate NodeNames
```
Automatic merge from submit-queue (batch tested with PRs 46929, 47391, 47399, 47428, 47274)
AWS: Richer log message when metadata fails
Not a resolution, but should at least help determine the issue.
Issue #41904
```release-note
NONE
```
Service objects can be annotated with
`service.beta.kubernetes.io/aws-load-balancer-extra-security-groups`
to specify existing security groups to be added to ELB
created by AWS cloudprovider
Automatic merge from submit-queue (batch tested with PRs 36721, 46483, 45500, 46724, 46036)
AWS: Allow configuration of a single security group for ELBs
**What this PR does / why we need it**:
AWS has a hard limit on the number of Security Groups (500). Right now every time an ELB is created Kubernetes is creating a new Security Group. This allows for specifying a Security Group to use for all ELBS
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
For some reason the Diff tool makes this look like it was way more changes than it really was.
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 46239, 46627, 46346, 46388, 46524)
move labels to components which own the APIs
During the apimachinery split in 1.6, we accidentally moved several label APIs into apimachinery. They don't belong there, since the individual APIs are not general machinery concerns, but instead are the concern of particular components: most commonly the kubelet. This pull moves the labels into their owning components and out of API machinery.
@kubernetes/sig-api-machinery-misc @kubernetes/api-reviewers @kubernetes/api-approvers
@derekwaynecarr since most of these are related to the kubelet
Automatic merge from submit-queue (batch tested with PRs 46686, 45049, 46323, 45708, 46487)
Log an EBS vol's instance when attaching fails because VolumeInUse
Messages now look something like this:
E0427 15:44:37.617134 16932 attacher.go:73] Error attaching volume "vol-00095ddceae1a96ed": Error attaching EBS volume "vol-00095ddceae1a96ed" to instance "i-245203b7": VolumeInUse: vol-00095ddceae1a96ed is already attached to an instance
status code: 400, request id: f510c439-64fe-43ea-b3ef-f496a5cd0577. The volume is currently attached to instance "i-072d9328131bcd9cd"
weird that AWS doesn't bother to put that information in there for us (it does when you try to delete a vol that's in use)
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46489, 46281, 46463, 46114, 43946)
AWS: consider instances of all states in DisksAreAttached, not just "running"
Require callers of `getInstancesByNodeNames(Cached)` to specify the states they want to filter instances by, if any. DisksAreAttached, cannot only get "running" instances because of the following attach/detach bug we discovered:
1. Node A stops (or reboots) and stays down for x amount of time
2. Kube reschedules all pods to different nodes; the ones using ebs volumes cannot run because their volumes are still attached to node A
3. Verify volumes are attached check happens while node A is down
4. Since aws ebs bulk verify filters by running nodes, it assumes the volumes attached to node A are detached and removes them all from ASW
5. Node A comes back; its volumes are still attached to it but the attach detach controller has removed them all from asw and so will never detach them even though they are no longer desired on this node and in fact desired elsewhere
6. Pods cannot run because their volumes are still attached to node A
So the idea here is to remove the wrong assumption that callers of `getInstancesByNodeNames(Cached)` only want "running" nodes.
I hope this isn't too confusing, open to alternative ways of fixing the bug + making the code nice.
ping @gnufied @kubernetes/sig-storage-bugs
```release-note
Fix AWS EBS volumes not getting detached from node if routine to verify volumes are attached runs while the node is down
```
Automatic merge from submit-queue
AWS: support node port health check
**What this PR does / why we need it**:
if a custom health check is set from the beta annotation on a service it
should be used for the ELB health check. This patch adds support for
that.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
Let me know if any tests need to be added.
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 45518, 46127, 46146, 45932, 45003)
aws: Support for ELB tagging by users
This PR provides support for tagging AWS ELBs using information in an
annotation and provided as a list of comma separated key-value pairs.
Closes https://github.com/kubernetes/community/pull/404
An admin wants to specify in which AWS availability zone(s) users may create persistent volumes using dynamic provisioning.
That's why the admin can now configure in StorageClass object a comma separated list of zones. Dynamically created PVs for PVCs that use the StorageClass are created in one of the configured zones.
This PR provides support for tagging AWS ELBs using information in an
annotation and provided as a list of comma separated key-value pairs.
Closes https://github.com/kubernetes/community/pull/404
Automatic merge from submit-queue (batch tested with PRs 43067, 45586, 45590, 38636, 45599)
AWS: Remove check that forces loadBalancerSourceRanges to be 0.0.0.0/0.
fixes#38633
Remove check that forces loadBalancerSourceRanges to be 0.0.0.0/0. Also, remove check that forces service.beta.kubernetes.io/aws-load-balancer-internal annotation to be 0.0.0.0/0. Ideally, it should be a boolean, but for backward compatibility, leaving it to be a non-empty value
I changed the function signature to contain protocol, port, and path.
When the service has a health check path and port set it will create an
HTTP health check that corresponds to the port and path. If those are
not set it will create a standard TCP health check on the first port
from the listeners that is not nil. As far as I know, there is no way to
tell if a Health Check should be HTTP vs HTTPS.