We are more liberal in what we accept as a volume ID in k8s, and indeed
we ourselves generate names that look like `aws://<zone>/<id>` for
dynamic volumes.
This volume ID (hereafter a KubernetesVolumeID) cannot directly be
compared to an AWS volume ID (hereafter an awsVolumeID).
We introduce types for each, to prevent accidental comparison or
confusion.
Issue #35746
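A minimal sketch of the two types, using the names given above (comments ours; the parsing/mapping between the two is omitted):

```go
package aws

// KubernetesVolumeID is a volume ID as Kubernetes sees it,
// e.g. "aws://us-east-1a/vol-0abcd1234" for a dynamically
// provisioned volume.
type KubernetesVolumeID string

// awsVolumeID is a volume ID as AWS itself knows it, e.g. "vol-0abcd1234".
type awsVolumeID string
```

Because these are distinct defined types, an accidental comparison between them becomes a compile-time error rather than a silent mismatch.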
In the master's volume reconciler, the information about which volumes are
attached to nodes is cached in the actual state of the world. However, this
information can become stale when a node is terminated (its volumes are
detached automatically). In that situation the reconciler assumes the volume
is still attached and will not issue an attach operation when the node comes
back, so pods created on that node will fail to mount.
This PR adds logic to periodically sync the attached-volume information kept in the actual state cache against the truth. If a volume is no longer attached to the node, the actual state is updated to reflect that; in turn, the reconciler takes action as needed.
To avoid issuing many concurrent operations against the cloud provider, this
PR adds a batch operation that checks whether a list of volumes is attached
to a node, instead of making one request per volume.
More details are explained in PR #33760
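A minimal sketch of the batch-check idea, with illustrative names (the in-tree method has a similar shape); the point is one backend call per node rather than one per volume:

```go
package main

import "fmt"

type nodeName string

type cloud struct {
	// describeInstance stands in for a single cloud API call that
	// returns the set of volumes currently attached to a node.
	describeInstance func(node nodeName) (map[string]bool, error)
}

// DisksAreAttached answers, for each volume in diskNames, whether it is
// attached to node, using exactly one backend call for the whole batch.
func (c *cloud) DisksAreAttached(diskNames []string, node nodeName) (map[string]bool, error) {
	attached, err := c.describeInstance(node)
	if err != nil {
		return nil, err
	}
	result := make(map[string]bool, len(diskNames))
	for _, d := range diskNames {
		result[d] = attached[d]
	}
	return result, nil
}

func main() {
	c := &cloud{describeInstance: func(nodeName) (map[string]bool, error) {
		return map[string]bool{"vol-0abc": true}, nil
	}}
	fmt.Println(c.DisksAreAttached([]string{"vol-0abc", "vol-0def"}, "node-1"))
}
```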
Continuation of #1111
I tried to keep this PR down to just a simple search-and-replace to keep
things reviewable. I may have gone too far in some spots, but it's easy to
roll those back if needed.
I avoided renaming `contrib/mesos/pkg/minion` because there's already
a `contrib/mesos/pkg/node` dir and fixing that will require a bit of work
due to a circular import chain that pops up. So I'm saving that for a
follow-on PR.
I rolled back some of this from a previous commit because it just got
too big/messy. Will follow up with additional PRs.
Signed-off-by: Doug Davis <dug@us.ibm.com>
We had another bug where we confused the hostname with the NodeName.
To avoid this happening again, and to make the code more
self-documenting, we use types.NodeName (a typedef of string)
whenever we are referring to the Node.Name.
A tedious but mechanical commit, therefore, changing all uses of the
node name over to types.NodeName.
Also clean up some of the (many) places where the NodeName is referred
to as a hostname (not true on AWS), or an instanceID (not true on GCE),
etc.
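A sketch of the type, matching the description above (the real definition lives in the k8s types package):

```go
package types

// NodeName holds the Name of a Node, so it cannot be silently
// interchanged with a hostname (not the same thing on AWS) or an
// instance ID (not the same thing on GCE).
type NodeName string
```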
When we are mounting a lot of volumes, we frequently hit rate limits.
Reduce the frequency with which we poll the status; this introduces a bit of
latency but probably matches common attach times pretty closely, and
avoids causing rate limit problems everywhere.
Also, we now poll for longer, because when we time out the volume is in an
indeterminate state: it may be about to complete. The volume controller
can tolerate a slow attach/detach, but it is harder to tolerate the
indeterminism.
Finally, we tolerate a sequence of errors from DescribeVolumes (currently up
to 5 in a row). So we will eventually return an error, but a one-off
failure (e.g. due to rate limits) does not cause us to spuriously fail.
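A sketch of the resulting poll loop, with illustrative names and values (the real intervals, timeout, and error budget live in the provider):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForAttach polls attachment state at a reduced frequency (fewer
// describe calls, slightly more latency), with a longer overall timeout,
// and tolerates up to maxConsecutiveErrors failed describes in a row
// before giving up.
func waitForAttach(describe func() (bool, error), interval, timeout time.Duration, maxConsecutiveErrors int) error {
	deadline := time.Now().Add(timeout)
	consecutiveErrors := 0
	for time.Now().Before(deadline) {
		attached, err := describe()
		switch {
		case err != nil:
			consecutiveErrors++
			if consecutiveErrors >= maxConsecutiveErrors {
				return fmt.Errorf("giving up after %d consecutive describe errors: %v", consecutiveErrors, err)
			}
		case attached:
			return nil
		default:
			consecutiveErrors = 0 // a successful describe resets the error budget
		}
		time.Sleep(interval)
	}
	return errors.New("timed out waiting for volume to attach")
}

func main() {
	calls := 0
	describe := func() (bool, error) {
		calls++
		if calls < 3 {
			return false, errors.New("RequestLimitExceeded") // simulated rate limit
		}
		return true, nil
	}
	fmt.Println(waitForAttach(describe, 10*time.Millisecond, time.Second, 5))
}
```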
Automatic merge from submit-queue
Typos and englishify pkg/cloudprovider + pkg/dns + pkg/kubectl
**What this PR does / why we need it**: Just fixed some typos + "englishify" in pkg/cloudprovider + pkg/dns + pkg/kubectl
**Which issue this PR fixes** : None
**Special notes for your reviewer**: It just fixes typos.
**Release note**: `NONE`
The problem is that attachments are now done on the master, and we are
only caching the attachment map persistently for the local instance. So
there is now a race, because the attachment map is cleared every time.
Issue #29324
Automatic merge from submit-queue
AWS: More ELB attributes via service annotations
Replaces #25015 and addresses all of @justinsb's feedback therein. This is a new PR because I was unable to reopen #25015 to amend it.
I noticed recently that there is existing (but undocumented) precedent for the AWS cloud provider to manage ELB-specific load balancer configuration based on service annotations. In particular, one can _already_ designate an ELB as "internal" or enable PROXY protocol.
This PR extends this capability to the management of ELB attributes, which includes the following items (the annotation keys are sketched after the list below):
* Access logs:
* Enabled / disabled
* Emit interval
* S3 bucket name
* S3 bucket prefix
* Connection draining:
* Enabled / disabled
* Timeout
* Connection:
* Idle timeout
* Cross-zone load balancing:
* Enabled / disabled
Some of these are possibly more useful than others. Use cases that immediately come to mind:
* Enabling cross-zone load balancing is potentially useful for "Ubernetes Light," or anyone otherwise attempting to spread worker nodes around multiple AZs.
* Increasing idle timeout is useful for the benefit of anyone dealing with long-running requests. An example I personally care about would be git pushes to Deis' builder component.
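For reference, a sketch of the annotation keys involved (key strings as documented for the in-tree AWS provider; the Go constant names are illustrative):

```go
package aws

const (
	// Access logs
	ServiceAnnotationLoadBalancerAccessLogEnabled        = "service.beta.kubernetes.io/aws-load-balancer-access-log-enabled"
	ServiceAnnotationLoadBalancerAccessLogEmitInterval   = "service.beta.kubernetes.io/aws-load-balancer-access-log-emit-interval"
	ServiceAnnotationLoadBalancerAccessLogS3BucketName   = "service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-name"
	ServiceAnnotationLoadBalancerAccessLogS3BucketPrefix = "service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-prefix"

	// Connection draining
	ServiceAnnotationLoadBalancerConnectionDrainingEnabled = "service.beta.kubernetes.io/aws-load-balancer-connection-draining-enabled"
	ServiceAnnotationLoadBalancerConnectionDrainingTimeout = "service.beta.kubernetes.io/aws-load-balancer-connection-draining-timeout"

	// Connection
	ServiceAnnotationLoadBalancerConnectionIdleTimeout = "service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout"

	// Cross-zone load balancing
	ServiceAnnotationLoadBalancerCrossZoneLoadBalancingEnabled = "service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled"
)
```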
Automatic merge from submit-queue
AWS: Support HTTP->HTTP mode for ELB
**What this PR does / why we need it**:
Right now it is not possible to create an AWS ELB that listens for HTTP and forwards to backend pods that also listen for HTTP.
I asked @justinsb in Slack and he said this seems to be an oversight, so I'd like to use this PR as a step towards solving it.
**Special notes for your reviewer**:
I've only added a simple unit test. Are any integration tests needed? I'm not familiar with the code base.
cc @therc
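A minimal sketch of the listener-protocol pairing this enables (logic illustrative, not the provider's exact code):

```go
package main

import "fmt"

// listenerProtocols picks the ELB (listener, instance) protocol pair
// for a given backend protocol. With this change, an "http" backend
// without a TLS certificate gets a plain HTTP listener instead of
// being rejected.
func listenerProtocols(backendProtocol string, hasSSLCert bool) (listener, instance string) {
	switch backendProtocol {
	case "http":
		if hasSSLCert {
			return "https", "http"
		}
		return "http", "http" // the newly supported HTTP->HTTP pair
	default:
		return "tcp", "tcp"
	}
}

func main() {
	l, i := listenerProtocols("http", false)
	fmt.Println(l, i) // http http
}
```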
Automatic merge from submit-queue
Run goimport for the whole repo
While removing GOMAXPROC and running goimports, I noticed quite a lot of other files also needed a goimport format. Didn't commit `*.generated.go`, `*.deepcopy.go` or files in `vendor`
This is more for testing if it builds.
The only strange thing here is the gopkg.in/gcfg.v1 => github.com/scalingdata/gcfg replace.
cc @jfrazelle @thockin
We have a few functions that predate aws-sdk-go, but they have natural
equivalents in aws-sdk-go. Document them as deprecated, and replace
the implementation with the equivalent in aws-sdk-go to make it obvious
that they are the same.
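A minimal sketch of the pattern, with a hypothetical helper name (`orEmpty`); the real helpers and their doc comments are in the provider:

```go
package aws

import "github.com/aws/aws-sdk-go/aws"

// orEmpty returns the string a pointer refers to, or "" for nil.
//
// Deprecated: use aws.StringValue instead; the body now simply
// delegates to it so the equivalence is obvious.
func orEmpty(s *string) string {
	return aws.StringValue(s)
}
```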
Automatic merge from submit-queue
AWS: Added experimental option to skip zone check
This pull request resolves #28380. In the vast majority of cases, it is appropriate to validate the AWS region against a known set of regions. However, there is the edge case where this is undesirable because Kubernetes may be deployed in an AWS-like environment where the region is not one of the known regions.
By adding the optional **DisableStrictZoneCheck = true** to the **[Global]** section in the aws.conf file (e.g. /etc/aws/aws.conf), one can bypass the region validation.
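A sketch of the wiring, assuming the usual gcfg layout (field name per this PR; the surrounding struct is abridged):

```go
package aws

// CloudConfig is populated by gcfg from aws.conf; the [Global]
// section maps onto the Global struct.
type CloudConfig struct {
	Global struct {
		// DisableStrictZoneCheck, when true, bypasses validation of
		// the region against the known set of AWS regions.
		DisableStrictZoneCheck bool
	}
}

// Example /etc/aws/aws.conf:
//
//	[Global]
//	DisableStrictZoneCheck = true
```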
Automatic merge from submit-queue
AWS/GCE: Spread PetSet volume creation across zones, create GCE volumes in non-master zones
Long term we plan on integrating this into the scheduler, but in the
short term we use the volume name to assign it to a zone.
We hash the volume name so we don't bias toward the first few zones.
If the volume name "looks like" a PetSet volume name (ending with
-<number>) then we use the number as an offset. In that case we hash
the base name.
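A self-contained sketch of the spreading logic described above (illustrative, not the exact in-tree implementation):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strconv"
	"strings"
)

// chooseZone hashes the volume name to pick a zone. If the name looks
// like a PetSet volume (ends in -<number>), the base name is hashed and
// the trailing number used as an offset, so one set's volumes land in
// distinct zones.
func chooseZone(zones []string, name string) string {
	sort.Strings(zones) // deterministic ordering of candidate zones
	base, index := name, uint32(0)
	if i := strings.LastIndex(name, "-"); i != -1 {
		if n, err := strconv.Atoi(name[i+1:]); err == nil && n >= 0 {
			base, index = name[:i], uint32(n)
		}
	}
	h := fnv.New32a()
	h.Write([]byte(base))
	return zones[(h.Sum32()+index)%uint32(len(zones))]
}

func main() {
	zones := []string{"us-east-1a", "us-east-1b", "us-east-1c"}
	for _, name := range []string{"www-0", "www-1", "www-2"} {
		fmt.Println(name, "->", chooseZone(zones, name))
	}
}
```

Hashing the base name rather than the full name is what makes the offset meaningful: `www-0`, `www-1`, and `www-2` share a hash and differ only by offset, so they land in three different zones.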