In order to be able to use new mounter library, this PR adds the
mounterPath flag to kubelet which passes the flag to the mount
interface. If flag is empty, mount uses default mount path.
Automatic merge from submit-queue
Loadbalanced client src ip preservation enters beta
Sounds like we're going to try out the proposal (https://github.com/kubernetes/kubernetes/issues/30819#issuecomment-249877334) for annotations -> fields on just one feature in 1.5 (scheduler). Or do we want to just convert to fields right now?
'names' is an array of FQDNs. 'instances' is a map indexed by canonicalized
name. Clearly these two won't always match, so when building the final
instance array to return, make sure to look up map entries by their canonicalized
name.
In the below example, "ocp-master-5pob" is clearly found as a GCE instance
but when building the final instance array it cannot be matched as the code
is looking for "ocp-master-5pob.c.ose-refarch.internal" instead. The node
is then deleted from the cluster as it cannot be found by the cloud provider.
gce.go:2519] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): initial node prefix ocp-
gce.go:2530] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): looking for instances map[ocp-master-5pob:<nil>]
gce.go:2533] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): getting zone 'europe-west1-c' (remaining 1)
gce.go:2563] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): instance name <omitted> not requested
gce.go:2563] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): instance name <omitted> not requested
gce.go:2533] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): getting zone 'europe-west1-b' (remaining 1)
gce.go:2563] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): instance name <omitted> not requested
gce.go:2576] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): found instance 'ocp-master-5pob' remaining 0
gce.go:2563] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): instance name <omitted> not requested
gce.go:2533] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): getting zone 'europe-west1-d' (remaining 0)
gce.go:2588] Failed to retrieve instance: "ocp-master-5pob.c.ose-refarch.internal"
gce.go:2624] ### getInstanceByName(ocp-master-5pob.c.ose-refarch.internal): got []: instance not found
gce.go:2626] getInstanceByName/multiple-zones: failed to get instance ocp-master-5pob.c.ose-refarch.internal; err: instance not found
nodecontroller.go:587] Deleting node (no longer present in cloud provider): ocp-master-5pob.c.ose-refarch.internal
nodecontroller.go:664] Recording Deleting Node ocp-master-5pob.c.ose-refarch.internal because it's not present according to cloud provider event message for node ocp-master-5pob.c.ose-refarch.internal
Automatic merge from submit-queue
azure: lower log priority for skipped nic update message
**What this PR does / why we need it**: Very minor, just wanted to remove some log noise I introduced in #34526.
I chose `V(3)` since it aligns with the other nicupdate message printed out here, and will be hidden for the usual default of `--v=2`.
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
NONE
```
This allows security groups to be created and attached to the neutron
port that the loadbalancer is using on the subnet.
The security group ID that is assigned to the nodes needs to be
provided, to allow for traffic from the loadbalancer to the nodePort
to be refelected in the rules.
This adds two config items to the LoadBalancer options -
ManageSecurityGroups (bool)
NodeSecurityGroupID (string)
- test pd-ssd and pd-standard on GCE,
- test all four volume types on AWS
- test just the default volume type on OpenStack (right now, there is no API
to get list of them)
Automatic merge from submit-queue
openstack: Support config-drive and improve CurrentNodeName, GetZone
This PR adds support for fetching local instance metadata via config-drive (as well as querying metadata service), and surfaces some additional metadata information (from either source):
- `CurrentNodeName` now returns the OpenStack instance name, rather than the current hostname (they might not be the same)
- `GetZone` includes availability zone label in `FailureDomain`
Thanks to @kiall for a WIP implementation of the latter.
Automatic merge from submit-queue
OpenStack LBaaSV2: EnsureLoadBalancer now updates instead of recreates existing LBs
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**: Current LBaaSV2 integration recreates existing LBs and causes service downtime and floating ip rotation. New implementation updates LBs without service downtime or any ip rotation.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#32794
**Special notes for your reviewer**: I really need this before we can move to production with kubernetes. Getting this to v1.4 would be really great. I have performed plenty of testing; lb and listener creation, port changing and listener update, multiple listeners for multi-port LBs, and deletion. Seems to work flawlessly.
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
```
Previously the OpenStack provider just returned the hostname in
CurrentNodeName. With this change, we return the local OpenStack
instance name, as the API intended.
Config-drive is an alternate no-network method for publishing local
instance metadata on OpenStack. This change implements support for
fetching data from config-drive, and tries it before querying the
network metadata service (since config-drive will fail quickly if not
available).
Note config-drive involves mounting the filesystem with label
"config-2", so anyone using config-drive and running kubelet in a
container will need to ensure /dev/disk/by-label/config-2 is available
inside the container (read-only).
Contination of #1111
I tried to keep this PR down to just a simple search-n-replace to keep
things simple. I may have gone too far in some spots but its easy to
roll those back if needed.
I avoided renaming `contrib/mesos/pkg/minion` because there's already
a `contrib/mesos/pkg/node` dir and fixing that will require a bit of work
due to a circular import chain that pops up. So I'm saving that for a
follow-on PR.
I rolled back some of this from a previous commit because it just got
to big/messy. Will follow up with additional PRs
Signed-off-by: Doug Davis <dug@us.ibm.com>
Because of how our test environment was setup, we didn’t notice that we were assuming the load balancer hosts list to be IP addresses, while they actually are hostnames.
Also updated some comments and added a check to prevent trying to release a public IP if we don’t have one.
We had another bug where we confused the hostname with the NodeName.
To avoid this happening again, and to make the code more
self-documenting, we use types.NodeName (a typedef alias for string)
whenever we are referring to the Node.Name.
A tedious but mechanical commit therefore, to change all uses of the
node name to use types.NodeName
Also clean up some of the (many) places where the NodeName is referred
to as a hostname (not true on AWS), or an instanceID (not true on GCE),
etc.
Automatic merge from submit-queue
vSphere cloud provider: ExternalID/InstanceID not returning appropriate error for non-existing VM
Addresses #33215.
When vCenter returns error vm not found, this is now being translated to
the appropriate error 'cloudprovider.InstanceNotFound' which indicates
to Kubernetes node controller that the VM is in fact not found.
Automatic merge from submit-queue
Fixed a bug that causes k8s to delete all healthmonitors on your OpenStack tenant
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**:
The OpenStack LBaaS v2 api does not support filtering health monitors by pool_id, so /lbaas/healthmonitors?pool_id=abc123 will always return all health monitors in your OpenStack tenant.
This presents a problem when, in the very next block of code, we loop over the list of monitorIDs and delete them one-by-one. This will delete all the health monitors in your tenant without warning.
Fortunately, we already got the healthmonitor IDs when we built the list of pools. Using those, we can delete only those healthmonitors associated with our pool(s).
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
The main issue here was the use of v2_monitors.List(lbaas.network, v2_monitors.ListOpts{PoolID: poolID}). This is trying to filter healthmonitors by pool_id, but that is not supported by the API. It creates a call like /lbaas/healthmonitors?pool_id=abc123. The API server ignores the pool_id parameter and returns a list of all healthmonitors (which k8s then tries to delete).
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
```
Automatic merge from submit-queue
Do not report error when deleting an attached volume
Persistent volume controller should not send warning events to a PV and mark the PV as failed when the volume is still attached.
This happens when a user quickly deletes a pod and associated PVC - PV is slowly detaching, while the PVC is already deleted and the PV enters Failed phase.
`Deleter.Deleter` can now return `tryAgainError`, which is sent as INFO to the PV to let the user know we did not forget to delete the PV, however the PV stays in Released state. The controller tries again in the next sync (15 seconds by default).
Fixes#31511
Addresses #33215.
When vCenter returns error vm not found, this is now being translated to
the appropriate error 'cloudprovider.InstanceNotFound' which indicates
to Kubernetes node controller that the VM is in fact not found.
When we are mounting a lot of volumes, we frequently hit rate limits.
Reduce the frequency with which we poll the status; introduces a bit of
latency but probably matches common attach times pretty closely, and
avoids causing rate limit problems everywhere.
Also, we now poll for longer, as when we timeout, the volume is in an
indeterminate state: it may be about to complete. The volume controller
can tolerate a slow attach/detach, but it is harder to tolerate the
indeterminism.
Finally, we ignore a sequence of errors in DescribeVolumes (up to 5 in a
row currently). So we will eventually return an error, but a one
off-failure (e.g. due to rate limits) does not cause us to spuriously
fail.
Bump version of golang.org/x/oauth2
Vendor google.golang.org/cloud/
Vendor google.golang.org/api/
Vendor cloud.google.com/go/compute/
Replace google.golang.org/cloud with cloud.google.com/go/
Fixes#30069